Sample records for model-based descriptor consistent

  1. The implementation of aerial object recognition algorithm based on contour descriptor in FPGA-based on-board vision system

    NASA Astrophysics Data System (ADS)

    Babayan, Pavel; Smirnov, Sergey; Strotov, Valery

    2017-10-01

    This paper describes an aerial object recognition algorithm for on-board and stationary vision systems. The suggested algorithm is intended to recognize objects of a specific kind using a set of reference objects defined by 3D models. The proposed algorithm is based on building an outer contour descriptor. The algorithm consists of two stages: learning and recognition. The learning stage is devoted to exploring the reference objects. Using the 3D models, we can build a database of training images by rendering each 3D model from viewpoints evenly distributed on a sphere. The sphere points are distributed according to the geosphere principle. The gathered training image set is used for calculating descriptors, which are then used in the recognition stage of the algorithm. The recognition stage focuses on estimating the similarity between the captured object and the reference objects by matching the observed image descriptor against the reference object descriptors. The experimental research was performed using a set of models of aircraft of different types (airplanes, helicopters, UAVs). The proposed orientation estimation algorithm showed good accuracy in all case studies. The real-time performance of the algorithm in an FPGA-based vision system was demonstrated.

  2. A 3D model retrieval approach based on Bayesian networks lightfield descriptor

    NASA Astrophysics Data System (ADS)

    Xiao, Qinhan; Li, Yanjun

    2009-12-01

    A new 3D model retrieval methodology is proposed by exploiting a novel Bayesian networks lightfield descriptor (BNLD). There are two key novelties in our approach: (1) a BN-based method for building a lightfield descriptor; and (2) a 3D model retrieval scheme based on the proposed BNLD. To overcome the disadvantages of existing 3D model retrieval methods, we explore BNs for building a new lightfield descriptor. First, the 3D model is placed in a lightfield and about 300 binary views are obtained along a sphere; Fourier descriptors and Zernike moment descriptors are then calculated from the binary views, and the resulting shape feature sequence is learned into a BN model using a BN learning algorithm. Second, we propose a new 3D model retrieval method that calculates the Kullback-Leibler divergence (KLD) between BNLDs. Benefiting from statistical learning, our BNLD is more robust to noise than the existing methods. A comparison between our method and the lightfield descriptor-based approach is conducted to demonstrate the effectiveness of the proposed methodology.
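
    The retrieval step in this abstract ranks models by the Kullback-Leibler divergence between descriptors. The sketch below illustrates only that ranking idea, under a strong simplifying assumption: each descriptor is stood in for by a plain discrete probability histogram rather than a learned Bayesian network. The function names and the symmetrised form of the divergence are illustrative choices, not taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) of two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against division by zero when q has an empty bin.
            total += pi * math.log(pi / max(qi, eps))
    return total


def symmetric_kl(p, q):
    """Average of D(p || q) and D(q || p), so the score is order-independent."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def rank_models(query, database):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(database, key=lambda name: symmetric_kl(query, database[name]))
```

    A call such as rank_models(query_histogram, {"chair": h1, "table": h2}) returns the model names with the smallest divergence first.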

  3. Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

    PubMed Central

    2013-01-01

    Background While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-Scales (3) combined with an average Z-Scale value for each target, while ProtFP (PCA8), ST-Scales, and ProtFP (Feature) rank last. Conclusions While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets describing complementary information consistently leads to a small improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side. PMID:24059743
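
    The benchmark above compares descriptor sets by RMSE (for regression on log-unit activities) and by the Matthews correlation coefficient, MCC (for active/inactive classification). As a reminder of what those two numbers measure, here is a minimal sketch using their standard definitions; the function names are illustrative, and labels are assumed to be coded as 1 (active) and 0 (inactive).

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which puts the reported differences of 0.01 to 0.1 in context.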

  4. The implementation of contour-based object orientation estimation algorithm in FPGA-based on-board vision system

    NASA Astrophysics Data System (ADS)

    Alpatov, Boris; Babayan, Pavel; Ershov, Maksim; Strotov, Valery

    2016-10-01

    This paper describes the implementation of an orientation estimation algorithm in an FPGA-based vision system. An approach to estimating the orientation of objects lacking axial symmetry is proposed. The suggested algorithm is intended to estimate the orientation of a specific known 3D object based on the object's 3D model. The proposed orientation estimation algorithm consists of two stages: learning and estimation. The learning stage is devoted to exploring the studied object. Using the 3D model, we can gather a set of training images by capturing the 3D model from viewpoints evenly distributed on a sphere. The sphere points are distributed according to the geosphere principle. The gathered training image set is used for calculating descriptors, which are then used in the estimation stage of the algorithm. The estimation stage focuses on matching an observed image descriptor against the training image descriptors. The experimental research was performed using a set of images of an Airbus A380. The proposed orientation estimation algorithm showed good accuracy in all case studies. The real-time performance of the algorithm in an FPGA-based vision system was demonstrated.

  5. Development of structure-activity relationship for metal oxide nanoparticles

    NASA Astrophysics Data System (ADS)

    Liu, Rong; Zhang, Hai Yuan; Ji, Zhao Xia; Rallo, Robert; Xia, Tian; Chang, Chong Hyun; Nel, Andre; Cohen, Yoram

    2013-05-01

    Nanomaterial structure-activity relationships (nano-SARs) for metal oxide nanoparticles (NPs) toxicity were investigated using metrics based on dose-response analysis and consensus self-organizing map clustering. The NP cellular toxicity dataset included toxicity profiles consisting of seven different assays for human bronchial epithelial (BEAS-2B) and murine myeloid (RAW 264.7) cells, over a concentration range of 0.39-100 mg L-1 and exposure time up to 24 h, for twenty-four different metal oxide NPs. Various nano-SAR building models were evaluated, based on an initial pool of thirty NP descriptors. The conduction band energy and ionic index (often correlated with the hydration enthalpy) were identified as suitable NP descriptors that are consistent with suggested toxicity mechanisms for metal oxide NPs and metal ions. The best performing nano-SAR with the above two descriptors, built with support vector machine (SVM) model and of validated robustness, had a balanced classification accuracy of ~94%. An applicability domain for the present data was established with a reasonable confidence level of 80%. Given the potential role of nano-SARs in decision making, regarding the environmental impact of NPs, the class probabilities provided by the SVM nano-SAR enabled the construction of decision boundaries with respect to toxicity classification under different acceptance levels of false negative relative to false positive predictions. Electronic supplementary information (ESI) available. See DOI: 10.1039/c3nr01533e
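
    Two quantities in this abstract lend themselves to a short worked example: balanced classification accuracy, and a decision boundary that shifts with the relative acceptance of false negatives and false positives. The sketch below uses the standard definitions and the usual expected-cost threshold; it assumes labels coded as 1 (toxic) and 0 (non-toxic) and class probabilities that are already calibrated. The function names and cost parameters are illustrative and do not come from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def decision_threshold(cost_false_negative, cost_false_positive):
    """Probability above which predicting 'toxic' has the lower expected cost.

    Predict toxic when p * cost_fn >= (1 - p) * cost_fp, which rearranges to
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label each class probability as toxic (1) or non-toxic (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]
```

    With equal costs the threshold is 0.5; treating a false negative as four times worse than a false positive lowers it to 0.2, so more particles are flagged as toxic.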

  6. QSPR using MOLGEN-QSPR: the challenge of fluoroalkane boiling points.

    PubMed

    Rücker, Christoph; Meringer, Markus; Kerber, Adalbert

    2005-01-01

    By means of the new software MOLGEN-QSPR, a multilinear regression model for the boiling points of lower fluoroalkanes is established. The model is based exclusively on simple descriptors derived directly from molecular structure and nevertheless describes a broader set of data more precisely than previous attempts that used either more demanding (quantum chemical) descriptors or more demanding (nonlinear) statistical methods such as neural networks. The model's internal consistency was confirmed by leave-one-out cross-validation. The model was used to predict all unknown boiling points of fluorobutanes, and the quality of predictions was estimated by means of comparison with boiling point predictions for fluoropentanes.
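
    The abstract describes a multilinear regression model checked by leave-one-out cross-validation. The following is a minimal sketch of that validation loop in plain Python, not the MOLGEN-QSPR implementation: it fits ordinary least squares through the normal equations and reports the cross-validated Q^2 = 1 - PRESS / SS_total. All function names are illustrative, and the descriptor rows are assumed to be lists of numbers with more compounds than descriptors.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, targets):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate the fitted linear model on one descriptor row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def loo_q2(rows, targets):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(targets) / len(targets)
    press = 0.0
    for i in range(len(rows)):
        # Refit without compound i, then predict the held-out compound.
        coefs = fit_mlr(rows[:i] + rows[i + 1:], targets[:i] + targets[i + 1:])
        press += (targets[i] - predict(coefs, rows[i])) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - press / ss_total
```

    A Q^2 close to the ordinary R^2 indicates that no single compound dominates the fit, which is the sense of "internal consistency" used above.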

  7. Developing a CD-CBM Anticipatory Approach for Cavitation - Defining a Model Descriptor Consistent Between Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allgood, G.O.; Dress, W.B.; Kercel, S.W.

    1999-05-10

    A major problem with cavitation in pumps and other hydraulic devices is that there is no effective method for detecting or predicting its inception. The traditional approach is to declare the pump in cavitation when the total head pressure drops by some arbitrary value (typically 3%) in response to a reduction in pump inlet pressure. However, the pump is already cavitating at this point. A method is needed in which cavitation events are captured as they occur and characterized by their process dynamics. The object of this research was to identify specific features of cavitation that could be used as a model-based descriptor in a context-dependent condition-based maintenance (CD-CBM) anticipatory prognostic and health assessment model. This descriptor was based on the physics of the phenomena, capturing the salient features of the process dynamics. An important element of this concept is the development and formulation of the extended process feature vector, or model vector. This model-based descriptor encodes the specific information that describes the phenomena and its dynamics and is formulated as a data structure consisting of several elements. The first is a descriptive model abstracting the phenomena. The second is the parameter list associated with the functional model. The third is a figure of merit, a single number in the interval [0,1] representing a confidence factor that the functional model and parameter list actually describe the observed data. Using this as a basis and applying it to the cavitation problem, any given location in a flow loop will have this data structure, differing in value but not content. The extended process feature vector is formulated as follows: [model, {parameter list}, confidence factor]. (1) For this study, the model that characterized cavitation was a chirped-exponentially decaying sinusoid. Using the parameters defined by this model, the parameter list included frequency, decay, and chirp rate. Based on this, the process feature vector has the form: [chirped-exponentially decaying sinusoid, {frequency = a, decay = b, chirp rate = c}, cf = 0.80]. (2) In this experiment a reversible catastrophe was examined. The reason for this is that the same catastrophe could be repeated to ensure the statistical significance of the data.
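
    The three-element data structure described in this abstract (a model, its parameter list, and a confidence factor in [0, 1]) can be sketched as a small record type. The class and field names below are hypothetical, and the signal function is only one plausible reading of "chirped-exponentially decaying sinusoid": a sinusoid with a linearly increasing frequency and an exponential amplitude envelope. The report's exact functional form is not given in the abstract, and the numeric parameter values are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class ProcessFeatureVector:
    """Hypothetical container mirroring [model, {parameter list}, confidence factor]."""
    model: str
    parameters: dict
    confidence: float

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence factor must lie in [0, 1]")


def chirped_decaying_sinusoid(t, frequency, decay, chirp_rate, amplitude=1.0):
    """Assumed form: A * exp(-decay * t) * sin(2*pi*(frequency*t + chirp_rate*t^2/2))."""
    phase = 2.0 * math.pi * (frequency * t + 0.5 * chirp_rate * t * t)
    return amplitude * math.exp(-decay * t) * math.sin(phase)


# Placeholder parameter values; only the confidence factor 0.80 appears in the abstract.
example = ProcessFeatureVector(
    model="chirped-exponentially decaying sinusoid",
    parameters={"frequency": 5.0, "decay": 2.0, "chirp_rate": 1.0},
    confidence=0.80,
)
```

    Because every location in the flow loop carries the same structure, two measurements can be compared field by field: same model name, different parameter values and confidence.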

  8. New molecular descriptors based on local properties at the molecular surface and a boiling-point model derived from them.

    PubMed

    Ehresmann, Bernd; de Groot, Marcel J; Alex, Alexander; Clark, Timothy

    2004-01-01

    New molecular descriptors based on statistical descriptions of the local ionization potential, local electron affinity, and the local polarizability at the surface of the molecule are proposed. The significance of these descriptors has been tested by calculating them for the Maybridge database in addition to our set of 26 descriptors reported previously. The new descriptors show little correlation with those already in use. Furthermore, the principal components of the extended set of descriptors for the Maybridge data show that especially the descriptors based on the local electron affinity extend the variance in our set of descriptors, which we have previously shown to be relevant to physical properties. The first nine principal components are shown to be most significant. As an example of the usefulness of the new descriptors, we have set up a QSPR model for boiling points using both the old and new descriptors.

  9. Improved Prediction of Blood-Brain Barrier Permeability Through Machine Learning with Combined Use of Molecular Property-Based Descriptors and Fingerprints.

    PubMed

    Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo

    2018-03-21

    Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict the BBB permeability. In particular, support vector machine (SVM), which is a kernel-based machine learning method, has been used popularly in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on the use of molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of a SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy for the BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models of the BBB permeability prediction.

  10. Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.

    PubMed

    Vogt, Martin; Bajorath, Jürgen

    2008-01-01

    Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.

  11. Registration algorithm of point clouds based on multiscale normal features

    NASA Astrophysics Data System (ADS)

    Lu, Jun; Peng, Zhongtao; Su, Hang; Xia, GuiHua

    2015-01-01

    The point cloud registration technology for obtaining a three-dimensional digital model is widely applied in many areas. To improve the accuracy and speed of point cloud registration, a registration method based on multiscale normal vectors is proposed. The proposed registration method mainly includes three parts: the selection of key points, the calculation of feature descriptors, and the determining and optimization of correspondences. First, key points are selected from the point cloud based on the changes of magnitude of multiscale curvatures obtained by using principal components analysis. Then the feature descriptor of each key point is proposed, which consists of 21 elements based on multiscale normal vectors and curvatures. The correspondences in a pair of two point clouds are determined according to the descriptor's similarity of key points in the source point cloud and target point cloud. Correspondences are optimized by using a random sampling consistency algorithm and clustering technology. Finally, singular value decomposition is applied to optimized correspondences so that the rigid transformation matrix between two point clouds is obtained. Experimental results show that the proposed point cloud registration algorithm has a faster calculation speed, higher registration accuracy, and better antinoise performance.
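
    The final step in this abstract recovers a rigid transformation from matched point pairs by singular value decomposition. A common way to do that is the Kabsch (orthogonal Procrustes) solution, sketched below with NumPy. Whether the authors used exactly this formulation is an assumption; the function name is illustrative, and the inputs are assumed to be N x 3 arrays of already-matched correspondences.

```python
import numpy as np


def rigid_transform_svd(source, target):
    """Least-squares rotation R and translation t with target ~= source @ R.T + t."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (source - src_centroid).T @ (target - tgt_centroid)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = tgt_centroid - rotation @ src_centroid
    return rotation, translation
```

    In a full pipeline this would run on the correspondences that survive the random sampling consistency and clustering filters, since outlier pairs bias a least-squares fit.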

  12. Developing a CD-CBM Anticipatory Approach for Cavitation - Defining a Model-Based Descriptor Consistent Across Processes, Phase 1 Final Report Context-Dependent Prognostics and Health Assessment: A New Paradigm for Condition-based Maintenance SBIR Topic No. N98-114

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allgood, G.O.; Dress, W.B.; Kercel, S.W.

    1999-06-01

    The objective of this research, and subsequent testing, was to identify specific features of cavitation that could be used as a model-based descriptor in a context-dependent condition-based maintenance (CD-CBM) anticipatory prognostic and health assessment model. This descriptor is based on the physics of the phenomena, capturing the salient features of the process dynamics. The test methodology and approach were developed to make the cavitation features the dominant effect in the process and collected signatures. This would allow the accurate characterization of the salient cavitation features at different operational states. By developing such an abstraction, these attributes can be used as a general diagnostic for a system or any of its components. In this study, the particular focus will be pumps. As many as 90% of pump failures are catastrophic. They seem to be operating normally and fail abruptly without warning. This is true whether the failure is sudden hardware damage requiring repair, such as a gasket failure, or a transition into an undesired operating mode, such as cavitation. This means that conventional diagnostic methods fail to predict 90% of incipient failures and that in addressing this problem, model-based methods can add value where it is actually needed.

  13. Determination of descriptors for polycyclic aromatic hydrocarbons and related compounds by chromatographic methods and liquid-liquid partition in totally organic biphasic systems.

    PubMed

    Ariyasena, Thiloka C; Poole, Colin F

    2014-09-26

    Retention factors on several columns and at various temperatures using gas chromatography and from reversed-phase liquid chromatography on a SunFire C18 column with various mobile phase compositions containing acetonitrile, methanol and tetrahydrofuran as strength-adjusting solvents are combined with liquid-liquid partition coefficients in totally organic biphasic systems to calculate descriptors for 23 polycyclic aromatic hydrocarbons and 18 related compounds of environmental interest. The use of a consistent protocol for the above measurements provides descriptors that are more self-consistent for the estimation of physicochemical properties (octanol-water, air-octanol, air-water, aqueous solubility, and subcooled liquid vapor pressure). The descriptors in this report tend to have smaller values for the L and E descriptors and random differences in the B and S descriptors compared with literature sources. A simple atom fragment constant model is proposed for the estimation of descriptors from structure for polycyclic aromatic hydrocarbons. The new descriptors show no bias in the prediction of the air-water partition coefficient for polycyclic aromatic hydrocarbons, unlike the literature values. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Development of an online, publicly accessible naive Bayesian decision support tool for mammographic mass lesions based on the American College of Radiology (ACR) BI-RADS lexicon.

    PubMed

    Benndorf, Matthias; Kotter, Elmar; Langer, Mathias; Herda, Christoph; Wu, Yirong; Burnside, Elizabeth S

    2015-06-01

    To develop and validate a decision support tool for mammographic mass lesions based on a standardized descriptor terminology (BI-RADS lexicon) to reduce variability of practice. We used separate training data (1,276 lesions, 138 malignant) and validation data (1,177 lesions, 175 malignant). We created naïve Bayes (NB) classifiers from the training data with tenfold cross-validation. Our "inclusive model" comprised BI-RADS categories, BI-RADS descriptors, and age as predictive variables; our "descriptor model" comprised BI-RADS descriptors and age. The resulting NB classifiers were applied to the validation data. We evaluated and compared classifier performance with ROC-analysis. In the training data, the inclusive model yields an AUC of 0.959; the descriptor model yields an AUC of 0.910 (P < 0.001). The inclusive model is superior to the clinical performance (BI-RADS categories alone, P < 0.001); the descriptor model performs similarly. When applied to the validation data, the inclusive model yields an AUC of 0.935; the descriptor model yields an AUC of 0.876 (P < 0.001). Again, the inclusive model is superior to the clinical performance (P < 0.001); the descriptor model performs similarly. We consider our classifier a step towards a more uniform interpretation of combinations of BI-RADS descriptors. We provide our classifier at www.ebm-radiology.com/nbmm/index.html . • We provide a decision support tool for mammographic masses at www.ebm-radiology.com/nbmm/index.html . • Our tool may reduce variability of practice in BI-RADS category assignment. • A formal analysis of BI-RADS descriptors may enhance radiologists' diagnostic performance.
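
    The study above trains naive Bayes classifiers on categorical descriptors and compares them by the area under the ROC curve. The sketch below shows both pieces in a minimal form: a categorical naive Bayes with Laplace smoothing and a pairwise AUC. It is an illustration of the general technique, not the published tool; the class name, the smoothing constant, and the 0/1 label coding are assumptions.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing."""

    def fit(self, rows, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.feature_values = defaultdict(set)    # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, label)][value] += 1
                self.feature_values[j].add(value)
        return self

    def predict_proba(self, row, positive=1):
        """Posterior probability of the positive class for one case."""
        total = sum(self.class_counts.values())
        log_post = {}
        for label, count in self.class_counts.items():
            lp = math.log(count / total)
            for j, value in enumerate(row):
                num = self.value_counts[(j, label)][value] + self.alpha
                den = count + self.alpha * len(self.feature_values[j])
                lp += math.log(num / den)
            log_post[label] = lp
        top = max(log_post.values())
        norm = sum(math.exp(v - top) for v in log_post.values())
        return math.exp(log_post[positive] - top) / norm


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Each row would hold one lesion's descriptor values, for example a margin term and a shape term from the lexicon, with label 1 for malignant and 0 for benign.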

  15. Systems Biological Approach of Molecular Descriptors Connectivity: Optimal Descriptors for Oral Bioavailability Prediction

    PubMed Central

    Ahmed, Shiek S. S. J.; Ramakrishnan, V.

    2012-01-01

    Background Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules correspond to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were commonly associated with HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. Conclusion The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors that can be used as an entity to facilitate prediction of oral bioavailability. PMID:22815781

  16. Systems biological approach of molecular descriptors connectivity: optimal descriptors for oral bioavailability prediction.

    PubMed

    Ahmed, Shiek S S J; Ramakrishnan, V

    2012-01-01

    Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules correspond to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were commonly associated with HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/-bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors that can be used as an entity to facilitate prediction of oral bioavailability.

  17. QSPR study of polychlorinated diphenyl ethers by molecular electronegativity distance vector (MEDV-4).

    PubMed

    Sun, Lili; Zhou, Liping; Yu, Yu; Lan, Yukun; Li, Zhiliang

    2007-01-01

    Polychlorinated diphenyl ethers (PCDEs) have attracted increasing concern as a group of ubiquitous potential persistent organic pollutants (POPs). Using the molecular electronegativity distance vector (MEDV-4), multiple linear regression (MLR) models are developed for sub-cooled liquid vapor pressures (P(L)), n-octanol/water partition coefficients (K(OW)) and sub-cooled liquid water solubilities (S(W,L)) of 209 PCDEs and diphenyl ether. The correlation coefficients (R) and the leave-one-out cross-validation (LOO) correlation coefficients (R(CV)) of all the 6-descriptor models for logP(L), logK(OW) and logS(W,L) are more than 0.98. Using stepwise multiple regression (SMR), the descriptors are selected, and the resulting models are a 5-descriptor model for logP(L), a 4-descriptor model for logK(OW), and a 6-descriptor model for logS(W,L). All these models exhibit excellent estimation capabilities for the internal sample set and good predictive capabilities for the external sample set. The consistency between observed and estimated/predicted values is best for logP(L) (R=0.996, R(CV)=0.996), followed by logK(OW) (R=0.992, R(CV)=0.992) and logS(W,L) (R=0.983, R(CV)=0.980). With MEDV-4 descriptors, the QSPR models can be used for prediction, and the model predictions can hence extend the current database of experimental values.

  18. Contour-based object orientation estimation

    NASA Astrophysics Data System (ADS)

    Alpatov, Boris; Babayan, Pavel

    2016-04-01

    Real-time object orientation estimation is a topical problem in computer vision. In this paper we propose an approach to estimating the orientation of objects lacking axial symmetry. The proposed algorithm is intended to estimate the orientation of a specific known 3D object, so a 3D model is required for learning. The proposed orientation estimation algorithm consists of two stages: learning and estimation. The learning stage is devoted to exploring the studied object. Using the 3D model, we can gather a set of training images by capturing the 3D model from viewpoints evenly distributed on a sphere. The sphere points are distributed according to the geosphere principle, which minimizes the training image set. The gathered training image set is used for calculating descriptors, which are then used in the estimation stage of the algorithm. The estimation stage focuses on matching an observed image descriptor against the training image descriptors. The experimental research was performed using a set of images of an Airbus A380. The proposed orientation estimation algorithm showed good accuracy (mean error less than 6°) in all case studies. The real-time performance of the algorithm was also demonstrated.
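
    The learning stage in this abstract renders the 3D model from viewpoints placed on a sphere "by the geosphere principle". A geosphere is usually built by repeatedly subdividing an icosahedron and projecting the new vertices onto the unit sphere, and the sketch below follows that construction. That the authors used exactly this construction, and the function names and subdivision levels shown, are assumptions.

```python
import math


def _normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def icosahedron():
    """Vertices (on the unit sphere) and triangular faces of an icosahedron."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    verts = [(-1, phi, 0), (1, phi, 0), (-1, -phi, 0), (1, -phi, 0),
             (0, -1, phi), (0, 1, phi), (0, -1, -phi), (0, 1, -phi),
             (phi, 0, -1), (phi, 0, 1), (-phi, 0, -1), (-phi, 0, 1)]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    return [_normalize(v) for v in verts], faces


def subdivide(verts, faces):
    """Split every triangle into four, pushing new vertices onto the sphere."""
    verts = list(verts)
    cache = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:  # each shared edge gets exactly one new vertex
            mid = tuple((a + b) / 2.0 for a, b in zip(verts[i], verts[j]))
            cache[key] = len(verts)
            verts.append(_normalize(mid))
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


def geosphere_viewpoints(level):
    """Unit-sphere viewpoints after `level` rounds of subdivision."""
    verts, faces = icosahedron()
    for _ in range(level):
        verts, faces = subdivide(verts, faces)
    return verts
```

    Each level multiplies the face count by four, giving 12, 42, 162, ... viewpoints, so the subdivision level trades training-set size against angular resolution.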

  19. Renal Function Descriptors in Neonates: Which Creatinine-Based Formula Best Describes Vancomycin Clearance?

    PubMed

    Bhongsatiern, Jiraganya; Stockmann, Chris; Yu, Tian; Constance, Jonathan E; Moorthy, Ganesh; Spigarelli, Michael G; Desai, Pankaj B; Sherwin, Catherine M T

    2016-05-01

    Growth and maturational changes have been identified as significant covariates in describing variability in clearance of renally excreted drugs such as vancomycin. Because of immaturity of clearance mechanisms, quantification of renal function in neonates is of importance. Several serum creatinine (SCr)-based renal function descriptors have been developed in adults and children, but none are selectively derived for neonates. This review summarizes development of the neonatal kidney and discusses assessment of the renal function regarding estimation of glomerular filtration rate using renal function descriptors. Furthermore, identification of the renal function descriptors that best describe the variability of vancomycin clearance was performed in a sample study of a septic neonatal cohort. Population pharmacokinetic models were developed applying a combination of age-weight, renal function descriptors, or SCr alone. In addition to age and weight, SCr or renal function descriptors significantly reduced variability of vancomycin clearance. The population pharmacokinetic models with Léger and modified Schwartz formulas were selected as the optimal final models, although the other renal function descriptors and SCr provided reasonably good fit to the data, suggesting further evaluation of the final models using external data sets and cross validation. The present study supports incorporation of renal function descriptors in the estimation of vancomycin clearance in neonates. © 2015, The American College of Clinical Pharmacology.

  20. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    PubMed

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. 
This makes it easy to join, extend, and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community.

  1. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    PubMed Central

    2010-01-01

    Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. 
This makes it easy to join, extend, and combine datasets and hence work collectively, but also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community. PMID:20591161

  2. An effective content-based image retrieval technique for image visuals representation based on the bag-of-visual-words model

    PubMed Central

    Jabeen, Safia; Mehmood, Zahid; Mahmood, Toqeer; Saba, Tanzila; Rehman, Amjad; Mahmood, Muhammad Tariq

    2018-01-01

    For the last three decades, content-based image retrieval (CBIR) has been an active research area, representing a viable solution for retrieving similar images from an image repository. In this article, we propose a novel CBIR technique based on the visual words fusion of speeded-up robust features (SURF) and fast retina keypoint (FREAK) feature descriptors. SURF is a sparse descriptor whereas FREAK is a dense descriptor. Moreover, SURF is a scale and rotation-invariant descriptor that performs better in the case of repeatability, distinctiveness, and robustness. It is robust to noise, detection errors, geometric, and photometric deformations. It also performs better at low illumination within an image as compared to the FREAK descriptor. In contrast, FREAK is a retina-inspired speedy descriptor that performs better for classification-based problems as compared to the SURF descriptor. Experimental results show that the proposed technique based on the visual words fusion of SURF-FREAK descriptors combines the features of both descriptors and resolves the aforementioned issues. The qualitative and quantitative analysis performed on three image collections, namely Corel-1000, Corel-1500, and Caltech-256, shows that proposed technique based on visual words fusion significantly improved the performance of the CBIR as compared to the feature fusion of both descriptors and state-of-the-art image retrieval techniques. PMID:29694429
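
    The bag-of-visual-words representation behind this technique can be sketched as follows. The sketch assumes that "visual words fusion" means building one histogram per descriptor type over its own vocabulary and concatenating the two, which is only one plausible reading of the abstract. The codebooks and descriptors are tiny hypothetical vectors, not real SURF or FREAK output, and in practice the codebooks would be learned by clustering.

```python
def quantize(descriptor, codebook):
    """Index of the visual word (codebook entry) nearest to the descriptor."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Hypothetical vocabularies for two descriptor types.
codebook_a = [(0.0, 0.0), (1.0, 1.0)]
codebook_b = [(0.0,), (5.0,), (10.0,)]

# Hypothetical local descriptors extracted from one image.
desc_a = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (0.8, 0.9)]
desc_b = [(0.5,), (9.0,)]

# Fused image signature: concatenation of the two word histograms.
fused = bovw_histogram(desc_a, codebook_a) + bovw_histogram(desc_b, codebook_b)
print(fused)
```

    Retrieval then reduces to comparing the fused signature of a query image with those stored for the repository.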

  3. An effective content-based image retrieval technique for image visuals representation based on the bag-of-visual-words model.

    PubMed

    Jabeen, Safia; Mehmood, Zahid; Mahmood, Toqeer; Saba, Tanzila; Rehman, Amjad; Mahmood, Muhammad Tariq

    2018-01-01

    For the last three decades, content-based image retrieval (CBIR) has been an active research area, representing a viable solution for retrieving similar images from an image repository. In this article, we propose a novel CBIR technique based on the visual words fusion of speeded-up robust features (SURF) and fast retina keypoint (FREAK) feature descriptors. SURF is a sparse descriptor whereas FREAK is a dense descriptor. Moreover, SURF is a scale and rotation-invariant descriptor that performs better in the case of repeatability, distinctiveness, and robustness. It is robust to noise, detection errors, geometric, and photometric deformations. It also performs better at low illumination within an image as compared to the FREAK descriptor. In contrast, FREAK is a retina-inspired speedy descriptor that performs better for classification-based problems as compared to the SURF descriptor. Experimental results show that the proposed technique based on the visual words fusion of SURF-FREAK descriptors combines the features of both descriptors and resolves the aforementioned issues. The qualitative and quantitative analysis performed on three image collections, namely Corel-1000, Corel-1500, and Caltech-256, shows that proposed technique based on visual words fusion significantly improved the performance of the CBIR as compared to the feature fusion of both descriptors and state-of-the-art image retrieval techniques.

  4. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    PubMed

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
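
    The Matthews Correlation Coefficient quoted for the disorder and solvent accessibility predictors is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the counts are hypothetical and are not taken from the CASP8 benchmark.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical residue-level counts for a disorder predictor.
score = mcc(tp=40, tn=45, fp=5, fn=10)
print(round(score, 3))
```

    MCC ranges from -1 to 1 and stays informative when the two classes are unbalanced, which is typical for disordered versus ordered residues.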

  5. Quantitative structure-retention relationships of polycyclic aromatic hydrocarbons gas-chromatographic retention indices.

    PubMed

    Drosos, Juan Carlos; Viola-Rhenals, Maricela; Vivas-Reyes, Ricardo

    2010-06-25

    Polycyclic aromatic hydrocarbons (PAHs) are of concern in environmental chemistry and toxicology. In the present work, a QSRR study was performed for 209 previously reported PAHs using quantum-mechanical descriptors and descriptors from other sources, estimated by different approaches. The B3LYP/6-31G* level of theory was used for geometry optimization and for the quantum-mechanical variables. A good linear relationship between the gas-chromatographic retention index and electronic or topological descriptors was found by stepwise linear regression analysis. The molecular polarizability (alpha) and the second-order molecular connectivity Kier and Hall index ((2)chi) correlated significantly with the retention index, with high squared coefficients of determination (R(2)=0.950 and 0.962, respectively). A one-variable QSRR model is presented for each descriptor, and both models demonstrate significant predictive capacity, established using the leave-many-out (LMO, excluding 25% of rows) cross-validation coefficient q(2)(CV-LMO25%) (obtained q(2)(CV-LMO25%) of 0.947 and 0.960, respectively). Furthermore, the physicochemical interpretation of the selected descriptors allowed a detailed explanation of the source of the observed statistical correlation. The model analysis suggests that a single descriptor is sufficient to establish a consistent retention index-structure relationship. Only moderate or non-significant improvement was observed in the quantitative results or statistical validation parameters when more terms were introduced into the predictive equation. The proposed one-parameter QSRR model offers a consistent scheme for predicting the chromatographic properties of PAH compounds. Copyright 2010 Elsevier B.V. All rights reserved.

  6. Bio-activity of aminosulfonyl ureas in the light of nucleic acid bases and DNA base pair interaction.

    PubMed

    Mondal Roy, Sutapa

    2018-08-01

    Quantum chemical descriptors based on density functional theory (DFT) are applied to predict the biological activity (log IC50) of one class of acyl-CoA: cholesterol O-acyltransferase (ACAT) inhibitors, viz. aminosulfonyl ureas. ACAT inhibitors are very effective agents for reducing triglyceride and cholesterol levels in the human body. Successful two-parameter quantitative structure-activity relationship (QSAR) models are developed with a combination of relevant global and local DFT-based descriptors for predicting the biological activity of aminosulfonyl ureas. The global descriptors, electron affinity of the ACAT inhibitors (EA) and/or charge transfer (ΔN) between inhibitors and model biosystems (NA bases and DNA base pairs), along with the local group atomic charge on the sulfonyl moiety (∑Q(Sul)) of the inhibitors, reveal more than 90% efficacy of the selected descriptors for predicting the experimental log(IC50) values. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Multiscale Region-Level VHR Image Change Detection via Sparse Change Descriptor and Robust Discriminative Dictionary Learning

    PubMed Central

    Xu, Yuan; Ding, Kun; Huo, Chunlei; Zhong, Zisha; Li, Haichang; Pan, Chunhong

    2015-01-01

    Very high resolution (VHR) image change detection is challenging due to the low discriminative ability of change feature and the difficulty of change decision in utilizing the multilevel contextual information. Most change feature extraction techniques put emphasis on the change degree description (i.e., in what degree the changes have happened), while they ignore the change pattern description (i.e., how the changes changed), which is of equal importance in characterizing the change signatures. Moreover, the simultaneous consideration of the classification robust to the registration noise and the multiscale region-consistent fusion is often neglected in change decision. To overcome such drawbacks, in this paper, a novel VHR image change detection method is proposed based on sparse change descriptor and robust discriminative dictionary learning. Sparse change descriptor combines the change degree component and the change pattern component, which are encoded by the sparse representation error and the morphological profile feature, respectively. Robust change decision is conducted by multiscale region-consistent fusion, which is implemented by the superpixel-level cosparse representation with robust discriminative dictionary and the conditional random field model. Experimental results confirm the effectiveness of the proposed change detection technique. PMID:25918748

  8. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    PubMed

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Integrative Approaches for Predicting in vivo Effects of Chemicals from their Structural Descriptors and the Results of Short-term Biological Assays

    PubMed Central

    Low, Yen S.; Sedykh, Alexander; Rusyn, Ivan; Tropsha, Alexander

    2017-01-01

    Cheminformatics approaches such as Quantitative Structure Activity Relationship (QSAR) modeling have been used traditionally for predicting chemical toxicity. In recent years, high throughput biological assays have been increasingly employed to elucidate mechanisms of chemical toxicity and predict toxic effects of chemicals in vivo. The data generated in such assays can be considered as biological descriptors of chemicals that can be combined with molecular descriptors and employed in QSAR modeling to improve the accuracy of toxicity prediction. In this review, we discuss several approaches for integrating chemical and biological data for predicting biological effects of chemicals in vivo and compare their performance across several data sets. We conclude that while no method consistently shows superior performance, the integrative approaches rank consistently among the best yet offer enriched interpretation of models over those built with either chemical or biological data alone. We discuss the outlook for such interdisciplinary methods and offer recommendations to further improve the accuracy and interpretability of computational models that predict chemical toxicity. PMID:24805064

  10. Influence of Texture and Colour in Breast TMA Classification

    PubMed Central

    Fernández-Carrobles, M. Milagro; Bueno, Gloria; Déniz, Oscar; Salido, Jesús; García-Rojo, Marcial; González-López, Lucía

    2015-01-01

    Breast cancer diagnosis is still done by observation of biopsies under the microscope. The development of automated methods for breast TMA classification would reduce diagnostic time. This paper is a step towards the solution for this problem and shows a complete study of breast TMA classification based on colour models and texture descriptors. The TMA images were divided into four classes: i) benign stromal tissue with cellularity, ii) adipose tissue, iii) benign and benign anomalous structures, and iv) ductal and lobular carcinomas. A relevant set of features was obtained on eight different colour models from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Multiresolution Gabor, M-LBP and textons descriptors. Furthermore, four types of classification experiments were performed using six different classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best result shows an average of 99.05% accuracy and 98.34% positive predictive value. These results have been obtained by means of a bagging tree classifier with combination of six colour models and the use of 1719 non-correlated (correlation threshold of 97%) textural features based on Statistical, M-LBP, Gabor and Spatial textons descriptors. PMID:26513238
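
    The selection of non-correlated features with a 97% correlation threshold can be sketched as a greedy filter. The abstract does not state which procedure was used, so the first-come greedy rule below is an assumption, and the feature columns are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    denom = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / denom if denom else 0.0

def drop_correlated(columns, threshold=0.97):
    """Keep a feature only if |r| with every already kept feature is <= threshold."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

# Hypothetical texture features, one list per feature column.
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.1],  # nearly a rescaled copy of feature 0
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
kept = drop_correlated(features)
print(kept)
```

    With a greedy rule the surviving set depends on the column order, so the order should be fixed if the selection has to be reproducible.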

  11. ADME evaluation in drug discovery. 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs.

    PubMed

    Hou, Tingjun; Xu, Xiaojie

    2002-12-01

    In this study, the relationships between the brain-blood concentration ratio of 96 structurally diverse compounds and a large number of structurally derived descriptors were investigated. The linear models were based on molecular descriptors that can be calculated for any compound simply from a knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. The change in descriptor use as the evolution proceeds demonstrates that the n-octanol/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to blood-brain barrier permeability. Moreover, we found that the predictions using multiple QSPR models from GA optimization gave quite good results in spite of the diversity of structures, and were better than the predictions using the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.

  12. Prediction of Mutagenicity of Chemicals from Their Calculated Molecular Descriptors: A Case Study with Structurally Homogeneous versus Diverse Datasets.

    PubMed

    Basak, Subhash C; Majumdar, Subhabrata

    2015-01-01

    Variation in high-dimensional data is often caused by a few latent factors, and hence dimension reduction or variable selection techniques are often useful in gathering useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures like ridge regression and linear discriminant analysis, and apply them to two data sets which have more predictors than samples (i.e. the n < p scenario) and several types of molecular descriptors. One of these datasets consists of a congeneric group of amines, while the other has a much more diverse collection of compounds. The difference in prediction results between these two datasets for both methods supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.

  13. Rapid prediction of chemical metabolism by human UDP-glucuronosyltransferase isoforms using quantum chemical descriptors derived with the electronegativity equalization method.

    PubMed

    Sorich, Michael J; McKinnon, Ross A; Miners, John O; Winkler, David A; Smith, Paul A

    2004-10-07

    This study aimed to evaluate in silico models based on quantum chemical (QC) descriptors derived using the electronegativity equalization method (EEM) and to assess the use of QC properties to predict chemical metabolism by human UDP-glucuronosyltransferase (UGT) isoforms. Various EEM-derived QC molecular descriptors were calculated for known UGT substrates and nonsubstrates. Classification models were developed using support vector machine and partial least squares discriminant analysis. In general, the most predictive models were generated with the support vector machine. Combining QC and 2D descriptors (from previous work) using a consensus approach resulted in a statistically significant improvement in predictivity (to 84%) over both the QC and 2D models and the other methods of combining the descriptors. EEM-derived QC descriptors were shown to be both highly predictive and computationally efficient. It is likely that EEM-derived QC properties will be generally useful for predicting ADMET and physicochemical properties during drug discovery.

  14. Toxicity prediction of ionic liquids based on Daphnia magna by using density functional theory

    NASA Astrophysics Data System (ADS)

    Nu’aim, M. N.; Bustam, M. A.

    2018-04-01

    Density functional theory (DFT) can be used to predict the toxicity of ionic liquids. DFT gives the researcher a substantial tool for computing the quantum state of atoms, molecules and solids, and for molecular dynamics, which is also known as a computer simulation method. The prediction uses quantum chemical reactivity descriptors based on structural features. The ionic liquids and their Log[EC50] data are taken from the literature, specifically Ismail Hossain's thesis entitled “Synthesis, Characterization and Quantitative Structure Toxicity Relationship of Imidazolium, Pyridinium and Ammonium Based Ionic Liquids”. Each cation and anion of the ionic liquids was optimized and calculated. The geometry optimization and calculation in the software produce the energies of the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO). From the HOMO and LUMO values, the other toxicity descriptors were obtained according to their formulas. The toxicity descriptors involved are electrophilicity index, HOMO, LUMO, energy gap, chemical potential, hardness and electronegativity. The interrelation between the descriptors is determined using multiple linear regression (MLR). With this MLR, all descriptors were analyzed and the significant ones were chosen. The selected significant descriptors were then used to develop the best model equation for toxicity prediction of ionic liquids. The model equation was validated against the Log[EC50] data from the literature, and the final model equation was developed. A wider range of ionic liquids, nearly 108, can be predicted with this model equation.
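
    The abstract does not give the formulas that derive the remaining descriptors from the HOMO and LUMO energies. The sketch below uses the widely used conceptual-DFT definitions under a Koopmans-type approximation (ionization energy taken as -E(HOMO), electron affinity as -E(LUMO)); that choice is an assumption about the authors' method, and the input energies are hypothetical. Some authors define hardness without the factor of 1/2, which changes the hardness and electrophilicity values.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Conceptual-DFT descriptors from frontier orbital energies (same unit, e.g. eV)."""
    ionization = -e_homo                # Koopmans-type approximation (assumed)
    affinity = -e_lumo
    chi = (ionization + affinity) / 2   # electronegativity
    mu = -chi                           # chemical potential
    eta = (ionization - affinity) / 2   # hardness (1/2 convention assumed)
    omega = mu ** 2 / (2 * eta)         # electrophilicity index
    return {
        "homo": e_homo,
        "lumo": e_lumo,
        "gap": e_lumo - e_homo,
        "electronegativity": chi,
        "chemical_potential": mu,
        "hardness": eta,
        "electrophilicity": omega,
    }

# Hypothetical frontier orbital energies in eV.
d = reactivity_descriptors(e_homo=-6.0, e_lumo=-1.0)
for name, value in d.items():
    print(f"{name}: {value:.3f}")
```

    Several of these descriptors are exact linear functions of one another (for example the gap is twice the hardness here), so a regression cannot keep all of them as independent terms.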

  15. Predicting allergic contact dermatitis: a hierarchical structure activity relationship (SAR) approach to chemical classification using topological and quantum chemical descriptors

    NASA Astrophysics Data System (ADS)

    Basak, Subhash C.; Mills, Denise; Hawkins, Douglas M.

    2008-06-01

    A hierarchical classification study was carried out based on a set of 70 chemicals—35 which produce allergic contact dermatitis (ACD) and 35 which do not. This approach was implemented using a regular ridge regression computer code, followed by conversion of regression output to binary data values. The hierarchical descriptor classes used in the modeling include topostructural (TS), topochemical (TC), and quantum chemical (QC), all of which are based solely on chemical structure. The concordance, sensitivity, and specificity are reported. The model based on the TC descriptors was found to be the best, while the TS model was extremely poor.
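
    Converting regression output to binary values and then reporting concordance, sensitivity, and specificity can be sketched as follows. The 0.5 cut-off and the scores are hypothetical; the abstract does not state the conversion rule.

```python
def binarize(scores, cutoff=0.5):
    """Convert continuous regression output to 0/1 class labels (cut-off assumed)."""
    return [1 if s >= cutoff else 0 for s in scores]

def classification_summary(predicted, actual):
    """Concordance (overall accuracy), sensitivity, and specificity."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return {
        "concordance": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical regression output for three sensitizers (1) and three non-sensitizers (0).
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 0, 0, 0]
summary = classification_summary(binarize(scores), labels)
print(summary)
```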

  16. The effects of geometric uncertainties on computational modelling of knee biomechanics

    NASA Astrophysics Data System (ADS)

    Meng, Qingen; Fisher, John; Wilcox, Ruth

    2017-08-01

    The geometry of the articular components of the knee is an important factor in predicting joint mechanics in computational models. There are a number of uncertainties in the definition of the geometry of cartilage and meniscus, and evaluating the effects of these uncertainties is fundamental to understanding the level of reliability of the models. In this study, the sensitivity of knee mechanics to geometric uncertainties was investigated by comparing polynomial-based and image-based knee models and varying the size of meniscus. The results suggested that the geometric uncertainties in cartilage and meniscus resulting from the resolution of MRI and the accuracy of segmentation caused considerable effects on the predicted knee mechanics. Moreover, even if the mathematical geometric descriptors can be very close to the imaged-based articular surfaces, the detailed contact pressure distribution produced by the mathematical geometric descriptors was not the same as that of the image-based model. However, the trends predicted by the models based on mathematical geometric descriptors were similar to those of the imaged-based models.

  17. QSPR models for half-wave reduction potential of steroids: a comparative study between feature selection and feature extraction from subsets of or entire set of descriptors.

    PubMed

    Hemmateenejad, Bahram; Yazdani, Mahdieh

    2009-02-16

    Steroids are widely distributed in nature and are found in plants, animals, and fungi in abundance. A data set consisting of a diverse set of steroids was used to develop quantitative structure-electrochemistry relationship (QSER) models for their half-wave reduction potential. Modeling was carried out by means of multiple linear regression (MLR) and principal component regression (PCR) analyses. In the MLR analysis, the QSPR models were constructed by first grouping descriptors and then stepwise selection of variables from each group (MLR1), and by stepwise selection of predictor variables from the pool of all calculated descriptors (MLR2). A similar procedure was used in the PCR analysis, so that the principal components (or features) were extracted from different groups of descriptors (PCR1) and from the entire set of descriptors (PCR2). The resulting models were evaluated using cross-validation, chance correlation, application to predicting the reduction potential of some test samples, and assessing the applicability domain. Both MLR approaches gave accurate results; however, the QSPR model found by MLR1 was statistically more significant. The PCR1 approach produced a model as accurate as the MLR approaches, whereas less accurate results were obtained with the PCR2 approach. Overall, the correlation coefficients of cross-validation and prediction of the QSPR models resulting from the MLR1, MLR2 and PCR1 approaches were higher than 90%, which shows the high ability of the models to predict the reduction potential of the studied steroids.

  18. Object Tracking Using Adaptive Covariance Descriptor and Clustering-Based Model Updating for Visual Surveillance

    PubMed Central

    Qin, Lei; Snoussi, Hichem; Abdallah, Fahed

    2014-01-01

    We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated during a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is protected from contamination even in case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883
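
A region covariance descriptor, as generally understood, is the covariance matrix of the per-pixel feature vectors inside an image region. The following is a minimal sketch of that general idea only; the feature choice (x, y, intensity) and the function name `region_covariance` are illustrative assumptions, not the adaptive feature selection described in the abstract.

```python
def region_covariance(features):
    """Sample covariance matrix of per-pixel feature vectors.

    `features` is a list of equal-length numeric lists, one per pixel.
    The (x, y, intensity) layout used below is a hypothetical example.
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])
    # Mean of each feature dimension over the region.
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        dev = [f[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j]
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]


# Four pixels of a 2x2 region, each described as (x, y, intensity).
pixels = [[0, 0, 10.0], [1, 0, 12.0], [0, 1, 14.0], [1, 1, 16.0]]
C = region_covariance(pixels)
```

The result is a symmetric d x d matrix whose size does not depend on the number of pixels in the region, which is what makes it usable as a compact region descriptor.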

  19. The Development of Novel Chemical Fragment-Based Descriptors Using Frequent Common Subgraph Mining Approach and Their Application in QSAR Modeling.

    PubMed

    Khashan, Raed; Zheng, Weifan; Tropsha, Alexander

    2014-03-01

    We present a novel approach to generating fragment-based molecular descriptors. The molecules are represented by labeled undirected chemical graphs. Fast Frequent Subgraph Mining (FFSM) is used to find chemical fragments (subgraphs) that occur in at least a subset of all molecules in a dataset. The collection of frequent subgraphs (FSG) forms a set of dataset-specific descriptors whose values for each molecule are defined by the number of times each frequent fragment occurs in this molecule. We have employed the FSG descriptors to develop variable selection k Nearest Neighbor (kNN) QSAR models of several datasets with binary target property including Maximum Recommended Therapeutic Dose (MRTD), Salmonella Mutagenicity (Ames Genotoxicity), and P-Glycoprotein (PGP) data. Each dataset was divided into training, test, and validation sets to establish the statistical figures of merit reflecting the model validated predictive power. The classification accuracies of models for both training and test sets for all datasets exceeded 75 %, and the accuracy for the external validation sets exceeded 72 %. The model accuracies were comparable to or better than those reported earlier in the literature for the same datasets. Furthermore, the use of fragment-based descriptors affords mechanistic interpretation of validated QSAR models in terms of essential chemical fragments responsible for the compounds' target property. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Categorical QSAR models for skin sensitization based on local lymph node assay measures and both ground and excited state 4D-fingerprint descriptors

    NASA Astrophysics Data System (ADS)

    Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Santos-Filho, Osvaldo A.; Esposito, Emilio X.; Hopfinger, Anton J.; Tseng, Yufeng J.

    2008-06-01

    In previous studies we developed categorical QSAR models for predicting skin-sensitization potency based on 4D-fingerprint (4D-FP) descriptors and in vivo murine local lymph node assay (LLNA) measures. Only 4D-FP descriptors derived from the ground state (GMAX) structures of the molecules were used to build those QSAR models. In this study we have generated 4D-FP descriptors from the first excited state (EMAX) structures of the molecules. The GMAX, EMAX and the combined ground and excited state 4D-FP descriptors (GEMAX) were employed in building categorical QSAR models. Logistic regression (LR) and partial least squares coupled logistic regression (PLS-CLR), found to be effective model-building methods for the LLNA skin-sensitization measures in our previous studies, were used again in this study. This also permitted comparison of the prior ground state models to those involving first excited state 4D-FP descriptors. Three types of categorical QSAR models were constructed for each of the GMAX, EMAX and GEMAX datasets: a binary model (2-state), an ordinal model (3-state) and a binary-binary model (two-2-state). No significant differences exist among the LR 2-state models constructed for each of the three datasets. However, the PLS-CLR 3-state and 2-state models based on the EMAX and GEMAX datasets have higher predictivity than those constructed using only the GMAX dataset. These EMAX and GMAX categorical models are also more significant and predictive than corresponding models built in our previous QSAR studies of LLNA skin-sensitization measures.

  1. Analysis of A Drug Target-based Classification System using Molecular Descriptors.

    PubMed

    Lu, Jing; Zhang, Pin; Bi, Yi; Luo, Xiaomin

    2016-01-01

    Drug-target interaction is an important topic in drug discovery and drug repositioning. The KEGG database offers drug annotation and classification using a target-based classification system. In this study, we investigated five target-based classes: (I) G protein-coupled receptors; (II) Nuclear receptors; (III) Ion channels; (IV) Enzymes; (V) Pathogens, using molecular descriptors to represent each drug compound. Two popular feature selection methods, maximum relevance minimum redundancy and incremental feature selection, were adopted to extract the important descriptors. Meanwhile, an optimal prediction model based on the nearest neighbor algorithm was constructed, which gave the best result in identifying drug target-based classes. Finally, some key descriptors were discussed to uncover their important roles in the identification of drug-target classes.

  2. Image-Based Airborne LiDAR Point Cloud Encoding for 3D Building Model Retrieval

    NASA Astrophysics Data System (ADS)

    Chen, Yi-Chen; Lin, Chao-Hung

    2016-06-01

    With the development of Web 2.0 and cyber city modeling, an increasing number of 3D models have become available on web-based model-sharing platforms, with many applications such as navigation, urban planning, and virtual reality. Based on the concept of data reuse, a 3D model retrieval system is proposed to retrieve building models similar to a user-specified query. The basic idea behind this system is to reuse these existing 3D building models instead of reconstructing them from point clouds. To retrieve models efficiently, the models in databases are generally encoded compactly by using a shape descriptor. However, most of the geometric descriptors in related works are applied to polygonal models. In this study, the input query of the model retrieval system is a point cloud acquired by Light Detection and Ranging (LiDAR) systems, because of their efficient scene scanning and spatial information collection. Using point clouds with sparse, noisy, and incomplete sampling as input queries is more difficult than using 3D models. Because the building roof is more informative than other parts in the airborne LiDAR point cloud, an image-based approach is proposed to encode both point clouds from input queries and 3D models in databases. The main goal of data encoding is that the models in the database and input point clouds can be consistently encoded. Firstly, top-view depth images of buildings are generated to represent the geometry surface of a building roof. Secondly, geometric features are extracted from depth images based on the height, edges, and planes of the building. Finally, descriptors can be extracted by spatial histograms and used in the 3D model retrieval system. For data retrieval, the models are retrieved by matching the encoding coefficients of point clouds and building models. In experiments, a database including about 900,000 3D models collected from the Internet is used for evaluation of data retrieval. The results of the proposed method show a clear superiority over related methods.
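
The first encoding step, generating a top-view depth image from a point cloud, can be illustrated with a simple rasterization. This is a minimal sketch under stated assumptions: the grid layout, the choice of keeping the highest z value per cell, and the names `top_view_depth_image`, `cell_size`, and `empty` are all hypothetical and are not taken from the paper.

```python
def top_view_depth_image(points, cell_size, width, height, empty=0.0):
    """Rasterize (x, y, z) points into a height x width grid.

    Each cell keeps the largest z value that falls into it, which
    approximates the roof surface seen from above. Cells that receive
    no points are filled with `empty` (an assumed placeholder value).
    """
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        col = int(x // cell_size)
        row = int(y // cell_size)
        # Ignore points that fall outside the grid.
        if 0 <= row < height and 0 <= col < width:
            if image[row][col] is None or z > image[row][col]:
                image[row][col] = z
    return [[empty if v is None else v for v in row] for row in image]


# A tiny hypothetical cloud covering a 2 m x 2 m footprint.
cloud = [(0.2, 0.3, 5.0), (0.7, 0.1, 6.5), (1.4, 0.2, 3.0), (1.9, 1.8, 8.0)]
depth = top_view_depth_image(cloud, cell_size=1.0, width=2, height=2)
```

Applying the same rasterization to points sampled from a database model and to a LiDAR query would place both in one image representation, which is the kind of consistent encoding the abstract describes.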

  3. Prediction of blood-brain partitioning: a model based on molecular electronegativity distance vector descriptors.

    PubMed

    Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen

    2010-09-01

    The objective of this paper is to build a reliable model based on the molecular electronegativity distance vector (MEDV) descriptors for predicting the blood-brain barrier (BBB) permeability and to reveal the effects of the molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, the partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), leave-one-out (LOO) cross-validation correlation coefficient (q), and predictive correlation coefficient (R(p)). It was found that the PLSR model has good quality, with r=0.9202, q=0.7956, and R(p)=0.6649 for the M1 model based on the training set of 57 samples. To search for the most important structural factors affecting the BBB permeability of compounds, we performed variable importance in projection (VIP) analysis for the MEDV descriptors. It was found that some structural fragments in compounds, such as -CH(3), -CH(2)-, =CH-, =C, ≡C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. (c) 2010. Published by Elsevier Inc.
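
Leave-one-out cross-validation, mentioned above, refits the model once per sample with that sample held out and scores the held-out predictions. The sketch below uses a one-variable least-squares line as a stand-in for PLSR (an assumption made to keep the example self-contained) and computes q² as 1 - PRESS/SS_tot, which is one common definition and may differ from the statistic reported in the abstract. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot for the line model."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and responses.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
q2 = loo_q2(x, y)
```

Because every prediction is made for a sample the model has not seen, q² is typically lower than the fitted r², as with the r and q values quoted in the abstract.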

  4. Model based on GRID-derived descriptors for estimating CYP3A4 enzyme stability of potential drug candidates

    NASA Astrophysics Data System (ADS)

    Crivori, Patrizia; Zamora, Ismael; Speed, Bill; Orrenius, Christian; Poggesi, Italo

    2004-03-01

    A number of computational approaches are being proposed for an early optimization of ADME (absorption, distribution, metabolism and excretion) properties to increase the success rate in drug discovery. The present study describes the development of an in silico model able to estimate, from the three-dimensional structure of a molecule, the stability of a compound with respect to the human cytochrome P450 (CYP) 3A4 enzyme activity. Stability data were obtained by measuring the amount of unchanged compound remaining after a standardized incubation with human cDNA-expressed CYP3A4. The computational method transforms the three-dimensional molecular interaction fields (MIFs) generated from the molecular structure into descriptors (VolSurf and Almond procedures). The descriptors were correlated to the experimental metabolic stability classes by a partial least squares discriminant procedure. The model was trained using a set of 1800 compounds from the Pharmacia collection and was validated using two test sets: the first one including 825 compounds from the Pharmacia collection and the second one consisting of 20 known drugs. This model correctly predicted 75% of the first and 85% of the second test set and showed a precision above 86% in selecting metabolically stable compounds. The model appears to be a valuable tool in the design of virtual libraries to bias the selection toward more stable compounds. Abbreviations: ADME - absorption, distribution, metabolism and excretion; CYP - cytochrome P450; MIFs - molecular interaction fields; HTS - high throughput screening; DDI - drug-drug interactions; 3D - three-dimensional; PCA - principal components analysis; CPCA - consensus principal components analysis; PLS - partial least squares; PLSD - partial least squares discriminant; GRIND - grid independent descriptors; GRID - software originally created and developed by Professor Peter Goodford.

  5. Language and the pain experience.

    PubMed

    Wilson, Dianne; Williams, Marie; Butler, David

    2009-03-01

    People in persistent pain have been reported to pay increased attention to specific words or descriptors of pain. The amount of attention paid to pain or cues for pain (such as pain descriptors), has been shown to be a major factor in the modulation of persistent pain. This relationship suggests the possibility that language may have a role both in understanding and managing the persistent pain experience. The aim of this paper is to describe current models of neuromatrices for pain and language, consider the role of attention in persistent pain states and highlight discrepancies, in previous studies based on the McGill Pain Questionnaire (MPQ), of the role of attention on pain descriptors. The existence of a pain neuromatrix originally proposed by Melzack (1990) has been supported by emerging technologies. Similar technologies have recently allowed identification of multiple areas of involvement for the processing of auditory input and the construction of language. As with the construction of pain, this neuromatrix for speech and language may intersect with neural systems for broader cognitive functions such as attention, memory and emotion. A systematic search was undertaken to identify experimental or review studies, which specifically investigated the role of attention on pain descriptors (as cues for pain) in persistent pain patients. A total of 99 articles were retrieved from six databases, with 66 articles meeting the inclusion criteria. After duplicated articles were eliminated, the remaining 41 articles were reviewed in order to support a link between persistent pain, pain descriptors and attention. This review revealed a diverse range of specific pain descriptors, the majority of which were derived from the MPQ. Increased attention to pain descriptors was consistently reported to be associated with emotional state as well as being a significant factor in maintaining persistent pain. However, attempts to investigate the attentional bias of specific pain descriptors highlighted discrepancies between the studies. As well as the diversity of pain descriptors used in studies, they were inconsistently categorized into domains of pain. A lack of consistent bias towards certain pain descriptors was observed, and may be explained simply by the fact that the words provided are not those which subjects themselves would use. These findings suggest that the multidimensional and individual nature of the persistent pain experience may not be adequately explained by pain questionnaires such as the MPQ. Personalized pain descriptors may communicate the pain experience more appropriately, but may also contribute to an increased sensitivity of cortical pain processing areas by capturing increased attention for that individual. The language used as part of communication between therapists and people with persistent pain may provide an, as yet, unexplored adjunct strategy in management. Copyright (c) 2008 John Wiley & Sons, Ltd.

  6. Quantitative structure-activity relationship study of antioxidative peptide by using different sets of amino acids descriptors

    NASA Astrophysics Data System (ADS)

    Li, Yao-Wang; Li, Bo; He, Jiguo; Qian, Ping

    2011-07-01

    A database consisting of 214 tripeptides which contain either a His or Tyr residue was used to study quantitative structure-activity relationships (QSAR) of antioxidative tripeptides. Partial Least-Squares Regression analysis (PLSR) was conducted using the parameters of each amino acid descriptor individually, including Divided Physico-chemical Property Scores (DPPS), Hydrophobic, Electronic, Steric, and Hydrogen (HESH), Vectors of Hydrophobic, Steric, and Electronic properties (VHSE), Molecular Surface-Weighted Holistic Invariant Molecular (MS-WHIM), isotropic surface area-electronic charge index (ISA-ECI) and Z-scale, to describe antioxidative tripeptides as X-variables, with antioxidant activities measured by the ferric thiocyanate method as the Y-variable. After elimination of outliers by Hotelling's T² method and residual analysis, six significant models were obtained describing the entire data set. According to cumulative squared multiple correlation coefficients (R²), cumulative cross-validation coefficients (Q²) and relative standard deviation for the calibration set (RSDc), the qualities of models using DPPS, HESH, ISA-ECI, and VHSE descriptors are better (R² > 0.6, Q² > 0.5, RSDc < 0.39) than those of models using MS-WHIM and Z-scale descriptors (R² < 0.6, Q² < 0.5, RSDc > 0.44). Furthermore, the predictive ability of models using the DPPS descriptor is the best among the six descriptor systems (cumulative multiple correlation coefficient for the prediction set (R²ext) > 0.7). It was concluded that DPPS is better suited to describing the amino acids of antioxidative tripeptides. The results for the DPPS descriptor reveal that the center amino acid and the N-terminal amino acid are far more important than the C-terminal amino acid for antioxidative tripeptides. The hydrophobic (positively related to activity) and electronic (negatively related to activity) properties of the N-terminal amino acid are suggested to play the most important role in activity, followed by the hydrogen bond property (positively related to activity) of the center amino acid. The N-terminal amino acid should be a highly hydrophobic and low electronic amino acid (such as Ala, Gly, Val, and Leu); the center amino acid should be an amino acid that possesses a high hydrogen bond property (such as the basic amino acids Arg, Lys, and His). The structural characteristics of antioxidative peptides found in this paper may contribute to further research on the antioxidative mechanism.

  7. 3D-QSAR studies of some reversible Acetyl cholinesterase inhibitors based on CoMFA and ligand protein interaction fingerprints using PC-LS-SVM and PLS-LS-SVM.

    PubMed

    Ghafouri, Hamidreza; Ranjbar, Mohsen; Sakhteman, Amirhossein

    2017-08-01

    A great challenge in medicinal chemistry is to develop different methods for structural design based on the pattern of the previously synthesized compounds. In this study two different QSAR methods were established and compared for a series of piperidine acetylcholinesterase inhibitors. In one novel approach, PC-LS-SVM and PLS-LS-SVM were used for modeling 3D interaction descriptors, and in the other method the same nonlinear techniques were used to build QSAR equations based on field descriptors. Different validation methods were used to evaluate the models, and the results revealed the greater applicability and predictive ability of the model generated by field descriptors (Q²(LOO-CV) = 1, R²(ext) = 0.97). External validation criteria revealed that both methods can be used in generating reasonable QSAR models. It was concluded that, owing to the ability of interaction descriptors to predict binding mode, this approach can be implemented in future 3D-QSAR software. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Branch length similarity entropy-based descriptors for shape representation

    NASA Astrophysics Data System (ADS)

    Kwon, Ohsung; Lee, Sang-Hee

    2017-11-01

    In previous studies, we showed that the branch length similarity (BLS) entropy profile could be successfully used for the recognition of shapes such as battle tanks, facial expressions, and butterflies. In the present study, we proposed new descriptors, roundness, symmetry, and surface roughness, for the recognition, which are more accurate and faster to compute than the previous descriptors. The roundness represents how closely a shape resembles a circle, the symmetry characterizes how similar one shape is to another when the shape is flipped, and the surface roughness quantifies the degree of vertical deviations of a shape boundary. To evaluate the performance of the descriptors, we used a database of leaf images from 12 species. Each species consisted of 10 - 20 leaf images and the total number of images was 160. The evaluation showed that the new descriptors successfully discriminated the leaf species. We believe that the descriptors can be a useful tool in the field of pattern recognition.
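
The abstract does not give the BLS-entropy-based formulas for its descriptors. As a point of comparison only, the sketch below computes a classical roundness measure, the isoperimetric quotient 4πA/P², for a polygonal boundary; it equals 1 for a circle and is smaller for other shapes. This is a swapped-in textbook measure, not the authors' descriptor, and the function names are hypothetical.

```python
import math


def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple closed polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def roundness(vertices):
    """Isoperimetric quotient 4*pi*A / P**2 (1.0 for a perfect circle)."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / (perimeter * perimeter)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# A 64-sided regular polygon as a close approximation of a circle.
near_circle = [(math.cos(2 * math.pi * k / 64), math.sin(2 * math.pi * k / 64))
               for k in range(64)]
```

Here `roundness(square)` evaluates to π/4, while the 64-sided polygon scores just under 1.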

  9. An Analysis of Descriptors of Volatile Organic Compounds and Their Impact on Rate Constant for Reaction with Hydroxyl Radicals

    DTIC Science & Technology

    2018-05-01

    the descriptors were correlated to experimental rate constants. The five descriptors fell into one of two categories: whole molecule descriptors or...model based on these correlations . Although that goal was not achieved in full, considerable progress has been made, and there is potential for a...readme.txt) and compiled. We then searched for correlations between the calculated properties from theory and the experimental measurements of reaction rate

  10. The effects of geometric uncertainties on computational modelling of knee biomechanics

    PubMed Central

    Fisher, John; Wilcox, Ruth

    2017-01-01

    The geometry of the articular components of the knee is an important factor in predicting joint mechanics in computational models. There are a number of uncertainties in the definition of the geometry of cartilage and meniscus, and evaluating the effects of these uncertainties is fundamental to understanding the level of reliability of the models. In this study, the sensitivity of knee mechanics to geometric uncertainties was investigated by comparing polynomial-based and image-based knee models and varying the size of meniscus. The results suggested that the geometric uncertainties in cartilage and meniscus resulting from the resolution of MRI and the accuracy of segmentation caused considerable effects on the predicted knee mechanics. Moreover, even if the mathematical geometric descriptors can be very close to the image-based articular surfaces, the detailed contact pressure distribution produced by the mathematical geometric descriptors was not the same as that of the image-based model. However, the trends predicted by the models based on mathematical geometric descriptors were similar to those of the image-based models. PMID:28879008

  11. Deep Learning for Low-textured Image Matching

    NASA Astrophysics Data System (ADS)

    Kniaz, V. V.; Fedorenko, V. V.; Fomin, N. A.

    2018-05-01

    Low-textured objects pose challenges for automatic 3D model reconstruction. Such objects are common in archeological applications of photogrammetry. Most of the common feature point descriptors fail to match local patches in featureless regions of an object. Hence, automatic documentation of the archeological process using Structure from Motion (SfM) methods is challenging. Nevertheless, such documentation is possible with the aid of a human operator. Deep learning-based descriptors have recently outperformed most of the common feature point descriptors. This paper is focused on the development of a new Wide Image Zone Adaptive Robust feature Descriptor (WIZARD) based on deep learning. We use a convolutional auto-encoder to compress discriminative features of a local patch into a descriptor code. We build a codebook to perform point matching on multiple images. The matching is performed using the nearest neighbor search and a modified voting algorithm. We present a new "Multi-view Amphora" (Amphora) dataset for evaluation of point matching algorithms. The dataset includes images of an Ancient Greek vase found at Taman Peninsula in Southern Russia. The dataset provides color images, a ground truth 3D model, and a ground truth optical flow. We evaluated the WIZARD descriptor on the "Amphora" dataset to show that it outperforms the SIFT and SURF descriptors on the complex patch pairs.

  12. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. As we know, during evolutionary history, not only mutations of individual nucleotides but also subsequent rearrangements occurred. It has been one of the major tasks of computational biologists to develop novel mathematical descriptors for similarity analysis such that information on various mutation phenomena is captured simultaneously. In this paper, in contrast to traditional methods (e.g., nucleotide frequency, geometric representations) as bases for the construction of mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we set up a weighted directed graph. The adjacency matrix of the directed graph is used to induce a representative vector for the DNA sequence. This new approach measures similarity based on both the ordering and the frequency of nucleotides, so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology, and are generally consistent with the reported results from earlier studies, which demonstrates the new method's efficiency; we also test the new method on a simulated data set, which shows that our new method performs better than the traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
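
The idea of turning a sequence into a weighted directed graph, flattening its adjacency matrix into a vector, and comparing vectors can be sketched as follows. The edge weights here are simple counts of adjacent nucleotide pairs, which is an assumption made for illustration; the abstract does not specify the paper's weighting scheme, and all names below are hypothetical.

```python
import math

ALPHABET = "ACGT"


def transition_matrix(seq):
    """4x4 adjacency matrix: entry [i][j] counts nucleotide i followed by j."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = [[0] * 4 for _ in range(4)]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:  # skip ambiguous symbols such as N
            matrix[index[a]][index[b]] += 1
    return matrix


def representative_vector(seq):
    """Flatten the adjacency matrix and normalize it to sum to 1."""
    matrix = transition_matrix(seq)
    total = sum(sum(row) for row in matrix) or 1  # avoid division by zero
    return [value / total for row in matrix for value in row]


def distance(seq1, seq2):
    """Euclidean distance between the two representative vectors."""
    v1 = representative_vector(seq1)
    v2 = representative_vector(seq2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
```

Because the vector records which nucleotide follows which, it reflects ordering as well as frequency, and no alignment of the two sequences is needed before comparing them.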

  13. Reproducibility of the NEPTUNE descriptor-based scoring system on whole-slide images and histologic and ultrastructural digital images.

    PubMed

    Barisoni, Laura; Troost, Jonathan P; Nast, Cynthia; Bagnasco, Serena; Avila-Casado, Carmen; Hodgin, Jeffrey; Palmer, Matthew; Rosenberg, Avi; Gasim, Adil; Liensziewski, Chrysta; Merlino, Lino; Chien, Hui-Ping; Chang, Anthony; Meehan, Shane M; Gaut, Joseph; Song, Peter; Holzman, Lawrence; Gibson, Debbie; Kretzler, Matthias; Gillespie, Brenda W; Hewitt, Stephen M

    2016-07-01

    The multicenter Nephrotic Syndrome Study Network (NEPTUNE) digital pathology scoring system employs a novel and comprehensive methodology to document pathologic features from whole-slide images, immunofluorescence and ultrastructural digital images. To estimate inter- and intra-reader concordance of this descriptor-based approach, data from 12 pathologists (eight NEPTUNE and four non-NEPTUNE) with experience from training to 30 years were collected. A descriptor reference manual was generated and a webinar-based protocol for consensus/cross-training implemented. Intra-reader concordance for 51 glomerular descriptors was evaluated on jpeg images by seven NEPTUNE pathologists scoring 131 glomeruli three times (Tests I, II, and III), each test following a consensus webinar review. Inter-reader concordance of glomerular descriptors was evaluated in 315 glomeruli by all pathologists; interstitial fibrosis and tubular atrophy (244 cases, whole-slide images) and four ultrastructural podocyte descriptors (178 cases, jpeg images) were evaluated once by six and five pathologists, respectively. Cohen's kappa for inter-reader concordance for 48/51 glomerular descriptors with sufficient observations was moderate (0.40
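
Cohen's kappa, the concordance statistic named above, compares the observed agreement between two raters with the agreement expected by chance from their label frequencies. The following is a minimal sketch of the standard two-rater formula; the scores are made-up illustrative data, not values from the study, and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must score the same non-empty item list")
    n = len(rater1)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts1 = Counter(rater1)
    counts2 = Counter(rater2)
    p_expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical presence/absence scores for one descriptor on ten glomeruli.
reader_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reader_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(reader_a, reader_b)
```

In this made-up example the readers agree on 8 of 10 items and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.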

  14. Predicting Drug-induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches

    PubMed Central

    Low, Yen; Uehara, Takeki; Minowa, Yohsuke; Yamada, Hiroshi; Ohno, Yasuo; Urushidani, Tetsuro; Sedykh, Alexander; Muratov, Eugene; Fourches, Denis; Zhu, Hao; Rusyn, Ivan; Tropsha, Alexander

    2014-01-01

    Quantitative Structure-Activity Relationship (QSAR) modeling and toxicogenomics are used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely their chemical descriptors and toxicogenomic profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs (http://toxico.nibio.go.jp/datalist.html). The model endpoint was hepatotoxicity in the rat following 28 days of exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (Correct Classification Rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomic data (24h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomic descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomic data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were also identified. These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the mechanistic understanding of sub-chronic liver injury and afford models capable of accurate prediction of hepatotoxicity from chemical structure and short-term assay results. PMID:21699217
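
The abstract reports predictivity as a Correct Classification Rate (CCR) without defining it. The sketch below assumes the definition commonly used in the QSAR literature, the mean of sensitivity and specificity (also called balanced accuracy); that definition, the function name, and the labels are assumptions for illustration, not details taken from the abstract.

```python
def correct_classification_rate(y_true, y_pred, positive=1):
    """CCR assumed here as 0.5 * (sensitivity + specificity)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of toxic compounds found
    specificity = tn / (tn + fp)  # fraction of non-toxic compounds found
    return 0.5 * (sensitivity + specificity)


# Hypothetical labels: 1 = hepatotoxic, 0 = non-hepatotoxic.
truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
ccr = correct_classification_rate(truth, predicted)
```

For these made-up labels, sensitivity is 0.75 and specificity is 0.5, giving a CCR of 0.625. Unlike plain accuracy, this measure is not inflated when one class dominates the data set.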

  15. A multivariate prediction model for Rho-dependent termination of transcription.

    PubMed

    Nadiras, Cédric; Eveno, Eric; Schwartz, Annie; Figueroa-Bossi, Nara; Boudvillain, Marc

    2018-06-21

    Bacterial transcription termination proceeds via two main mechanisms triggered either by simple, well-conserved (intrinsic) nucleic acid motifs or by the motor protein Rho. Although bacterial genomes can harbor hundreds of termination signals of either type, only intrinsic terminators are reliably predicted. Computational tools to detect the more complex and diversiform Rho-dependent terminators are lacking. To tackle this issue, we devised a prediction method based on Orthogonal Projections to Latent Structures Discriminant Analysis [OPLS-DA] of a large set of in vitro termination data. Using previously uncharacterized genomic sequences for biochemical evaluation and OPLS-DA, we identified new Rho-dependent signals and quantitative sequence descriptors with significant predictive value. Most relevant descriptors specify features of transcript C>G skewness, secondary structure, and richness in regularly-spaced 5'CC/UC dinucleotides that are consistent with known principles for Rho-RNA interaction. Descriptors collectively warrant OPLS-DA predictions of Rho-dependent termination with a ∼85% success rate. Scanning of the Escherichia coli genome with the OPLS-DA model identifies significantly more termination-competent regions than anticipated from transcriptomics and predicts that regions intrinsically refractory to Rho are primarily located in open reading frames. Altogether, this work delineates features important for Rho activity and describes the first method able to predict Rho-dependent terminators in bacterial genomes.

  16. Development of bovine serum albumin-water partition coefficients predictive models for ionogenic organic chemicals based on chemical form adjusted descriptors.

    PubMed

    Ding, Feng; Yang, Xianhai; Chen, Guosong; Liu, Jining; Shi, Lili; Chen, Jingwen

    2017-10-01

    The partition coefficients between bovine serum albumin (BSA) and water (K_BSA/w) for ionogenic organic chemicals (IOCs) differ greatly from those of neutral organic chemicals (NOCs). For NOCs, several excellent models were developed to predict their log K_BSA/w. However, it was found that the conventional descriptors are inappropriate for modeling log K_BSA/w of IOCs. Thus, alternative approaches are urgently needed to develop predictive models for K_BSA/w of IOCs. In this study, molecular descriptors that can be used to characterize the ionization effects (e.g. chemical form adjusted descriptors) were calculated and used to develop predictive models for log K_BSA/w of IOCs. The models developed had high goodness-of-fit, robustness, and predictive ability. The predictor variables selected to construct the models included the chemical form adjusted averages of the negative potentials on the molecular surface (V_s-adj^-), the chemical form adjusted molecular dipole moment (dipolemoment_adj), and the logarithm of the n-octanol/water distribution coefficient (logD). As these molecular descriptors can be calculated from their molecular structures directly, the developed model can be easily used to fill the log K_BSA/w data gap for other IOCs within the applicability domain. Furthermore, the chemical form adjusted descriptors calculated in this study also could be used to construct predictive models on other endpoints of IOCs. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Chemical Sensor Array Response Modeling Using Quantitative Structure-Activity Relationships Technique

    NASA Astrophysics Data System (ADS)

    Shevade, Abhijit V.; Ryan, Margaret A.; Homer, Margie L.; Zhou, Hanying; Manfreda, Allison M.; Lara, Liana M.; Yen, Shiao-Pin S.; Jewell, April D.; Manatt, Kenneth S.; Kisor, Adam K.

    We have developed a Quantitative Structure-Activity Relationships (QSAR) based approach to correlate the response of chemical sensors in an array with molecular descriptors. A novel molecular descriptor set has been developed; this set combines descriptors of sensing film-analyte interactions, representing sensor response, with a basic analyte descriptor set commonly used in QSAR studies. The descriptors are obtained using a combination of molecular modeling tools and empirical and semi-empirical Quantitative Structure-Property Relationships (QSPR) methods. The sensors under investigation are polymer-carbon sensing films which have been exposed to analyte vapors at parts-per-million (ppm) concentrations; response is measured as change in film resistance. Statistically validated QSAR models have been developed using Genetic Function Approximations (GFA) for a sensor array for a given training data set. The applicability of the sensor response models has been tested by using it to predict the sensor activities for test analytes not considered in the training set for the model development. The validated QSAR sensor response models show good predictive ability. The QSAR approach is a promising computational tool for sensing materials evaluation and selection. It can also be used to predict response of an existing sensing film to new target analytes.

  18. Descriptions and identifications of strangers by youth and adult eyewitnesses.

    PubMed

    Pozzulo, Joanna D; Warren, Kelly L

    2003-04-01

    Two studies varying target gender and mode of target exposure were conducted to compare the quantity, nature, and accuracy of free recall person descriptions provided by youths and adults. In addition, the relation among age, identification accuracy, and number of descriptors reported was considered. Youths (10-14 years) reported fewer descriptors than adults. Exterior facial descriptors (e.g., hair items) were predominant and accurately reported by youths and adults. Accuracy was consistently problematic for youths when reporting body descriptors (e.g., height, weight) and interior facial features. Youths reported a similar number of descriptors when making accurate versus inaccurate identification decisions. This pattern also was consistent for adults. With target-absent lineups, the difference in the number of descriptors reported between adults and youths was greater when making a false positive versus correct rejection.

  19. Uniting Cheminformatics and Chemical Theory To Predict the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules

    PubMed Central

    2014-01-01

    We present four models of solution free-energy prediction for druglike molecules utilizing cheminformatics descriptors and theoretically calculated thermodynamic values. We make predictions of solution free energy using physics-based theory alone and using machine learning/quantitative structure–property relationship (QSPR) models. We also develop machine learning models where the theoretical energies and cheminformatics descriptors are used as combined input. These models are used to predict solvation free energy. While direct theoretical calculation does not give accurate results in this approach, machine learning is able to give predictions with a root mean squared error (RMSE) of ∼1.1 log S units in a 10-fold cross-validation for our Drug-Like-Solubility-100 (DLS-100) dataset of 100 druglike molecules. We find that a model built using energy terms from our theoretical methodology as descriptors is marginally less predictive than one built on Chemistry Development Kit (CDK) descriptors. Combining both sets of descriptors allows a further but very modest improvement in the predictions. However, in some cases, this is a statistically significant enhancement. These results suggest that there is little complementarity between the chemical information provided by these two sets of descriptors, despite their different sources and methods of calculation. Our machine learning models are also able to predict the well-known Solubility Challenge dataset with an RMSE value of 0.9–1.0 log S units. PMID:24564264

  20. Effect of number of probes and their orientation on the calculation of several compressor face distortion descriptors

    NASA Technical Reports Server (NTRS)

    Stoll, F.; Tremback, J. W.; Arnaiz, H. H.

    1979-01-01

    A study was performed to determine the effects of the number and position of total pressure probes on the calculation of five compressor face distortion descriptors. This study used three sets of 320 steady state total pressure measurements that were obtained with a special rotating rake apparatus in wind tunnel tests of a mixed-compression inlet. The inlet was a one third scale model of the inlet on a YF-12 airplane, and it was tested in the wind tunnel at representative flight conditions at Mach numbers above 2.0. The study shows that large errors resulted in the calculation of the distortion descriptors even with a number of probes that were considered adequate in the past. There were errors as large as 30 and -50 percent in several distortion descriptors for a configuration consisting of eight rakes with five equal-area-weighted probes on each rake.

  1. OPERA models for predicting physicochemical properties and environmental fate endpoints.

    PubMed

    Mansouri, Kamel; Grulke, Chris M; Judson, Richard S; Williams, Antony J

    2018-03-08

    The collection of chemical structure information and associated experimental data for quantitative structure-activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2-15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q² of the models varied from 0.72 to 0.95, with an average of 0.86 and an R² test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission's Joint Research Center to be OECD compliant. All models are freely available as an open-source, command-line application called OPEn structure-activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency's CompTox Chemistry Dashboard.
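
    The weighted k-nearest neighbor idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance in descriptor space and inverse-distance weights; the abstract does not state which weighting scheme OPERA actually uses, and the function name and parameters below are hypothetical.

```python
import math


def weighted_knn_predict(train_X, train_y, query, k=5, eps=1e-12):
    """Distance-weighted kNN regression (illustrative, not OPERA's code).

    train_X: list of descriptor vectors, train_y: list of property values,
    query: descriptor vector of the chemical to predict.
    """
    # Keep the k training chemicals closest to the query in descriptor space.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    # Assumed weighting: inverse distance, with eps guarding against d == 0.
    weights = [1.0 / (d + eps) for d, _ in neighbours]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return weighted_sum / sum(weights)
```

    A query that coincides with a training chemical is dominated by that chemical's value, while a query midway between two neighbours receives their average.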

  2. A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

    PubMed Central

    Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.

    2017-01-01

    Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge based potentials and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach of cluster ranking based not only on one molecular descriptor (e.g., an energy function) but also employing a large number of descriptors that are integrated in a machine learning model, whereby an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol is based on first locally enriching clusters with additional poses; the clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158

  3. Prediction on the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase based on gene expression programming.

    PubMed

    Li, Yuqin; You, Guirong; Jia, Baoxiu; Si, Hongzong; Yao, Xiaojun

    2014-01-01

    Quantitative structure-activity relationships (QSAR) were developed to predict the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase via heuristic method (HM) and gene expression programming (GEP). The descriptors of 33 pyrrolidine derivatives were calculated by the software CODESSA, which can calculate quantum chemical, topological, geometrical, constitutional, and electrostatic descriptors. HM was also used for the preselection of 5 appropriate molecular descriptors. Linear and nonlinear QSAR models were developed based on the HM and GEP separately, and the two prediction models led to good correlation coefficients (R²) of 0.93 and 0.94. The two QSAR models are useful in predicting the inhibition ratio of pyrrolidine derivatives on matrix metalloproteinase during the discovery of new anticancer drugs and in providing theoretical information for studying new drugs.

  4. Molecular descriptor subset selection in theoretical peptide quantitative structure-retention relationship model development using nature-inspired optimization algorithms.

    PubMed

    Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz

    2015-10-06

    In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external testing set out of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.

  5. 3D Riesz-wavelet based Covariance descriptors for texture classification of lung nodule tissue in CT.

    PubMed

    Cirujeda, Pol; Muller, Henning; Rubin, Daniel; Aguilera, Todd A; Loo, Billy W; Diehn, Maximilian; Binefa, Xavier; Depeursinge, Adrien

    2015-01-01

    In this paper we present a novel technique for characterizing and classifying 3D textured volumes belonging to different lung tissue types in 3D CT images. We build a volume-based 3D descriptor, robust to changes of size, rigid spatial transformations and texture variability, thanks to the integration of Riesz-wavelet features within a Covariance-based descriptor formulation. 3D Riesz features characterize the morphology of tissue density due to their response to changes in intensity in CT images. These features are encoded in a Covariance-based descriptor formulation: this provides a compact and flexible representation thanks to the use of feature variations rather than dense features themselves and adds robustness to spatial changes. Furthermore, the particular symmetric positive definite matrix form of these descriptors causes them to lie in a Riemannian manifold. Thus, descriptors can be compared with analytical measures, and accurate techniques from machine learning and clustering can be adapted to their spatial domain. Additionally we present a classification model following a "Bag of Covariance Descriptors" paradigm in order to distinguish three different nodule tissue types in CT: solid, ground-glass opacity, and healthy lung. The method is evaluated on top of an acquired dataset of 95 patients with manually delineated ground truth by radiation oncology specialists in 3D, and quantitative sensitivity and specificity values are presented.
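
    As a rough illustration of the covariance-based encoding described in this abstract, the sketch below computes a sample covariance matrix from a set of per-voxel feature vectors. It is a generic covariance descriptor, not the authors' implementation: the Riesz-wavelet feature extraction and the manifold-aware comparison of descriptors are not shown, and the function name is hypothetical.

```python
def covariance_descriptor(features):
    """Sample covariance of per-voxel feature vectors (illustrative only).

    features: list of n feature vectors, each of length d (n >= 2).
    Returns a d x d symmetric matrix as a list of lists.
    """
    n = len(features)
    d = len(features[0])
    # Mean of each feature channel over the region.
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for a in range(d):
            for b in range(d):
                cov[a][b] += (f[a] - mean[a]) * (f[b] - mean[b])
    # Unbiased normalisation by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]
```

    The descriptor size depends only on the number of feature channels d, not on the number of voxels in the region, which is what makes the representation compact.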

  6. SVM prediction of ligand-binding sites in bacterial lipoproteins employing shape and physio-chemical descriptors.

    PubMed

    Kadam, Kiran; Prabhakar, Prashant; Jayaraman, V K

    2012-11-01

    Bacterial lipoproteins play critical roles in various physiological processes including the maintenance of pathogenicity and numbers of them are being considered as potential candidates for generating novel vaccines. In this work, we put forth an algorithm to identify and predict ligand-binding sites in bacterial lipoproteins. The method uses three types of pocket descriptors, namely fpocket descriptors, 3D Zernike descriptors and shell descriptors, and combines them with Support Vector Machine (SVM) method for the classification. The three types of descriptors represent shape-based properties of the pocket as well as its local physio-chemical features. All three types of descriptors, along with their hybrid combinations are evaluated with SVM and to improve classification performance, WEKA-InfoGain feature selection is applied. Results obtained in the study show that the classifier successfully differentiates between ligand-binding and non-binding pockets. For the combination of three types of descriptors, 10 fold cross-validation accuracy of 86.83% is obtained for training while the selected model achieved test Matthews Correlation Coefficient (MCC) of 0.534. Individually or in combination with new and existing methods, our model can be a very useful tool for the prediction of potential ligand-binding sites in bacterial lipoproteins.
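
    The Matthews Correlation Coefficient reported in this abstract is computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows; the function name and the convention of returning 0.0 for a zero denominator are choices made here, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC for a binary classifier from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative when binding and non-binding pockets are imbalanced.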

  7. Modeling of adipose/blood partition coefficient for environmental chemicals.

    PubMed

    Papadaki, K C; Karakitsios, S P; Sarigiannis, D A

    2017-12-01

    A Quantitative Structure Activity Relationship (QSAR) model was developed in order to predict the adipose/blood partition coefficient of environmental chemical compounds. The first step of QSAR modeling was the collection of inputs. Input data included the experimental values of adipose/blood partition coefficient and two sets of molecular descriptors for 67 organic chemical compounds: a) the descriptors from Linear Free Energy Relationship (LFER) and b) the PaDEL descriptors. The datasets were split into training and prediction sets and were analysed using two statistical methods: Genetic Algorithm based Multiple Linear Regression (GA-MLR) and Artificial Neural Networks (ANN). The models with LFER and PaDEL descriptors, coupled with ANN, produced satisfying performance results. The fitting performance (R²) of the models, using LFER and PaDEL descriptors, was 0.94 and 0.96, respectively. The Applicability Domain (AD) of the models was assessed and then the models were applied to a large number of chemical compounds with unknown values of adipose/blood partition coefficient. In conclusion, the proposed models were checked for fitting, validity and applicability. It was demonstrated that they are stable, reliable and capable of predicting the values of adipose/blood partition coefficient of "data poor" chemical compounds that fall within the applicability domain. Copyright © 2017. Published by Elsevier Ltd.

  8. Predicting the activity of drugs for a group of imidazopyridine anticoccidial compounds.

    PubMed

    Si, Hongzong; Lian, Ning; Yuan, Shuping; Fu, Aiping; Duan, Yun-Bo; Zhang, Kejun; Yao, Xiaojun

    2009-10-01

    Gene expression programming (GEP) is a novel machine learning technique. The GEP is used to build a nonlinear quantitative structure-activity relationship model for the prediction of the IC50 for the imidazopyridine anticoccidial compounds. This model is based on descriptors which are calculated from the molecular structure. Four descriptors are selected from the descriptors' pool by heuristic method (HM) to build a multivariable linear model. The GEP method produced a nonlinear quantitative model with a correlation coefficient and a mean error of 0.96 and 0.24 for the training set, 0.91 and 0.52 for the test set, respectively. It is shown that the GEP predicted results are in good agreement with experimental ones.

  9. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities.

    PubMed

    Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

    2018-02-01

    Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, whose aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended for QSAR models to define new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria, which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets: serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.
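
    For orientation, the classical Fisher score for a single descriptor in a classification setting can be sketched as the ratio of between-class scatter to within-class scatter. This is only the standard supervised criterion that the abstract takes as its starting point; the paper's extension to continuous activity values and its combination with the Laplacian score are not reproduced here, and the function name is hypothetical.

```python
def fisher_score(values, labels):
    """Classical Fisher score of one descriptor (higher = more discriminative).

    values: descriptor value per compound, labels: class label per compound.
    """
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        mean_c = sum(group) / len(group)
        var_c = sum((v - mean_c) ** 2 for v in group) / len(group)
        # Class-size-weighted scatter of class means around the overall mean.
        between += len(group) * (mean_c - overall_mean) ** 2
        # Class-size-weighted variance inside each class.
        within += len(group) * var_c
    return between / within if within > 0 else float("inf")
```

    A descriptor whose values separate the classes cleanly gets a large score, while one whose class means nearly coincide gets a score close to zero.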

  10. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

    NASA Astrophysics Data System (ADS)

    Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

    2018-02-01

    Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, whose aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended for QSAR models to define new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of Fisher and Laplacian criteria, which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods on three QSAR data sets: serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the selected descriptors by the proposed semi-supervised method have better performance than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful to improve the performance of the combined Fisher and Laplacian based feature selection methods.

  11. How good is good? Students and assessors' perceptions of qualitative markers of performance.

    PubMed

    Ma, Heung Kan; Min, Cynthia; Neville, Alan; Eva, Kevin

    2013-01-01

    Qualitative markers of performance are routinely used for medical student assessment, though the extent to which such markers can be readily translated to actionable pieces of information remains uncertain. To explore (a) the perceived value to be indicated by descriptor phrases commonly used for describing student performance, (b) the perceived weight of the different performance domains (e.g. communication skills, work ethic, knowledge base, etc), and (c) whether or not the perceived value of the descriptors changes as a function of the performance domains. Five domains of performance were identified from the thematic coding of past medical student transcripts (N = 156). From the transcripts, 91 distinct descriptors indicating the language commonly used by assessors were also identified. From the list of 91 descriptors, Thurstone's method of equal-appearing intervals was used to extract 10 descriptors that were representative of the continuum of student performance. A modified paired comparisons method was then used to enable the relative ranking of each of 10 descriptors combined with each of 5 different domains of performance. A web-based survey was used to collect responses from participants (N = 209), which consisted of medical students and faculty members who were previously involved in student assessment. Results demonstrated that respondents did not simply sum positive and negative descriptors in a uniform manner. Rather, comments on some domains (e.g., "ability to apply patient centred medicine") were seen as particularly positive when associated with positive descriptors but not particularly negative when associated with negative descriptors. For others (e.g., "receptivity and responsiveness to feedback") the reverse was true. Comments on "knowledge-base" elicited a relatively muted perception at both ends of the scale. Finally, the results also revealed moderate misalignment in the perceptions of assessors and students. 
    The findings from this study suggest that the use of any given descriptor conveys slightly different meaning dependent on the context in which it is used. This helps to address some key issues surrounding the application of qualitative markers to performance assessment in medical education.

  12. Fourier Descriptor Analysis and Unification of Voice Range Profile Contours: Method and Applications

    ERIC Educational Resources Information Center

    Pabon, Peter; Ternstrom, Sten; Lamarche, Anick

    2011-01-01

    Purpose: To describe a method for unified description, statistical modeling, and comparison of voice range profile (VRP) contours, even from diverse sources. Method: A morphologic modeling technique, which is based on Fourier descriptors (FDs), is applied to the VRP contour. The technique, which essentially involves resampling of the curve of the…

  13. Periodic table-based descriptors to encode cytotoxicity profile of metal oxide nanoparticles: a mechanistic QSTR approach.

    PubMed

    Kar, Supratik; Gajewicz, Agnieszka; Puzyn, Tomasz; Roy, Kunal; Leszczynski, Jerzy

    2014-09-01

    Nanotechnology has evolved as a frontrunner in the development of modern science. Current studies have established toxicity of some nanoparticles to humans and the environment. Lack of sufficient data and low adequacy of experimental protocols hinder comprehensive risk assessment of nanoparticles (NPs). In the present work, metal electronegativity (χ), the charge of the metal cation corresponding to a given oxide (χox), atomic number and valence electron number of the metal have been used as simple molecular descriptors to build up quantitative structure-toxicity relationship (QSTR) models for prediction of cytotoxicity of metal oxide NPs to the bacterium Escherichia coli. These descriptors can be easily obtained from the molecular formula and information acquired from the periodic table in no time. It has been shown that a simple molecular descriptor χox can efficiently encode cytotoxicity of metal oxides leading to models with high statistical quality as well as interpretability. Based on this model and previously published experimental results, we have hypothesized the most probable mechanism of the cytotoxicity of metal oxide nanoparticles to E. coli. Moreover, the required information for descriptor calculation is independent of size range of NPs, nullifying a significant problem that various physical properties of NPs change for different size ranges. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Web-4D-QSAR: A web-based application to generate 4D-QSAR descriptors.

    PubMed

    Ataide Martins, João Paulo; Rougeth de Oliveira, Marco Antônio; Oliveira de Queiroz, Mário Sérgio

    2018-06-05

    A web-based application is developed to generate 4D-QSAR descriptors using the LQTA-QSAR methodology, based on molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. The LQTAGrid module calculates the intermolecular interaction energies at each grid point, considering probes and all aligned conformations resulting from MD simulations. These interaction energies are the independent variables or descriptors employed in a QSAR analysis. A friendly front end web interface, built using the Django framework and Python programming language, integrates all steps of the LQTA-QSAR methodology in a way that is transparent to the user, and in the backend, GROMACS and LQTAGrid are executed to generate 4D-QSAR descriptors to be used later in the process of QSAR model building. © 2018 Wiley Periodicals, Inc.

  15. Blast and ballistic trajectories in combat casualties: a preliminary analysis using a cartesian positioning system with MDCT.

    PubMed

    Folio, Les R; Fischer, Tatjana; Shogan, Paul; Frew, Michael; Dwyer, Andrew; Provenzale, James M

    2011-08-01

    The purpose of this study is to determine the agreement with which radiologists identify wound paths in vivo on MDCT and calculate missile trajectories on the basis of Cartesian coordinates using a Cartesian positioning system (CPS). Three radiologists retrospectively identified 25 trajectories on MDCT in 19 casualties who sustained penetrating trauma in Iraq. Trajectories were described qualitatively in terms of directional path descriptors and quantitatively as trajectory vectors. Directional descriptors, trajectory angles, and angles between trajectories were calculated based on Cartesian coordinates of entrance and terminus or exit recorded in x, y image and table space (z) using a Trajectory Calculator created using spreadsheet software. The consistency of qualitative descriptor determinations was assessed in terms of frequency of observer agreement and multirater kappa statistics. Consistency of trajectory vectors was evaluated in terms of distribution of magnitude of the angles between vectors and the differences between their paraaxial and parasagittal angles. In 68% of trajectories, the observers' visual assessment of qualitative descriptors was congruent. Calculated descriptors agreed across observers in 60% of the trajectories. Estimated kappa also showed good agreement (0.65-0.79, p < 0.001); 70% of calculated paraaxial and parasagittal angles were within 20° across observers, and 61.3% of angles between trajectory vectors were within 20° across observers. Results show agreement of visually assessed and calculated qualitative descriptors and trajectory angles among observers. The Trajectory Calculator describes trajectories qualitatively similar to radiologists' visual assessment, showing the potential feasibility of automated trajectory analysis.
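
    The quantitative part of this abstract, trajectory vectors and the angles between them computed from Cartesian entrance and terminus coordinates, can be sketched as follows. This is a generic vector calculation, not the authors' spreadsheet-based Trajectory Calculator; the function names and the (x, y, z) tuple convention are assumptions.

```python
import math


def trajectory_vector(entrance, terminus):
    """Vector from entrance to terminus (or exit), each given as (x, y, z)."""
    return tuple(t - e for e, t in zip(entrance, terminus))


def angle_between(u, v):
    """Angle in degrees between two non-zero trajectory vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))
```

    Two observers' trajectories for the same wound could then be compared by checking whether `angle_between` falls within a chosen tolerance, such as the 20° threshold quoted in the abstract.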

  16. Molecule kernels: a descriptor- and alignment-free quantitative structure-activity relationship approach.

    PubMed

    Mohr, Johannes A; Jain, Brijnesh J; Obermayer, Klaus

    2008-09-01

    Quantitative structure activity relationship (QSAR) analysis is traditionally based on extracting a set of molecular descriptors and using them to build a predictive model. In this work, we propose a QSAR approach based directly on the similarity between the 3D structures of a set of molecules measured by a so-called molecule kernel, which is independent of the spatial prealignment of the compounds. Predictors can be built using the molecule kernel in conjunction with the potential support vector machine (P-SVM), a recently proposed machine learning method for dyadic data. The resulting models make direct use of the structural similarities between the compounds in the test set and a subset of the training set and do not require an explicit descriptor construction. We evaluated the predictive performance of the proposed method on one classification and four regression QSAR datasets and compared its results to the results reported in the literature for several state-of-the-art descriptor-based and 3D QSAR approaches. In this comparison, the proposed molecule kernel method performed better than the other QSAR methods.

  17. The great descriptor melting pot: mixing descriptors for the common good of QSAR models.

    PubMed

    Tseng, Yufeng J; Hopfinger, Anton J; Esposito, Emilio Xavier

    2012-01-01

    The usefulness and utility of QSAR modeling depend heavily on the ability to estimate the values of molecular descriptors relevant to the endpoints of interest, followed by an optimized selection of descriptors to form the best QSAR models from a representative set of the endpoints of interest. The performance of a QSAR model is directly related to its molecular descriptors. QSAR modeling, specifically model construction and optimization, has benefited from its ability to borrow from other unrelated fields, yet the molecular descriptors that form QSAR models have remained basically unchanged in both form and preferred usage. There are many types of endpoints that require multiple classes of descriptors (descriptors that encode 1D through multi-dimensional, 4D and above, content) to most fully capture the molecular features and interactions that contribute to the endpoint. The advantages of QSAR models constructed from multiple, and different, descriptor classes have been demonstrated in the exploration of markedly different, and principally biological, systems and endpoints. Multiple examples of such QSAR applications using different descriptor sets are described and examined. The take-home message is that a major part of the future of QSAR analysis, and its application to modeling biological potency, ADME-Tox properties, general use in virtual screening applications, as well as its expanding use into new fields for building QSPR models, lies in developing strategies that combine and use 1D through nD molecular descriptors.

  18. Effective structural descriptors for natural and engineered radioactive waste confinement barriers

    NASA Astrophysics Data System (ADS)

    Lemmens, Laurent; Rogiers, Bart; De Craen, Mieke; Laloy, Eric; Jacques, Diederik; Huysmans, Marijke; Swennen, Rudy; Urai, Janos L.; Desbois, Guillaume

    2017-04-01

    The microstructure of a radioactive waste confinement barrier strongly influences its flow and transport properties. Numerical flow and transport simulations for these porous media at the pore scale therefore require input data that describe the microstructure as accurately as possible. To date, no imaging method can resolve all heterogeneities within important radioactive waste confinement barrier materials such as hardened cement paste and natural clays at the micro scale (nm-cm). Therefore, it is necessary to merge information from different 2D and 3D imaging methods using porous media reconstruction techniques. To qualitatively compare the results of different reconstruction techniques, visual inspection might suffice. To quantitatively compare training-image based algorithms, Tan et al. (2014) proposed an algorithm using an analysis of distance. However, the ranking of the algorithm depends on the choice of the structural descriptor, in their case multiple-point or cluster-based histograms. We present here preliminary work in which we will review different structural descriptors and test their effectiveness for capturing the main structural characteristics of radioactive waste confinement barrier materials, to determine the descriptors to use in the analysis of distance. The investigated descriptors are particle size distributions, surface area distributions, two point probability functions, multiple point histograms, linear functions and two point cluster functions. The descriptor testing consists of stochastically generating realizations from a reference image using the simulated annealing optimization procedure introduced by Karsanina et al. (2015). This procedure basically minimizes the differences between pre-specified descriptor values associated with the training image and the image being produced. The most efficient descriptor set can therefore be identified by comparing the image generation quality among the tested descriptor combinations. 
The assessment of the quality of the simulations will be made by combining all considered descriptors. Once the set of the most efficient descriptors is determined, they can be used in the analysis of distance, to rank different reconstruction algorithms in a more objective way in future work. Karsanina MV, Gerke KM, Skvortsova EB, Mallants D (2015) Universal Spatial Correlation Functions for Describing and Reconstructing Soil Microstructure. PLoS ONE 10(5): e0126515. doi:10.1371/journal.pone.0126515 Tan, Xiaojin, Pejman Tahmasebi, and Jef Caers. "Comparing training-image based algorithms using an analysis of distance." Mathematical Geosciences 46.2 (2014): 149-169.
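
    One of the descriptors listed above, the two point probability function, and the kind of mismatch that a simulated annealing reconstruction minimizes can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of Karsanina et al. (2015): it evaluates only the horizontal direction of a small binary image without periodic boundaries, and the function names are hypothetical.

```python
def two_point_probability(image, r):
    """Estimate S2(r) along rows: the fraction of horizontal pixel pairs
    separated by r pixels in which both pixels belong to phase 1."""
    hits = 0
    pairs = 0
    for row in image:
        for x in range(len(row) - r):
            hits += row[x] * row[x + r]
            pairs += 1
    return hits / pairs

def descriptor_mismatch(reference, candidate, max_r):
    """Sum of squared differences between the S2 curves of two images.
    An annealing loop would accept or reject pixel swaps in the candidate
    according to how they change this value."""
    return sum(
        (two_point_probability(reference, r)
         - two_point_probability(candidate, r)) ** 2
        for r in range(max_r + 1)
    )

# Hypothetical 2 x 4 binary images (1 = pore, 0 = solid).
reference = [[1, 0, 1, 0],
             [0, 1, 0, 1]]
candidate = [[1, 1, 0, 0],
             [0, 0, 1, 1]]
mismatch = descriptor_mismatch(reference, candidate, max_r=2)
```

    At r = 0 the function reduces to the phase-1 volume fraction, so two images with equal porosity differ only at larger separations.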

  19. Towards a metadata scheme for the description of materials - the description of microstructures

    NASA Astrophysics Data System (ADS)

    Schmitz, Georg J.; Böttger, Bernd; Apel, Markus; Eiken, Janin; Laschet, Gottfried; Altenfeld, Ralph; Berger, Ralf; Boussinot, Guillaume; Viardin, Alexandre

    2016-01-01

    The property of any material is essentially determined by its microstructure. Numerical models are increasingly the focus of modern engineering as helpful tools for tailoring and optimization of custom-designed microstructures by suitable processing and alloy design. A huge variety of software tools is available to predict various microstructural aspects for different materials. In the general frame of an integrated computational materials engineering (ICME) approach, these microstructure models provide the link between models operating at the atomistic or electronic scales, and models operating on the macroscopic scale of the component and its processing. In view of an improved interoperability of all these different tools it is highly desirable to establish a standardized nomenclature and methodology for the exchange of microstructure data. The scope of this article is to provide a comprehensive system of metadata descriptors for the description of a 3D microstructure. The presented descriptors are limited to a mere geometric description of a static microstructure and have to be complemented by further descriptors, e.g. for properties, numerical representations, kinetic data, and others in the future. Further attributes to each descriptor, e.g. on data origin, data uncertainty, and data validity range are being defined in ongoing work. The proposed descriptors are intended to be independent of any specific numerical representation. The descriptors defined in this article may serve as a first basis for standardization and will simplify the data exchange between different numerical models, as well as promote the integration of experimental data into numerical models of microstructures. An HDF5 template data file for a simple, three phase Al-Cu microstructure being based on the defined descriptors complements this article.

  20. Towards a metadata scheme for the description of materials - the description of microstructures.

    PubMed

    Schmitz, Georg J; Böttger, Bernd; Apel, Markus; Eiken, Janin; Laschet, Gottfried; Altenfeld, Ralph; Berger, Ralf; Boussinot, Guillaume; Viardin, Alexandre

    2016-01-01

    The property of any material is essentially determined by its microstructure. Numerical models are increasingly the focus of modern engineering as helpful tools for tailoring and optimization of custom-designed microstructures by suitable processing and alloy design. A huge variety of software tools is available to predict various microstructural aspects for different materials. In the general frame of an integrated computational materials engineering (ICME) approach, these microstructure models provide the link between models operating at the atomistic or electronic scales, and models operating on the macroscopic scale of the component and its processing. In view of an improved interoperability of all these different tools it is highly desirable to establish a standardized nomenclature and methodology for the exchange of microstructure data. The scope of this article is to provide a comprehensive system of metadata descriptors for the description of a 3D microstructure. The presented descriptors are limited to a mere geometric description of a static microstructure and have to be complemented by further descriptors, e.g. for properties, numerical representations, kinetic data, and others in the future. Further attributes to each descriptor, e.g. on data origin, data uncertainty, and data validity range are being defined in ongoing work. The proposed descriptors are intended to be independent of any specific numerical representation. The descriptors defined in this article may serve as a first basis for standardization and will simplify the data exchange between different numerical models, as well as promote the integration of experimental data into numerical models of microstructures. An HDF5 template data file for a simple, three phase Al-Cu microstructure being based on the defined descriptors complements this article.

  1. Target recognition for ladar range image using slice image

    NASA Astrophysics Data System (ADS)

    Xia, Wenze; Han, Shaokun; Wang, Liang

    2015-12-01

    A shape descriptor and a complete shape-based recognition system using slice images as geometric feature descriptor for ladar range images are introduced. A slice image is a two-dimensional image generated by three-dimensional Hough transform and the corresponding mathematical transformation. The system consists of two processes, the model library construction and recognition. In the model library construction process, a series of range images are obtained after the model object is sampled at preset attitude angles. Then, all the range images are converted into slice images. The number of slice images is reduced by clustering analysis and finding a representation to reduce the size of the model library. In the recognition process, the slice image of the scene is compared with the slice image in the model library. The recognition results depend on the comparison. Simulated ladar range images are used to analyze the recognition and misjudgment rates, and comparison between the slice image representation method and moment invariants representation method is performed. The experimental results show that whether in conditions without noise or with ladar noise, the system has a high recognition rate and low misjudgment rate. The comparison experiment demonstrates that the slice image has better representation ability than moment invariants.

  2. On the Development and Use of Large Chemical Similarity Networks, Informatics Best Practices and Novel Chemical Descriptors Towards Materials Quantitative Structure Property Relationships

    NASA Astrophysics Data System (ADS)

    Krein, Michael

    After decades of development and use in a variety of application areas, Quantitative Structure Property Relationships (QSPRs) and related descriptor-based statistical learning methods have achieved a level of infamy due to their misuse. The field is rife with past examples of overtrained models, overoptimistic performance assessment, and outright cheating in the form of explicitly removing data to fit models. These actions do not serve the community well, nor are they beneficial to future predictions based on established models. In practice, in order to select combinations of descriptors and machine learning methods that might work best, one must consider the nature and size of the training and test datasets, be aware of existing hypotheses about the data, and resist the temptation to bias structure representation and modeling to explicitly fit the hypotheses. The definition and application of these best practices is important for obtaining actionable modeling outcomes, and for setting user expectations of modeling accuracy when predicting the endpoint values of unknowns. A wide variety of statistical learning approaches, descriptor types, and model validation strategies are explored herein, with the goals of helping end users understand the factors involved in creating and using QSPR models effectively, and to better understand relationships within the data, especially by looking at the problem space from multiple perspectives. Molecular relationships are commonly envisioned in a continuous high-dimensional space of numerical descriptors, referred to as chemistry space. Descriptor and similarity metric choice influence the partitioning of this space into regions corresponding to local structural similarity. These regions, known as domains of applicability, are most likely to be successfully modeled by a QSPR. In Chapter 2, the network topology and scaling relationships of several chemistry spaces are thoroughly investigated. 
Chemistry spaces studied include the ZINC data set, a qHTS PubChem bioassay, as well as the protein binding sites from the PDB. The characteristics of these networks are compared and contrasted with those of the bioassay Structure Activity Landscape Index (SALI) subnetwork, which maps discontinuities or cliffs in the structure activity landscape. Mapping this newly generated information over underlying chemistry space networks generated using different descriptors demonstrates local modeling capacity and can guide the choice of better local representations of chemistry space. Chapter 2 introduces and demonstrates this novel concept, which also enables future work in visualization and interpretation of chemical spaces. Initially, it was discovered that there were no community-available tools to leverage best-practice ideas to comprehensively build, compare, and interpret QSPRs. The Yet Another Modeling System (YAMS) tool performs a series of balanced, rational decisions in dataset preprocessing and parameter/feature selection over a choice of modeling methods. To date, YAMS is the only community-available informatics tool that performs such decisions consistently between methods while also providing multiple model performance comparisons and detailed descriptor importance information. The focus of the tool is thus to convey rich information about model quality and predictions that help to "close the loop" between modeling and experimental efforts, for example, in tailoring nanocomposite properties. Polymer nanocomposites (PNC) are complex material systems encompassing many potential structures, chemistries, and self assembled morphologies that could significantly impact commercial and military applications. There is a strong desire to characterize and understand the tradespace of nanocomposites, to identify the important factors relating nanostructure to materials properties and determine an effective way to control materials properties at the manufacturing scale. 
Due to the complexity of the systems, existing design approaches rely heavily on trial-and-error learning. By leveraging existing experimental data, Materials Quantitative Structure-Property Relationships (MQSPRs) relate molecular structures to the polar and dispersive components of corresponding surface tensions. In turn, existing theories relate polymer and nanofiller polar and dispersive surface tension components to the dispersion state and interfacial polymer relaxation times. These quantities may, in the future, be used as input to continuum mechanics approaches shown able to predict the thermomechanical response of nanocomposites. For a polymer dataset and a particle dataset, multiple structural representations and descriptor sets are benchmarked, including a set of high performance surface-property descriptors developed as part of this work. The systematic variation of structural representations as part of the informatics approach reveals important insight in modeling polymers, and should become common practice when defining new problem spaces.

  3. Quantitative structure-activation barrier relationship modeling for Diels-Alder ligations utilizing quantum chemical structural descriptors.

    PubMed

    Nandi, Sisir; Monesi, Alessandro; Drgan, Viktor; Merzel, Franci; Novič, Marjana

    2013-10-30

    In the present study, we show the correlation of quantum chemical structural descriptors with the activation barriers of the Diels-Alder ligations. A set of 72 non-catalysed Diels-Alder reactions were subjected to quantitative structure-activation barrier relationship (QSABR) under the framework of theoretical quantum chemical descriptors calculated solely from the structures of diene and dienophile reactants. Experimental activation barrier data were obtained from literature. Descriptors were computed using Hartree-Fock theory using 6-31G(d) basis set as implemented in Gaussian 09 software. Variable selection and model development were carried out by stepwise multiple linear regression methodology. Predictive performance of the quantitative structure-activation barrier relationship (QSABR) model was assessed by training and test set concept and by calculating leave-one-out cross-validated Q² and predictive R² values. The QSABR model can explain and predict 86.5% and 80% of the variances, respectively, in the activation energy barrier training data. Alternatively, a neural network model based on back propagation of errors was developed to assess the nonlinearity of the sought correlations between theoretical descriptors and experimental reaction barriers. A reasonable predictability for the activation barrier of the test set reactions was obtained, which enabled an exploration and interpretation of the significant variables responsible for Diels-Alder interaction between dienes and dienophiles. Thus, studies in the direction of QSABR modelling that provide efficient and fast prediction of activation barriers of the Diels-Alder reactions turn out to be a meaningful alternative to transition state theory based computation.
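
    The leave-one-out cross-validated Q² mentioned above can be illustrated with a minimal sketch. It assumes a single descriptor and ordinary least squares rather than the stepwise multiple linear regression of the study, uses the common convention Q² = 1 - PRESS / SS_tot with SS_tot taken about the mean of all responses, and runs on made-up numbers; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out Q2: refit without each sample, predict it, and
    compare the summed squared prediction errors (PRESS) with SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot

# Made-up descriptor values and barriers, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
barrier = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = loo_q2(descriptor, barrier)
```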

  4. Innovative design method of automobile profile based on Fourier descriptor

    NASA Astrophysics Data System (ADS)

    Gao, Shuyong; Fu, Chaoxing; Xia, Fan; Shen, Wei

    2017-10-01

    Aiming at innovation in the contours of the automobile side, this paper presents an innovative design method for the vehicle side profile based on the Fourier descriptor. The design flow of the method is: pre-processing, coordinate extraction, standardization, discrete Fourier transform, simplified Fourier descriptor, exchange descriptor innovation, and inverse Fourier transform to get the outline of the innovative design. Two innovation concepts, gene exchange among species and gene exchange among different species, are presented, and the contours of the innovative design are obtained separately for each. A three-dimensional model of a car is obtained by referring to the profile curve which is obtained by exchanging xenogeneic genes. The feasibility of the method proposed in this paper is verified from various aspects.
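
    The transform, simplify, exchange and inverse-transform steps of the design flow above can be sketched as follows. This is a rough illustration rather than the authors' method: contour points are assumed to be complex numbers x + iy sampled in order around the outline, both contours are assumed to have the same number of points, and the `simplify` and `exchange` rules (keep or swap the lowest-frequency coefficients) are hypothetical choices.

```python
import cmath

def dft(points):
    """Discrete Fourier transform of a closed contour given as complex points."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * cmath.pi * k * i / n) for i, p in enumerate(points))
        for k in range(n)
    ]

def inverse_dft(coeffs):
    """Inverse transform: recover contour points from Fourier coefficients."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * i / n) for k, c in enumerate(coeffs)) / n
        for i in range(n)
    ]

def simplify(coeffs, keep):
    """Zero every coefficient except the DC term and the `keep` lowest
    positive and negative frequencies."""
    n = len(coeffs)
    return [c if (k <= keep or k >= n - keep) else 0j for k, c in enumerate(coeffs)]

def exchange(coeffs_a, coeffs_b, keep):
    """Take the low-frequency coefficients from contour A and the
    remaining coefficients from contour B (equal lengths assumed)."""
    n = len(coeffs_a)
    return [
        coeffs_a[k] if (k <= keep or k >= n - keep) else coeffs_b[k]
        for k in range(n)
    ]

# A unit square stands in for a real side-profile contour.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
centroid_only = inverse_dft(simplify(dft(square), keep=0))
```

    Keeping only the DC term collapses the contour to its centroid; raising `keep` restores progressively finer detail.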

  5. Segmentation, modeling and classification of the compact objects in a pile

    NASA Technical Reports Server (NTRS)

    Gupta, Alok; Funka-Lea, Gareth; Wohn, Kwangyoen

    1990-01-01

    The problem of interpreting dense range images obtained from the scene of a heap of man-made objects is discussed. A range image interpretation system consisting of segmentation, modeling, verification, and classification procedures is described. First, the range image is segmented into regions and reasoning is done about the physical support of these regions. Second, for each region several possible three-dimensional interpretations are made based on various scenarios of the objects' physical support. Finally, each interpretation is tested against the data for its consistency. The superquadric model is selected as the three-dimensional shape descriptor, plus tapering deformations along the major axis. Experimental results obtained from some complex range images of mail pieces are reported to demonstrate the soundness and the robustness of our approach.

  6. Towards the chemometric dissection of peptide - HLA-A*0201 binding affinity: comparison of local and global QSAR models

    NASA Astrophysics Data System (ADS)

    Doytchinova, Irini A.; Walshe, Valerie; Borrow, Persephone; Flower, Darren R.

    2005-03-01

    The affinities of 177 nonameric peptides binding to the HLA-A*0201 molecule were measured using a FACS-based MHC stabilisation assay and analysed using chemometrics. Their structures were described by global and local descriptors, and QSAR models were derived by genetic algorithm, stepwise regression and PLS. The global molecular descriptors included molecular connectivity χ indices, κ shape indices, E-state indices, molecular properties like molecular weight and log P, and three-dimensional descriptors like polarizability, surface area and volume. The local descriptors were of two types. The first used a binary string to indicate the presence of each amino acid type at each position of the peptide. The second was also position-dependent but used five z-scales to describe the main physicochemical properties of the amino acids forming the peptides. The models were developed using a representative training set of 131 peptides and validated using an independent test set of 46 peptides. It was found that the global descriptors could not explain the variance in the training set nor predict the affinities of the test set accurately. Both types of local descriptors gave QSAR models with better explained variance and predictive ability. The results suggest that, in their interactions with the MHC molecule, the peptide acts as a complicated ensemble of multiple amino acids mutually potentiating each other.
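
    The first type of local descriptor, a binary string marking which amino acid occupies each position, can be sketched as below. The alphabetical residue ordering and the function name are assumptions made for illustration; the abstract does not specify the bit layout, and the peptide is just an example nonamer string.

```python
# The 20 standard amino acids in alphabetical one-letter order (assumed layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def binary_descriptor(peptide):
    """Position-dependent binary string: 20 bits per residue, with a single
    1 marking the amino acid present at that position."""
    bits = []
    for residue in peptide:
        position_bits = [0] * len(AMINO_ACIDS)
        position_bits[AMINO_ACIDS.index(residue)] = 1
        bits.extend(position_bits)
    return bits

# A nonamer yields 9 x 20 = 180 bits, exactly nine of which are set.
example = binary_descriptor("GILGFVFTL")
```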

  7. Improved nucleic acid descriptors for siRNA efficacy prediction.

    PubMed

    Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V

    2013-02-01

    Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies.
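
    The two simplest descriptor types listed above, (i) nucleotide position and (ii) presence/absence of dinucleotides, can be sketched as follows. The encodings, orderings and function names are hypothetical illustrations, not the published tool's implementation, and the short sequence is a toy example rather than a real siRNA.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# All 16 dinucleotides in a fixed order: AA, AC, AG, AU, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]

def position_descriptor(sequence):
    """Four bits per position, with a single 1 marking the base present there."""
    return [1 if base == n else 0 for base in sequence for n in NUCLEOTIDES]

def dinucleotide_presence(sequence):
    """One bit per dinucleotide: 1 if it occurs anywhere in the sequence."""
    found = {sequence[i:i + 2] for i in range(len(sequence) - 1)}
    return [1 if d in found else 0 for d in DINUCLEOTIDES]

toy_sequence = "AUGC"
positions = position_descriptor(toy_sequence)
composition = dinucleotide_presence(toy_sequence)
```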

  8. Quantum descriptors for predictive toxicology of halogenated aliphatic hydrocarbons.

    PubMed

    Trohalaki, S; Pachter, R

    2003-04-01

    In order to improve Quantitative Structure-Activity Relationships (QSARs) for halogenated aliphatics (HA) and to better understand the biophysical mechanism of toxic response to these ubiquitous chemicals, we employ improved quantum-mechanical descriptors to account for HA electrophilicity. We demonstrate that, unlike the lowest unoccupied molecular orbital energy, E_LUMO, which was previously used as a descriptor, the electron affinity can be systematically improved by application of higher levels of theory. We also show that employing the reciprocal of E_LUMO, which is more consistent with frontier molecular orbital (FMO) theory, improves the correlations with in vitro toxicity data. We offer explanations based on FMO theory for a result from our previous work, in which the LUMO energies of HA anions correlated surprisingly well with in vitro toxicity data. Additional descriptors are also suggested and interpreted in terms of the accepted biophysical mechanism of toxic response to HAs and new QSARs are derived for various chemical categories that compose the data set employed. These alternate descriptors provide important insight and could benefit other classes of compounds where the biophysical mechanism of toxic response involves dissociative attachment.

  9. Land use and land cover classification for rural residential areas in China using soft-probability cascading of multifeatures

    NASA Astrophysics Data System (ADS)

    Zhang, Bin; Liu, Yueyan; Zhang, Zuyu; Shen, Yonglin

    2017-10-01

    A multifeature soft-probability cascading scheme to solve the problem of land use and land cover (LULC) classification using high-spatial-resolution images to map rural residential areas in China is proposed. The proposed method is used to build midlevel LULC features. Local features are frequently considered as low-level feature descriptors in a midlevel feature learning method. However, spectral and textural features, which are very effective low-level features, are neglected. The acquisition of the dictionary of sparse coding is unsupervised, and this reduces the discriminative power of the midlevel feature. Thus, we propose to learn supervised features based on sparse coding, a support vector machine (SVM) classifier, and a conditional random field (CRF) model to utilize the different effective low-level features and improve the discriminability of midlevel feature descriptors. First, three kinds of typical low-level features, namely, dense scale-invariant feature transform, gray-level co-occurrence matrix, and spectral features, are extracted separately. Second, combined with sparse coding and the SVM classifier, the probabilities of the different LULC classes are inferred to build supervised feature descriptors. Finally, the CRF model, which consists of two parts, unary potential and pairwise potential, is employed to construct an LULC classification map. Experimental results show that the proposed classification scheme achieves strong performance, with a total accuracy of about 87%.

  10. Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

    PubMed

    Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J

    2015-01-01

    The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step are concentrated on statistical associations among descriptors and target properties, whereas the chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain expert's knowledge in the selection process is needed to increase the confidence in the final set of descriptors. In this paper a software tool, which we named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property is proposed. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, and aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The competencies of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. 
Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort in contrast with the alternative of using an ad-hoc manual analysis of the selected descriptors. Graphical abstract: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.

  11. Prediction of Radical Scavenging Activities of Anthocyanins Applying Adaptive Neuro-Fuzzy Inference System (ANFIS) with Quantum Chemical Descriptors

    PubMed Central

    Jhin, Changho; Hwang, Keum Taek

    2014-01-01

    Radical scavenging activity of anthocyanins is well known, but only a few studies have been conducted using a quantum chemical approach. The adaptive neuro-fuzzy inference system (ANFIS) is an effective technique for solving problems with uncertainty. The purpose of this study was to construct and evaluate quantitative structure-activity relationship (QSAR) models for predicting radical scavenging activities of anthocyanins with good prediction efficiency. ANFIS-applied QSAR models were developed by using quantum chemical descriptors of anthocyanins calculated by semi-empirical PM6 and PM7 methods. Electron affinity (A) and electronegativity (χ) of flavylium cation, and ionization potential (I) of quinoidal base were significantly correlated with radical scavenging activities of anthocyanins. These descriptors were used as independent variables for QSAR models. ANFIS models with two triangular-shaped input fuzzy functions for each independent variable were constructed and optimized by 100 learning epochs. The constructed models using descriptors calculated by both PM6 and PM7 had good prediction efficiency with Q-square of 0.82 and 0.86, respectively. PMID:25153627

  12. Integration of QSAR and SAR methods for the mechanistic interpretation of predictive models for carcinogenicity

    PubMed Central

    Fjodorova, Natalja; Novič, Marjana

    2012-01-01

    The knowledge-based Toxtree expert system (SAR approach) was integrated with the statistically based counter propagation artificial neural network (CP ANN) model (QSAR approach) to contribute to a better mechanistic understanding of a carcinogenicity model for non-congeneric chemicals using Dragon descriptors and carcinogenic potency for rats as a response. The transparency of the CP ANN algorithm was demonstrated using an intrinsic mapping technique, specifically Kohonen maps. Chemical structures were represented by Dragon descriptors that express the structural and electronic features of molecules such as their shape and electronic surrounding related to reactivity of molecules. It was illustrated how the descriptors are correlated with particular structural alerts (SAs) for carcinogenicity with a recognized mechanistic link to carcinogenic activity. Moreover, the Kohonen mapping technique enables one to examine the separation of carcinogens and non-carcinogens (for rats) within a family of chemicals with a particular SA for carcinogenicity. The mechanistic interpretation of models is important for the evaluation of safety of chemicals. PMID:24688639

  13. Automatic MRF-Based Registration of High Resolution Satellite Video Data

    NASA Astrophysics Data System (ADS)

    Platias, C.; Vakalopoulou, M.; Karantzalos, K.

    2016-06-01

    In this paper we propose a deformable registration framework for high resolution satellite video data able to automatically and accurately co-register satellite video frames and/or register them to a reference map/image. The proposed approach performs non-rigid registration and formulates a Markov Random Fields (MRF) model, while efficient linear programming is employed for reaching the lowest potential of the cost function. The developed approach has been applied and validated on satellite video sequences from Skybox Imaging and compared with a rigid, descriptor-based registration method. Regarding the computational performance, both the MRF-based and the descriptor-based methods were quite efficient, with the first one converging in some minutes and the second in some seconds. Regarding the registration accuracy, the proposed MRF-based method significantly outperformed the descriptor-based one in all the experiments performed.

  14. Development of a model for predicting reaction rate constants of organic chemicals with ozone at different temperatures.

    PubMed

    Li, Xuehua; Zhao, Wenxing; Li, Jing; Jiang, Jingqiu; Chen, Jianji; Chen, Jingwen

    2013-08-01

    To assess the persistence and fate of volatile organic compounds in the troposphere, the rate constants for the reaction with ozone (kO3) are needed. As kO3 values are only available for hundreds of compounds, and experimental determination of kO3 is costly and time-consuming, it is important to develop predictive models for kO3. In this study, a total of 379 log kO3 values at different temperatures were used to develop and validate a model for the prediction of kO3, based on quantum chemical descriptors, Dragon descriptors and structural fragments. Molecular descriptors were screened by stepwise multiple linear regression, and the model was constructed by partial least-squares regression. The cross validation coefficient Q²CUM of the model is 0.836, and the external validation coefficient Q²ext is 0.811, indicating that the model has high robustness and good predictive performance. The most significant descriptor explaining log kO3 is the BELm2 descriptor with connectivity information weighted atomic masses. kO3 increases with increasing BELm2, and decreases with increasing ionization potential. The applicability domain of the proposed model was visualized by the Williams plot. The developed model can be used to predict kO3 at different temperatures for a wide range of organic chemicals, including alkenes, cycloalkenes, haloalkenes, alkynes, oxygen-containing compounds, nitrogen-containing compounds (except primary amines) and aromatic compounds. Copyright © 2013 Elsevier Ltd. All rights reserved.
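
    A Williams plot places each compound's standardized residual against its leverage, so the applicability-domain check rests on computing leverages. The sketch below shows that computation for the one-descriptor special case, where the leverage has the closed form h_i = 1/n + (x_i - mean)^2 / Sxx. The descriptor values are made up for illustration, the function names are hypothetical, and the warning value h* = 3(p + 1)/n is one commonly used convention; the abstract does not state which cut-off the authors applied.

```python
def leverages_one_descriptor(xs):
    """Leverage of each sample for a straight-line model with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]


def warning_leverage(n_samples, n_descriptors):
    """Commonly used cut-off h* = 3 (p + 1) / n (an assumed convention)."""
    return 3.0 * (n_descriptors + 1) / n_samples


# Made-up descriptor values; the last compound lies far from the others.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
h = leverages_one_descriptor(xs)
h_star = warning_leverage(len(xs), 1)
flagged = [i for i, value in enumerate(h) if value > h_star]
print(flagged)
```

    Compounds whose leverage exceeds h* fall outside the structural domain of the training set, so predictions for them are extrapolations. A model with several descriptors would use the diagonal of the full hat matrix instead of this closed form.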

  15. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. A novel class sensitive hashing technique for large-scale content-based remote sensing image retrieval

    NASA Astrophysics Data System (ADS)

    Reato, Thomas; Demir, Begüm; Bruzzone, Lorenzo

    2017-10-01

    This paper presents a novel class sensitive hashing technique in the framework of large-scale content-based remote sensing (RS) image retrieval. The proposed technique aims at representing each image with multi-hash codes, each of which corresponds to a primitive (i.e., land cover class) present in the image. To this end, the proposed method consists of a three-step algorithm. The first step is devoted to characterizing each image by primitive class descriptors. These descriptors are obtained through a supervised approach, which initially extracts the image regions and their descriptors that are then associated with primitives present in the images. This step requires a set of annotated training regions to define primitive classes. A correspondence between the regions of an image and the primitive classes is built based on the probability of each primitive class to be present at each region. All the regions belonging to the specific primitive class with a probability higher than a given threshold are highly representative of that class. Thus, the average value of the descriptors of these regions is used to characterize that primitive. In the second step, the descriptors of primitive classes are transformed into multi-hash codes to represent each image. This is achieved by adapting the kernel-based supervised locality sensitive hashing method to multi-code hashing problems. The first two steps of the proposed technique, unlike the standard hashing methods, allow one to represent each image by a set of primitive class sensitive descriptors and their hash codes. Then, in the last step, the images in the archive that are very similar to a query image are retrieved based on a multi-hash-code-matching scheme. Experimental results obtained on an archive of aerial images confirm the effectiveness of the proposed technique in terms of retrieval accuracy when compared to the standard hashing methods.

  17. Predicting human skin absorption of chemicals: development of a novel quantitative structure activity relationship.

    PubMed

    Luo, Wen; Medrek, Sarah; Misra, Jatin; Nohynek, Gerhard J

    2007-02-01

    The objective of this study was to construct and validate a quantitative structure-activity relationship model for skin absorption. Such models are valuable tools for screening and prioritization in safety and efficacy evaluation, and risk assessment of drugs and chemicals. A database of 340 chemicals with percutaneous absorption was assembled. Two models were derived from the training set consisting of 306 chemicals (90/10 random split). In addition to the experimental K(ow) values, over 300 2D and 3D atomic and molecular descriptors were analyzed using MDL's QsarIS computer program. Subsequently, the models were validated using both internal (leave-one-out) and external validation (test set) procedures. Using the stepwise regression analysis, three molecular descriptors were determined to have significant statistical correlation with K(p) (R² = 0.8225): logK(ow), X0 (quantification of both molecular size and the degree of skeletal branching), and SsssCH (count of aromatic carbon groups). In conclusion, two models to estimate skin absorption were developed. When compared to other skin absorption QSAR models in the literature, our model incorporated more chemicals and explored a large number of descriptors. Additionally, our models are reasonably predictive and have met both internal and external statistical validations.
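
    The internal (leave-one-out) validation mentioned above can be illustrated with a small sketch: each compound is left out in turn, the model is refitted on the rest, and the squared prediction errors are summed into PRESS, giving Q² = 1 - PRESS / SS_tot. The block uses a single-descriptor straight-line model and made-up numbers purely for illustration; the helper names are hypothetical, and the study itself used three descriptors selected by stepwise regression.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Made-up descriptor/response pairs (not data from the study).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys_exact = [2.0 * x + 1.0 for x in xs]        # perfectly linear response
ys_noisy = [1.2, 2.7, 5.3, 6.8, 9.4, 10.9]    # roughly linear response
print(loo_q2(xs, ys_exact), loo_q2(xs, ys_noisy))
```

    The 90/10 split quoted in the abstract leaves 340 - 306 = 34 chemicals for the external test set, which is scored once with the final model rather than by refitting.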

  18. From QSAR to QSIIR: Searching for Enhanced Computational Toxicology Models

    PubMed Central

    Zhu, Hao

    2017-01-01

    Quantitative Structure Activity Relationship (QSAR) is the most frequently used modeling approach to explore the dependency of biological, toxicological, or other types of activities/properties of chemicals on their molecular features. In the past two decades, QSAR modeling has been used extensively in the drug discovery process. However, the predictive models resulting from QSAR studies have limited use for chemical risk assessment, especially for animal and human toxicity evaluations, due to the low predictivity of new compounds. To develop enhanced toxicity models with independently validated external prediction power, novel modeling protocols were pursued by computational toxicologists based on rapidly increasing toxicity testing data in recent years. This chapter reviews the recent effort in our laboratory to incorporate the biological testing results as descriptors in the toxicity modeling process. This effort extended the concept of QSAR to Quantitative Structure In vitro-In vivo Relationship (QSIIR). The QSIIR study examples provided in this chapter indicate that the QSIIR models that are based on the hybrid (biological and chemical) descriptors are indeed superior to the conventional QSAR models that are only based on chemical descriptors for several animal toxicity endpoints. We believe that the applications introduced in this review will be of interest and value to researchers working in the field of computational drug discovery and environmental chemical risk assessment. PMID:23086837

  19. A new texture descriptor based on local micro-pattern for detection of architectural distortion in mammographic images

    NASA Astrophysics Data System (ADS)

    de Oliveira, Helder C. R.; Moraes, Diego R.; Reche, Gustavo A.; Borges, Lucas R.; Catani, Juliana H.; de Barros, Nestor; Melo, Carlos F. E.; Gonzaga, Adilson; Vieira, Marcelo A. C.

    2017-03-01

    This paper presents a new local micro-pattern texture descriptor for the detection of Architectural Distortion (AD) in digital mammography images. AD is a subtle contraction of breast parenchyma that may represent an early sign of breast cancer. Due to its subtlety and variability, AD is more difficult to detect compared to microcalcifications and masses, and is commonly found in retrospective evaluations of false-negative mammograms. Several computer-based systems have been proposed for automatic detection of AD, but their performance is still unsatisfactory. The proposed descriptor, Local Mapped Pattern (LMP), is a generalization of the Local Binary Pattern (LBP), which is considered one of the most powerful feature descriptors for texture classification in digital images. Compared to LBP, the LMP descriptor captures more effectively the minor differences between the local image pixels. Moreover, LMP is a parametric model which can be optimized for the desired application. In our work, the LMP performance was compared to the LBP and four Haralick's texture descriptors for the classification of 400 regions of interest (ROIs) extracted from clinical mammograms. ROIs were selected and divided into four classes: AD, normal tissue, microcalcifications and masses. Feature vectors were used as input to a multilayer perceptron neural network, with a single hidden layer. Results showed that LMP is a good descriptor to distinguish AD from other anomalies in digital mammography. LMP performance was slightly better than the LBP and comparable to Haralick's descriptors (mean classification accuracy = 83%).
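
    Since LMP is described as a generalization of the Local Binary Pattern, a short sketch of the basic 3x3 LBP code may help make the baseline concrete. Each of the eight neighbours is compared with the centre pixel and contributes one bit. The function name, the clockwise bit ordering and the sample patch are assumptions chosen for illustration; the block implements plain LBP only and does not attempt to reproduce the LMP mapping, which the abstract does not specify.

```python
def lbp_code(patch):
    """Basic LBP code of the centre pixel of a 3x3 patch (list of three rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(order):
        # A neighbour at least as bright as the centre sets its bit.
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Made-up grey levels for illustration.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

    A texture descriptor is then typically the histogram of these codes over a region of interest. The hard threshold in the comparison discards how large each difference is, which is the information the abstract says LMP captures more effectively.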

  20. Real-Time Visual Tracking through Fusion Features

    PubMed Central

    Ruan, Yang; Wei, Zhenzhong

    2016-01-01

    Due to their high speed, correlation filters for object tracking have begun to receive increasing attention. Traditional object trackers based on correlation filters typically use a single type of feature. In this paper, we attempt to integrate multiple feature types to improve the performance, and we propose a new DD-HOG fusion feature that consists of discriminative descriptors (DDs) and histograms of oriented gradients (HOG). However, fusion features as multi-vector descriptors cannot be directly used in prior correlation filters. To overcome this difficulty, we propose a multi-vector correlation filter (MVCF) that can directly convolve with a multi-vector descriptor to obtain a single-channel response that indicates the location of an object. Experiments on the CVPR2013 tracking benchmark with the evaluation of state-of-the-art trackers show the effectiveness and speed of the proposed method. Moreover, we show that our MVCF tracker, which uses the DD-HOG descriptor, outperforms the structure-preserving object tracker (SPOT) in multi-object tracking because of its high speed and ability to address heavy occlusion. PMID:27347951

  1. Quantitative structure-retention relationships for gas chromatographic retention indices of alkylbenzenes with molecular graph descriptors.

    PubMed

    Ivanciuc, O; Ivanciuc, T; Klein, D J; Seitz, W A; Balaban, A T

    2001-02-01

    Quantitative structure-retention relationships (QSRR) represent statistical models that quantify the connection between the molecular structure and the chromatographic retention indices of organic compounds, allowing the prediction of retention indices of novel, not yet synthesized compounds, solely from their structural descriptors. Using multiple linear regression, QSRR models for the gas chromatographic Kováts retention indices of 129 alkylbenzenes are generated using molecular graph descriptors. The correlational ability of structural descriptors computed from 10 molecular matrices is investigated, showing that the novel reciprocal matrices give numerical indices with improved correlational ability. A QSRR equation with 5 graph descriptors gives the best calibration and prediction results, demonstrating the usefulness of the molecular graph descriptors in modeling chromatographic retention parameters. The sequential orthogonalization of descriptors suggests simpler QSRR models by eliminating redundant structural information.

  2. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    PubMed

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
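
    The accuracy figures above are quoted as mean fold errors. The abstract does not give the formula, so the sketch below assumes one definition that is common in pharmacokinetic modelling, the geometric mean fold error 10^(mean |log10(predicted / observed)|). The function name and the sample values are hypothetical.

```python
import math


def mean_fold_error(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|).

    Assumed definition; the study may have used a different variant.
    """
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))


# Made-up Vss values: one two-fold over-prediction, one two-fold under-prediction.
observed = [1.0, 10.0]
predicted = [2.0, 5.0]
print(mean_fold_error(observed, predicted))
```

    Under this definition a value of 1 means perfect agreement, and over- and under-predictions by the same factor are penalised equally, so a value near 2.3 would correspond to predictions that are off by a bit more than two-fold on average.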

  3. Functional Constructivism: In Search of Formal Descriptors.

    PubMed

    Trofimova, Irina

    2017-10-01

    The Functional Constructivism (FC) paradigm is an alternative to behaviorism and considers behavior as being generated every time anew, based on an individual's capacities, environmental resources and demands. Walter Freeman's work provided us with evidence supporting the FC principles. In this paper we make parallels between gradual construction processes leading to the formation of individual behavior and habits, and evolutionary processes leading to the establishment of biological systems. Referencing evolutionary theory, several formal descriptors of such processes are proposed. These FC descriptors refer to the most universal aspects for constructing consistent structures: expansion of degrees of freedom, integration processes based on internal and external compatibility between systems and maintenance processes, all given in four different classes of systems: (a) Zone of Proximate Development (poorly defined) systems; (b) peer systems with emerging reproduction of multiple siblings; (c) systems with internalized integration of behavioral elements ('cruise controls'); and (d) systems capable of handling low-probability, not yet present events. The recursive dynamics within this set of descriptors acting on (traditional) downward, upward and horizontal directions of evolution, is conceptualized as diagonal evolution, or di-evolution. Two examples applying these FC descriptors to taxonomy are given: classification of the functionality of neuro-transmitters and temperament traits; classification of mental disorders. The paper is an early step towards finding a formal language describing universal tendencies in highly diverse, complex and multi-level transient systems known in ecology and biology as 'contingency cycles.'

  4. Static sign language recognition using 1D descriptors and neural networks

    NASA Astrophysics Data System (ADS)

    Solís, José F.; Toxqui, Carina; Padilla, Alfonso; Santiago, César

    2012-10-01

    A framework for static sign language recognition using descriptors which represent 2D images as 1D data and artificial neural networks is presented in this work. The 1D descriptors were computed by two methods: the first consists of a correlation rotational operator and the second is based on contour analysis of hand shape. One of the main problems in sign language recognition is segmentation; most papers report a special color in gloves or background for hand shape analysis. In order to avoid the use of gloves or special clothing, a thermal imaging camera was used to capture images. Static signs were taken from the digits 1 to 9 of American Sign Language; a multilayer perceptron reached 100% recognition with cross-validation.

  5. Analysis of calibrated seafloor backscatter for habitat classification methodology and case study of 158 spots in the Bay of Biscay and Celtic Sea

    NASA Astrophysics Data System (ADS)

    Fezzani, Ridha; Berger, Laurent

    2018-06-01

    An automated signal-based method was developed in order to analyse the seafloor backscatter data logged by a calibrated multibeam echosounder. The processing consists first of clustering each survey sub-area into a small number of homogeneous sediment types, based on the backscatter average level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on the multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France and Celtic Sea, Ireland. It was applied for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and classified clusters using extracted parameters show a good discriminatory potential, indicating the robustness of this approach.

  6. An object-based approach for areal rainfall estimation and validation of atmospheric models

    NASA Astrophysics Data System (ADS)

    Troemel, Silke; Simmer, Clemens

    2010-05-01

    An object-based approach for areal rainfall estimation is applied to pseudo-radar data simulated from a weather forecast model as well as to real radar volume data. The method aims to exploit as fully as possible the three-dimensional radar signals produced by precipitation-generating systems during their lifetime in order to enhance areal rainfall estimation. Therefore tracking of radar-detected precipitation centroids is performed and rain events are investigated using so-called Integral Radar Volume Descriptors (IRVD) containing relevant information of the underlying precipitation process. Some investigated descriptors are statistical quantities from the radar reflectivities within the boundary of a tracked rain cell like the area mean reflectivity or the compactness of a cell; others evaluate the mean vertical structure during the tracking period at the near surface reflectivity-weighted center of the cell like the mean effective efficiency or the mean echo top height. The stage of evolution of a system is given by the trend in the brightband fraction or related quantities. Furthermore, two descriptors not directly derived from radar data are considered: the mean wind shear and an orographic rainfall amplifier. While in case of pseudo-radar data a model based on a small set of IRVDs alone provides rainfall estimates of high accuracy, the application of such a model to the real world remains within the accuracies achievable with a constant Z-R-relationship. However, a combined model based on single IRVDs and the Marshall-Palmer Z-R-estimator already provides considerable enhancements even though the resolution of the data base used has room for improvement. The mean echo top height, the mean effective efficiency, the empirical standard deviation and the Marshall-Palmer estimator are detected for the final rainfall estimator. High correlations between storm height and rain rates, a shift of the probability distribution to higher values with increasing effective efficiency, and the possibility to classify continental and maritime systems using the effective efficiency confirm the informative value of the qualified descriptors. The IRVDs especially correct for the underestimation in case of intense rain events, and the information content of descriptors is most likely higher than demonstrated so far. We used quite sparse information about meteorological variables needed for the calculation of some IRVDs from single radiosoundings, and several descriptors suffered from the range-dependent vertical resolution of the reflectivity profile. Inclusion of neighbouring radars and assimilation runs of weather forecasting models will further enhance the accuracy of rainfall estimates. Finally, the clear difference between the IRVD selection from the pseudo-radar data and from the real world data hints at a new object-based avenue for the validation of higher resolution atmospheric models and for evaluating their potential to digest radar observations in data assimilation schemes.
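
    The constant Z-R relationship used as the baseline above can be sketched in a few lines. The block assumes the classic Marshall-Palmer form Z = 200 R^1.6, with Z in mm^6 m^-3 and R in mm/h, and converts a reflectivity given in dBZ to a rain rate. The function name and default coefficients are illustrative; the abstract does not state which coefficients the authors used.

```python
import math


def rain_rate_mm_per_h(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for R, given the reflectivity in dBZ."""
    z_linear = 10.0 ** (dbz / 10.0)   # dBZ -> linear reflectivity (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


# Reflectivity that corresponds to exactly 1 mm/h under the assumed coefficients.
one_mm_dbz = 10.0 * math.log10(200.0)
print(rain_rate_mm_per_h(one_mm_dbz))
print(rain_rate_mm_per_h(40.0))
```

    Because the exponent is fixed, such a relationship cannot adapt to the type or life-cycle stage of a precipitation system, which is the gap the additional descriptors in the combined model are meant to address.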

  7. Experimental and computational prediction of glass transition temperature of drugs.

    PubMed

    Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S

    2014-12-22

    Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.

  8. Revealing cell cycle control by combining model-based detection of periodic expression with novel cis-regulatory descriptors

    PubMed Central

    Andersson, Claes R; Hvidsten, Torgeir R; Isaksson, Anders; Gustafsson, Mats G; Komorowski, Jan

    2007-01-01

    Background: We address the issue of explaining the presence or absence of phase-specific transcription in budding yeast cultures under different conditions. To this end we use a model-based detector of gene expression periodicity to divide genes into classes depending on their behavior in experiments using different synchronization methods. While computational inference of gene regulatory circuits typically relies on expression similarity (clustering) in order to find classes of potentially co-regulated genes, this method instead takes advantage of known time profile signatures related to the studied process. Results: We explain the regulatory mechanisms of the inferred periodic classes with cis-regulatory descriptors that combine upstream sequence motifs with experimentally determined binding of transcription factors. By systematic statistical analysis we show that periodic classes are best explained by combinations of descriptors rather than single descriptors, and that different combinations correspond to periodic expression in different classes. We also find evidence for additive regulation in that the combinations of cis-regulatory descriptors associated with genes periodically expressed in fewer conditions are frequently subsets of combinations associated with genes periodically expressed in more conditions. Finally, we demonstrate that our approach retrieves combinations that are more specific towards known cell-cycle related regulators than the frequently used clustering approach. Conclusion: The results illustrate how a model-based approach to expression analysis may be particularly well suited to detect biologically relevant mechanisms. Our new approach makes it possible to provide more refined hypotheses about regulatory mechanisms of the cell cycle and it can easily be adjusted to reveal regulation of other, non-periodic, cellular processes. PMID:17939860

  9. Chemical graphs, molecular matrices and topological indices in chemoinformatics and quantitative structure-activity relationships.

    PubMed

    Ivanciuc, Ovidiu

    2013-06-01

    Chemical and molecular graphs have fundamental applications in chemoinformatics, quantitative structure-property relationships (QSPR), quantitative structure-activity relationships (QSAR), virtual screening of chemical libraries, and computational drug design. Chemoinformatics applications of graphs include chemical structure representation and coding, database search and retrieval, and physicochemical property prediction. QSPR, QSAR and virtual screening are based on the structure-property principle, which states that the physicochemical and biological properties of chemical compounds can be predicted from their chemical structure. Such structure-property correlations are usually developed from topological indices and fingerprints computed from the molecular graph and from molecular descriptors computed from the three-dimensional chemical structure. We present here a selection of the most important graph descriptors and topological indices, including molecular matrices, graph spectra, spectral moments, graph polynomials, and vertex topological indices. These graph descriptors are used to define several topological indices based on molecular connectivity, graph distance, reciprocal distance, distance-degree, distance-valency, spectra, polynomials, and information theory concepts. The molecular descriptors and topological indices can be developed with a more general approach, based on molecular graph operators, which define a family of graph indices related by a common formula. Graph descriptors and topological indices for molecules containing heteroatoms and multiple bonds are computed with weighting schemes based on atomic properties, such as the atomic number, covalent radius, or electronegativity. The correlation in QSPR and QSAR models can be improved by optimizing some parameters in the formula of topological indices, as demonstrated for structural descriptors based on atomic connectivity and graph distance.
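
    As a concrete instance of a topological index based on graph distance, the sketch below computes the Wiener index, the sum of shortest-path distances over all pairs of atoms in the hydrogen-suppressed molecular graph. The Wiener index is chosen here as a familiar example and is not named in the abstract; the adjacency-list representation and the function name are illustrative assumptions.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered vertex pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2   # every pair was counted from both ends


# Carbon skeletons of two C4H10 isomers (hydrogens suppressed).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

    The branched isomer gets the smaller value because its atoms are closer together on average, which is the kind of structural information such indices carry into QSPR and QSAR models. Heteroatoms and multiple bonds would need the weighting schemes mentioned in the abstract, which this unweighted sketch does not cover.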

  10. Probabilistic Elastic Part Model: A Pose-Invariant Representation for Real-World Face Verification.

    PubMed

    Li, Haoxiang; Hua, Gang

    2018-04-01

    Pose variation remains a major challenge for real-world face recognition. We approach this problem through a probabilistic elastic part model. We extract local descriptors (e.g., LBP or SIFT) from densely sampled multi-scale image patches. By augmenting each descriptor with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of the face parts of all face images in the training corpus, namely the probabilistic elastic part (PEP) model. Each mixture component of the GMM is confined to be a spherical Gaussian to balance the influence of the appearance and the location terms, which naturally defines a part. Given one or multiple face images of the same subject, the PEP-model builds its PEP representation by sequentially concatenating descriptors identified by each Gaussian component in a maximum likelihood sense. We further propose a joint Bayesian adaptation algorithm to adapt the universally trained GMM to better model the pose variations between the target pair of faces/face tracks, which consistently improves face verification accuracy. Our experiments show that we achieve state-of-the-art face verification accuracy with the proposed representations on the Labeled Faces in the Wild (LFW) dataset, the YouTube video face database, and the CMU MultiPIE dataset.

  11. Action Recognition Using Nonnegative Action Component Representation and Sparse Basis Selection.

    PubMed

    Wang, Haoran; Yuan, Chunfeng; Hu, Weiming; Ling, Haibin; Yang, Wankou; Sun, Changyin

    2014-02-01

    In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, a novel sparse model is developed for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors. Second, from the statistics of the context-aware descriptors, we learn action units using the graph regularized nonnegative matrix factorization, which leads to a part-based representation and encodes the geometrical information. These units effectively bridge the semantic gap in action recognition. Third, we propose a sparse model based on a joint l2,1-norm to preserve the representative items and suppress noise in the action units. Intuitively, when learning the dictionary for action representation, the sparse model captures the fact that actions from the same class share similar units. The proposed approach is evaluated on several publicly available data sets. The experimental results and analysis clearly demonstrate the effectiveness of the proposed approach.

  12. Evaluation of Controlled Vocabularies by Inter-Indexer Consistency

    ERIC Educational Resources Information Center

    Monreal, Concha Soler; Gil-Leiva, Isidoro

    2011-01-01

    Introduction: Several controlled vocabularies are used for indexing three journal articles to check if better or equal consistency rates are achieved with a list of descriptors than with a standard thesaurus and augmented thesaurus. Method: A terminology set for library and information Science was used to build a list of descriptors with…

  13. RED: a set of molecular descriptors based on Renyi entropy.

    PubMed

    Delgado-Soler, Laura; Toral, Raul; Tomás, M Santos; Rubio-Martinez, Jaime

    2009-11-01

    New molecular descriptors, RED (Renyi entropy descriptors), based on the generalized entropies introduced by Renyi are presented. Topological descriptors based on molecular features have proven to be useful for describing molecular profiles. Renyi entropy is used as a variability measure to contract a feature-pair distribution composing the descriptor vector. The performance of RED descriptors was tested for the analysis of different sets of molecular distances, virtual screening, and pharmacological profiling. A free parameter of the Renyi entropy has been optimized for all the considered applications.
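
    For reference, the Renyi entropy of order alpha for a discrete distribution p is H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), with the Shannon entropy as the limit alpha -> 1. The sketch below evaluates only this standard formula; how RED builds and contracts the feature-pair distribution is not described in the abstract and is not reproduced here. The function name and the use of the natural logarithm are assumptions of this sketch.

```python
import math


def renyi_entropy(probabilities, alpha):
    """Renyi entropy (natural log) of a discrete probability distribution.

    alpha is the free order parameter (alpha >= 0); alpha == 1 falls back
    to the Shannon entropy, which is the limiting case.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    p = [x for x in probabilities if x > 0]
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


# A uniform distribution has the same entropy for every order alpha.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
for order in (0.5, 1.0, 2.0):
    print(order, renyi_entropy(uniform, order), renyi_entropy(skewed, order))
```

    Larger values of alpha weight the most probable bins more heavily, which is why the order can be tuned per application as the abstract reports.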

  14. How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space.

    PubMed

    Koutsoukas, Alexios; Paricharak, Shardul; Galloway, Warren R J D; Spring, David R; Ijzerman, Adriaan P; Glen, Robert C; Marcus, David; Bender, Andreas

    2014-01-27

    Chemical diversity is a widely applied approach to select structurally diverse subsets of molecules, often with the objective of maximizing the number of hits in biological screening. While many methods exist in the area, few systematic comparisons using current descriptors in particular with the objective of assessing diversity in bioactivity space have been published, and this shortage is what the current study is aiming to address. In this work, 13 widely used molecular descriptors were compared, including fingerprint-based descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors (TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay of chemical structures (ROCS) and principal moments of inertia (PMI)), a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based descriptors (prop2D), and a more recently introduced molecular descriptor type (namely, "Bayes Affinity Fingerprints"). We assessed both the similar behavior of the descriptors in assessing the diversity of chemical libraries, and their ability to select compounds from libraries that are diverse in bioactivity space, which is a property of much practical relevance in screening library design. This is particularly evident, given that many future targets to be screened are not known in advance, but that the library should still maximize the likelihood of containing bioactive matter also for future screening campaigns. Overall, our results showed that descriptors based on atom topology (i.e., fingerprint-based descriptors and pharmacophore-based descriptors) correlate well in rank-ordering compounds, both within and between descriptor types. On the other hand, shape-based descriptors such as ROCS and PMI showed weak correlation with the other descriptors utilized in this study, demonstrating significantly different behavior. 
We then applied eight of the molecular descriptors compared in this study to sample a diverse subset of sample compounds (4%) from an initial population of 2587 compounds, covering the 25 largest human activity classes from ChEMBL and measured the coverage of activity classes by the subsets. Here, it was found that "Bayes Affinity Fingerprints" achieved an average coverage of 92% of activity classes. Using the descriptors ECFP4, GpiDAPH3, TGT, and random sampling, 91%, 84%, 84%, and 84% of the activity classes were represented in the selected compounds respectively, followed by BCUT, prop2D, MACCS, and PMI (in order of decreasing performance). In addition, we were able to show that there is no visible correlation between compound diversity in PMI space and in bioactivity space, despite frequent utilization of PMI plots to this end. To summarize, in this work, we assessed which descriptors select compounds with high coverage of bioactivity space, and can hence be used for diverse compound selection for biological screening. In cases where multiple descriptors are to be used for diversity selection, this work describes which descriptors behave complementarily, and can hence be used jointly to focus on different aspects of diversity in chemical space.
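
    The selection-and-coverage experiment can be sketched as follows. The abstract does not state which diversity selection algorithm was used; this sketch assumes MaxMin picking with Tanimoto distance on fingerprint bit sets, a common choice, and all names and the toy data are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, k, seed_index=0):
    """Greedy MaxMin selection: repeatedly add the compound that is
    farthest from its nearest already-picked compound."""
    picked = [seed_index]
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(k, len(fingerprints)):
        remaining = [i for i in range(len(fingerprints)) if i not in picked]
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fingerprints[candidate]))
    return picked


def class_coverage(picked, activity_classes):
    """Fraction of all activity classes represented in the picked subset."""
    covered = {activity_classes[i] for i in picked}
    return len(covered) / len(set(activity_classes))


# Toy library: five compounds in three hypothetical activity classes.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {20, 21}]
activity_classes = ["A", "A", "B", "B", "C"]

subset = maxmin_pick(fingerprints, k=3)
print(subset, class_coverage(subset, activity_classes))
```

    Running the same selection with a different descriptor (a different `fingerprints` list) and comparing `class_coverage` values mirrors the comparison reported above, including the random-sampling baseline.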

  15. Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules.

    PubMed

    Pronobis, Wiktor; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2018-06-12

    Machine learning (ML) based prediction of molecular properties across chemical compound space is an important and alternative approach to efficiently estimate the solutions of highly complex many-electron problems in chemistry and physics. Statistical methods represent molecules as descriptors that should encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed; all of them have advantages and limitations. Here, we propose a set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting the successfully used kernel ridge regression methods of machine learning, we evaluate our descriptors on predicting several properties of small organic molecules calculated using density-functional theory. We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO. The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained on 5000 random molecules, our best model achieves an accuracy of 0.8 kcal/mol (on the remaining 1868 molecules of GDB-7) and 1.5 kcal/mol (on the remaining 126722 molecules of GDB-9) respectively. Applying a linear regression model on our novel many-body descriptors performs almost equal to a nonlinear kernelized model. Linear models are readily interpretable: a feature importance ranking measure helps to obtain qualitative and quantitative insights on the importance of two- and three-body molecular interactions for predicting molecular properties computed with quantum-mechanical methods.
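
    The invariance requirements can be made concrete with a small sketch. The descriptor below is not the one proposed in the paper, whose exact form the abstract does not give; it is a simple stand-in (sorted inverse interatomic distances grouped by element pair) that shares the stated invariances to translation, rotation, and atomic indexing. All names are assumptions of this sketch.

```python
import math
from collections import defaultdict


def two_body_descriptor(elements, coordinates):
    """Sorted inverse pairwise distances, grouped by unordered element pair.

    Distances do not change under translation or rotation, and sorting
    within each element-pair group removes the dependence on atom order.
    """
    by_pair = defaultdict(list)
    n = len(elements)
    for i in range(n):
        for j in range(i + 1, n):
            pair = tuple(sorted((elements[i], elements[j])))
            by_pair[pair].append(1.0 / math.dist(coordinates[i], coordinates[j]))
    return {pair: sorted(values, reverse=True) for pair, values in by_pair.items()}


def descriptors_match(d1, d2, rel_tol=1e-9):
    """True if two descriptors agree up to floating-point rounding."""
    if d1.keys() != d2.keys():
        return False
    return all(
        len(d1[k]) == len(d2[k])
        and all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(d1[k], d2[k]))
        for k in d1
    )


# Water-like geometry (angstroms), then the same molecule translated
# and with its atoms listed in a different order.
elements = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
shifted = [(x + 1.5, y - 2.0, z + 0.5) for x, y, z in coords]
reordered_elements = [elements[1], elements[0], elements[2]]
reordered_coords = [shifted[1], shifted[0], shifted[2]]

print(descriptors_match(
    two_body_descriptor(elements, coords),
    two_body_descriptor(reordered_elements, reordered_coords),
))
```

    A three-body analogue would add angle-dependent terms over atom triples; the resulting fixed-order feature vector is what a linear or kernel ridge regression model would then consume.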

  16. Adaptive Granulation-Based Prediction for Energy System of Steel Industry.

    PubMed

    Wang, Tianyu; Han, Zhongyang; Zhao, Jun; Wang, Wei

    2018-01-01

    The flow variation tendency of byproduct gas plays a crucial role for energy scheduling in steel industry. An accurate prediction of its future trends will be significantly beneficial for the economic profits of steel enterprise. In this paper, a long-term prediction model for the energy system is proposed by providing an adaptive granulation-based method that considers the production semantics involved in the fluctuation tendency of the energy data, and partitions them into a series of information granules. To fully reflect the corresponding data characteristics of the formed unequal-length temporal granules, a 3-D feature space consisting of the timespan, the amplitude and the linetype is designed as linguistic descriptors. In particular, a collaborative-conditional fuzzy clustering method is proposed to granularize the tendency-based feature descriptors and specifically measure the amplitude variation of industrial data which plays a dominant role in the feature space. To quantify the performance of the proposed method, a series of real-world industrial data coming from the energy data center of a steel plant is employed to conduct the comparative experiments. The experimental results demonstrate that the proposed method successively satisfies the requirements of the practically viable prediction.

  17. High-order statistics of weber local descriptors for image representation.

    PubMed

    Han, Xian-Hua; Chen, Yen-Wei; Xu, Gang

    2015-06-01

    Highly discriminant visual features play a key role in different image classification applications. This study aims to realize a method for extracting highly-discriminant features from images by exploring a robust local descriptor inspired by Weber's law. The investigated local descriptor is based on the fact that human perception for distinguishing a pattern depends not only on the absolute intensity of the stimulus but also on the relative variance of the stimulus. Therefore, we firstly transform the original stimulus (the images in our study) into a differential excitation-domain according to Weber's law, and then explore a local patch, called micro-Texton, in the transformed domain as Weber local descriptor (WLD). Furthermore, we propose to employ a parametric probability process to model the Weber local descriptors, and extract the higher-order statistics to the model parameters for image representation. The proposed strategy can adaptively characterize the WLD space using generative probability model, and then learn the parameters for better fitting the training space, which would lead to more discriminant representation for images. In order to validate the efficiency of the proposed strategy, we apply three different image classification applications including texture, food images and HEp-2 cell pattern recognition, which validates that our proposed strategy has advantages over the state-of-the-art approaches.

  18. An infrastructure to mine molecular descriptors for ligand selection on virtual screening.

    PubMed

    Seus, Vinicius Rosa; Perazzo, Giovanni Xavier; Winck, Ana T; Werhli, Adriano V; Machado, Karina S

    2014-01-01

    The receptor-ligand interaction evaluation is one important step in rational drug design. The databases that provide the structures of the ligands are growing on a daily basis. This makes it impossible to test all the ligands for a target receptor. Hence, a ligand selection before testing the ligands is needed. One possible approach is to evaluate a set of molecular descriptors. With the aim of describing the characteristics of promising compounds for a specific receptor we introduce a data warehouse-based infrastructure to mine molecular descriptors for virtual screening (VS). We performed experiments that consider as target the receptor HIV-1 protease and different compounds for this protein. A set of 9 molecular descriptors are taken as the predictive attributes and the free energy of binding is taken as a target attribute. By applying the J48 algorithm over the data we obtain decision tree models that achieved up to 84% of accuracy. The models indicate which molecular descriptors and their respective values are relevant to influence good FEB results. Using their rules we performed ligand selection on ZINC database. Our results show important reduction in ligands selection to be applied in VS experiments; for instance, the best selection model picked only 0.21% of the total amount of drug-like ligands.

  19. Characterizing Changes in the Rate of Protein-Protein Dissociation upon Interface Mutation Using Hotspot Energy and Organization

    PubMed Central

    Agius, Rudi; Torchala, Mieczyslaw; Moal, Iain H.; Fernández-Recio, Juan; Bates, Paul A.

    2013-01-01

    Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthew's Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. 
The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects. PMID:24039569
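
    The two evaluation metrics quoted above can be computed as in the sketch below, which uses their standard definitions: Pearson's correlation for predicted versus experimental off-rates, and the Matthews correlation coefficient (MCC) for a binary classification such as stabilizing versus non-stabilizing mutations. The function names and toy labels are hypothetical; returning 0.0 when the MCC denominator vanishes is a common convention assumed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Rare positive class: predicting "negative" for everything scores 0,
# even though plain accuracy would be 90%.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
all_negative = [0] * 10
print(matthews_cc(y_true, all_negative))
```

    This is why MCC rather than accuracy is the natural metric for detecting rare stabilizing mutations.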

  20. Structure-reactivity modeling using mixture-based representation of chemical reactions.

    PubMed

    Polishchuk, Pavel; Madzhidov, Timur; Gimadiev, Timur; Bodrov, Andrey; Nugmanov, Ramil; Varnek, Alexandre

    2017-09-01

    We describe a novel approach to reaction representation as a combination of two mixtures: a mixture of reactants and a mixture of products. In turn, each mixture can be encoded using an earlier reported approach involving simplex descriptors (SiRMS). The feature vector representing these two mixtures results from either concatenated product and reactant descriptors or the difference between descriptors of products and reactants. This reaction representation does not need explicit labeling of a reaction center. A rigorous "product-out" cross-validation (CV) strategy is suggested. Unlike the naïve "reaction-out" CV approach based on a random selection of items, the proposed one provides a more realistic estimate of prediction accuracy for reactions resulting in novel products. The new methodology has been applied to model rate constants of E2 reactions. It has been demonstrated that the use of the fragment control domain applicability approach significantly increases prediction accuracy of the models. The models obtained with the new "mixture" approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
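
    A minimal sketch of the two ideas in this abstract, under stated assumptions: the feature vector is built from precomputed reactant-mixture and product-mixture descriptor vectors (the SiRMS calculation itself is not shown), and "product-out" CV is interpreted as keeping all reactions that lead to the same product in the same fold. That interpretation, the concatenation order, the fold-balancing heuristic, and all names are assumptions of this sketch.

```python
from collections import defaultdict


def reaction_features(reactant_desc, product_desc, mode="difference"):
    """Combine mixture descriptors into one reaction feature vector."""
    if mode == "difference":
        return [p - r for p, r in zip(product_desc, reactant_desc)]
    if mode == "concatenate":
        return list(reactant_desc) + list(product_desc)
    raise ValueError("mode must be 'difference' or 'concatenate'")


def product_out_folds(products, n_folds):
    """Assign reaction indices to folds so that reactions sharing a
    product never end up in different folds."""
    groups = defaultdict(list)
    for index, product in enumerate(products):
        groups[product].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place the largest product groups first, each into the smallest fold.
    for product in sorted(groups, key=lambda p: (-len(groups[p]), p)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[product])
    return folds


# Six hypothetical reactions labeled by their product identifier.
products = ["P1", "P1", "P2", "P3", "P3", "P3"]
folds = product_out_folds(products, n_folds=2)
print(folds)
print(reaction_features([1.0, 2.0], [1.5, 1.0]))
```

    With a purely random "reaction-out" split, reactions giving P3 could appear on both sides of a split, so the model would be tested on products it has already seen; grouping by product removes that leak.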

  1. QSAR as a random event: modeling of nanoparticles uptake in PaCa2 cancer cells.

    PubMed

    Toropov, Andrey A; Toropova, Alla P; Puzyn, Tomasz; Benfenati, Emilio; Gini, Giuseppina; Leszczynska, Danuta; Leszczynski, Jerzy

    2013-06-01

    Quantitative structure-property/activity relationships (QSPRs/QSARs) are a tool to predict various endpoints for various substances. The "classic" QSPR/QSAR analysis is based on the representation of the molecular structure by the molecular graph. However, the simplified molecular input-line entry system (SMILES) is gradually becoming the most popular representation of molecular structure in the databases available on the Internet. Under such circumstances, the development of molecular descriptors calculated directly from SMILES becomes an attractive alternative to "classic" descriptors. The CORAL software (http://www.insilico.eu/coral) is a provider of SMILES-based optimal molecular descriptors which are aimed to correlate with various endpoints. We analyzed a data set on nanoparticle uptake in PaCa2 pancreatic cancer cells. The data set includes 109 nanoparticles with the same core but different surface modifiers (small organic molecules). The concept of a QSAR as a random event is suggested in opposition to "classic" QSARs, which are based on only one distribution of available data into the training and the validation sets. In other words, five random splits into the "visible" training set and the "invisible" validation set were examined. The SMILES-based optimal descriptors (obtained by the Monte Carlo technique) for these splits are calculated with the CORAL software. The statistical quality of all these models is good. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Major Source of Error in QSPR Prediction of Intrinsic Thermodynamic Solubility of Drugs: Solid vs Nonsolid State Contributions?

    PubMed

    Abramov, Yuriy A

    2015-06-01

    The main purpose of this study is to define the major limiting factor in the accuracy of the quantitative structure-property relationship (QSPR) models of the thermodynamic intrinsic aqueous solubility of the drug-like compounds. For doing this, the thermodynamic intrinsic aqueous solubility property was suggested to be indirectly "measured" from the contributions of solid state, ΔGfus, and nonsolid state, ΔGmix, properties, which are estimated by the corresponding QSPR models. The QSPR models of ΔGfus and ΔGmix properties were built based on a set of drug-like compounds with available accurate measurements of fusion and thermodynamic solubility properties. For consistency ΔGfus and ΔGmix models were developed using similar algorithms and descriptor sets, and validated against the similar test compounds. Analysis of the relative performances of these two QSPR models clearly demonstrates that it is the solid state contribution which is the limiting factor in the accuracy and predictive power of the QSPR models of the thermodynamic intrinsic solubility. The performed analysis outlines a necessity of development of new descriptor sets for an accurate description of the long-range order (periodicity) phenomenon in the crystalline state. The proposed approach to the analysis of limitations and suggestions for improvement of QSPR-type models may be generalized to other applications in the pharmaceutical industry.

  3. Quaternion-Based Texture Analysis of Multiband Satellite Images: Application to the Estimation of Aboveground Biomass in the East Region of Cameroon.

    PubMed

    Djiongo Kenfack, Cedrigue Boris; Monga, Olivier; Mpong, Serge Moto; Ndoundam, René

    2018-03-01

    Within the last decade, several approaches using quaternion numbers to handle and model multiband images in a holistic manner were introduced. The quaternion Fourier transform can be efficiently used to model texture in multidimensional data such as color images. For practical application, multispectral satellite data appear as a primary source for measuring past trends and monitoring changes in forest carbon stocks. In this work, we propose a texture-color descriptor based on the quaternion Fourier transform to extract relevant information from multiband satellite images. We propose a new multiband image texture model extraction, called FOTO++, in order to address biomass estimation issues. The first stage consists in removing noise from the multispectral data while preserving the edges of canopies. Afterward, color texture descriptors are extracted thanks to a discrete form of the quaternion Fourier transform, and finally the support vector regression method is used to deduce biomass estimation from texture indices. Our texture features are modeled using a vector composed with the radial spectrum coming from the amplitude of the quaternion Fourier transform. We conduct several experiments in order to study the sensitivity of our model to acquisition parameters. We also assess its performance both on synthetic images and on real multispectral images of Cameroonian forest. The results show that our model is more robust to acquisition parameters than the classical Fourier Texture Ordination model (FOTO). Our scheme is also more accurate for aboveground biomass estimation. We stress that a similar methodology could be implemented using quaternion wavelets. These results highlight the potential of the quaternion-based approach to study multispectral satellite images.

  4. Prediction of boiling points of organic compounds by QSPR tools.

    PubMed

    Dai, Yi-min; Zhu, Zhi-ping; Cao, Zhong; Zhang, Yue-fei; Zeng, Ju-lan; Li, Xun

    2013-07-01

    The novel electro-negativity topological descriptors of YC, WC were derived from molecular structure by equilibrium electro-negativity of atom and relative bond length of molecule. The quantitative structure-property relationships (QSPR) between descriptors of YC, WC as well as path number parameter P3 and the normal boiling points of 80 alkanes, 65 unsaturated hydrocarbons and 70 alcohols were obtained separately. The high-quality prediction models were evidenced by coefficient of determination (R²), the standard error (S), average absolute errors (AAE) and predictive parameters (Q²ext, R²CV, R²m). According to the regression equations, the influences of the length of carbon backbone, the size, the degree of branching of a molecule and the role of functional groups on the normal boiling point were analyzed. Comparison results with reference models demonstrated that novel topological descriptors based on the equilibrium electro-negativity of atom and the relative bond length were useful molecular descriptors for predicting the normal boiling points of organic compounds. Copyright © 2013 Elsevier Inc. All rights reserved.

  5. A 3D QSAR CoMFA study of non-peptide angiotensin II receptor antagonists

    NASA Astrophysics Data System (ADS)

    Belvisi, Laura; Bravi, Gianpaolo; Catalano, Giovanna; Mabilia, Massimo; Salimbeni, Aldo; Scolastico, Carlo

    1996-12-01

    A series of non-peptide angiotensin II receptor antagonists was investigated with the aim of developing a 3D QSAR model using comparative molecular field analysis descriptors and approaches. The main goals of the study were dictated by an interest in methodologies and an understanding of the binding requirements to the AT1 receptor. Consistency with the previously derived activity models was always checked to contemporarily test the validity of the various hypotheses. The specific conformations chosen for the study, the procedures invoked to superimpose all structures, the conditions employed to generate steric and electrostatic field values and the various PCA/PLS runs are discussed in detail. The effect of experimental design techniques to select objects (molecules) and variables (descriptors) with respect to the predictive power of the QSAR models derived was especially analysed.

  6. Pain Quality Descriptors in Community-Dwelling Older Adults with Nonmalignant Pain

    PubMed Central

    Thakral, Manu; Shi, Ling; Foust, Janice B.; Patel, Kushang V.; Shmerling, Robert H.; Bean, Jonathan F.; Leveille, Suzanne G.

    2016-01-01

    This study aimed to characterize the prevalence of various pain qualities in older adults with chronic non-malignant pain and determine the association of pain quality to other pain characteristics namely: severity, interference, distribution, and pain-associated conditions. In the population-based MOBILIZE Boston Study, 560 participants aged ≥70 years reported chronic pain in the baseline assessment, which included a home interview and clinic exam. Pain quality was assessed using a modified version of the McGill Pain Questionnaire (MPQ) consisting of 20 descriptors, from which 3 categories were derived: cognitive/affective, sensory and neuropathic. Presence of ≥2 pain-associated conditions was significantly associated with 18 of the 20 pain quality descriptors. Sensory descriptors were endorsed by nearly all older adults with chronic pain (93%), followed by cognitive/affective (83.4%) and neuropathic descriptors (68.6%). Neuropathic descriptors were associated with the greatest number of pain-associated conditions including osteoarthritis of the hand and knee. More than half of participants (59%) endorsed descriptors in all 3 categories and had more severe pain and interference, and multi-site or widespread pain than those endorsing 1 or 2 categories. Strong associations were observed between pain quality and measures of pain severity, interference, and distribution (p<.0001). Findings from this study indicate that older adults have multiple pain-associated conditions which likely reflect multiple physiological mechanisms for pain. Linking pain qualities with other associated pain characteristics serves to develop a multidimensional approach to geriatric pain assessment. Future research is needed to investigate the physiological mechanisms responsible for the variability in pain qualities endorsed by older adults. PMID:27842050

  7. Pain quality descriptors in community-dwelling older adults with nonmalignant pain.

    PubMed

    Thakral, Manu; Shi, Ling; Foust, Janice B; Patel, Kushang V; Shmerling, Robert H; Bean, Jonathan F; Leveille, Suzanne G

    2016-12-01

    This study aimed to characterize the prevalence of various pain qualities in older adults with chronic nonmalignant pain and determine the association of pain quality to other pain characteristics namely: severity, interference, distribution, and pain-associated conditions. In the population-based MOBILIZE Boston Study, 560 participants aged ≥70 years reported chronic pain in the baseline assessment, which included a home interview and clinic exam. Pain quality was assessed using a modified version of the McGill Pain Questionnaire (MPQ) consisting of 20 descriptors from which 3 categories were derived: cognitive/affective, sensory, and neuropathic. Presence of ≥2 pain-associated conditions was significantly associated with 18 of the 20 pain quality descriptors. Sensory descriptors were endorsed by nearly all older adults with chronic pain (93%), followed by cognitive/affective (83.4%) and neuropathic descriptors (68.6%). Neuropathic descriptors were associated with the greatest number of pain-associated conditions including osteoarthritis of the hand and knee. More than half of participants (59%) endorsed descriptors in all 3 categories and had more severe pain and interference, and multisite or widespread pain than those endorsing 1 or 2 categories. Strong associations were observed between pain quality and measures of pain severity, interference, and distribution (P < 0.0001). Findings from this study indicate that older adults have multiple pain-associated conditions that likely reflect multiple physiological mechanisms for pain. Linking pain qualities with other associated pain characteristics serves to develop a multidimensional approach to geriatric pain assessment. Future research is needed to investigate the physiological mechanisms responsible for the variability in pain qualities endorsed by older adults.

  8. Simple idea to generate fragment and pharmacophore descriptors and their implications in chemical informatics.

    PubMed

    Catana, Cornel

    2009-03-01

    Using a well-defined set of fragments/pharmacophores, a new methodology to calculate fragment/pharmacophore descriptors for any molecule onto which at least one fragment/pharmacophore can be mapped is presented. To each fragment/pharmacophore present in a molecule, we attach a descriptor that is calculated by identifying the molecule's atoms onto which it maps and summing over its constituent atomic descriptors. The attached descriptors are named C-fragment/pharmacophore descriptors, and this methodology can be applied to any descriptors defined at the atomic level, such as the partition coefficient, molar refractivity, electrotopological state, etc. By using this methodology, the same fragment/pharmacophore can be shown to have different values in different molecules resulting in better discrimination power. As we know, fragment and pharmacophore fingerprints have a lot of applications in chemical informatics. This study has attempted to find the impact of replacing the traditional value of "1" in a fingerprint with real numbers derived from C-fragment/pharmacophore descriptors. One way to do this is to assess the utility of C-fragment/pharmacophore descriptors in modeling different end points. Here, we exemplify with data from CYP and hERG. The fact that, in many cases, the obtained models were fairly successful and C-fragment descriptors were ranked among the top ones supports the idea that they play an important role in correlation. When we modeled hERG with C-pharmacophore descriptors, however, the model performances decreased slightly, and we attribute this mainly to the fact that there is no technique capable of handling multiple instances (states). We hope this will open new research, especially in the emerging field of machine learning. Further research is needed to see the impact of C-fragment/pharmacophore descriptors in similarity/dissimilarity applications.
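
    The core calculation can be sketched as follows: each fragment that maps onto a molecule receives the sum of an atomic-level property over the mapped atoms, and that value replaces the "1" in the fingerprint position for the fragment. The substructure matching step is assumed to have been done elsewhere, each fragment is assumed to map at most once (the multiple-instance case is left open, as the abstract itself notes), and the fragment names, atomic values, and function names are hypothetical.

```python
def c_descriptor(atom_values, mapped_atoms):
    """Sum an atomic-level property over the atoms a fragment maps onto."""
    return sum(atom_values[i] for i in mapped_atoms)


def weighted_fingerprint(fragment_order, mappings, atom_values):
    """Real-valued fingerprint: 0.0 where a fragment is absent, otherwise
    the fragment's C-descriptor instead of the traditional 1."""
    return [
        c_descriptor(atom_values, mappings[name]) if name in mappings else 0.0
        for name in fragment_order
    ]


# Hypothetical per-atom contributions (e.g., to a partition coefficient)
# for a four-atom molecule, and precomputed fragment-to-atom mappings.
atom_values = [0.5, 0.25, -0.125, 1.0]
mappings = {"carbonyl": [0, 1], "amine": [3]}
fragment_order = ["carbonyl", "hydroxyl", "amine"]

print(weighted_fingerprint(fragment_order, mappings, atom_values))
```

    Because the atomic values depend on each atom's environment, the same fragment generally receives different values in different molecules, which is the source of the added discrimination power described above.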

  9. Anomaly Detection Based on Local Nearest Neighbor Distance Descriptor in Crowded Scenes

    PubMed Central

    Hu, Shiqiang; Zhang, Huanlong; Luo, Lingkun

    2014-01-01

    We propose a novel local nearest neighbor distance (LNND) descriptor for anomaly detection in crowded scenes. Compared with the low-level feature descriptors commonly used in previous works, the LNND descriptor has two major advantages. First, the LNND descriptor efficiently incorporates spatial and temporal contextual information around the video event, which is important for detecting anomalous interaction among multiple events, while most existing feature descriptors only contain the information of a single event. Second, the LNND descriptor is a compact representation and its dimensionality is typically much lower than that of the low-level feature descriptor. Therefore, not only can computation time and storage requirements be saved by using the LNND descriptor for the anomaly detection method with offline training, but the negative aspects caused by using a high-dimensional feature descriptor can also be avoided. We validate the effectiveness of the LNND descriptor by conducting extensive experiments on different benchmark datasets. Experimental results show the promising performance of the LNND-based method against the state-of-the-art methods. It is worth noting that the LNND-based approach requires fewer intermediate processing steps, without any subsequent processing such as smoothing, but achieves comparable or even better performance. PMID:25105164

  10. Structural alignment of protein descriptors - a combinatorial model.

    PubMed

    Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek

    2016-09-17

    Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. 
All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).

  11. Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

    PubMed

    Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean

    2010-11-01

    Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predicted value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
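
    For readers unfamiliar with the statistics quoted in this abstract, the following Python sketch shows how Cohen's κ, sensitivity, specificity, and PPV follow from a binary confusion matrix. The counts in the example are invented for illustration and are not the study's data.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute kappa, sensitivity, specificity and PPV from confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # fraction of true positives recovered
    specificity = tn / (tn + fp)   # fraction of true negatives recovered
    ppv = tp / (tp + fp)           # fraction of positive calls that are right
    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
    }


# Illustrative counts only (not taken from the abstract).
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

    Because κ corrects the observed agreement for chance agreement, it can be modest even when specificity is high, which is consistent with the pattern of values reported above.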

  12. On the Relationship between Molecular Hit Rates in High-Throughput Screening and Molecular Descriptors.

    PubMed

    Hansson, Mari; Pemberton, John; Engkvist, Ola; Feierberg, Isabella; Brive, Lars; Jarvis, Philip; Zander-Balderud, Linda; Chen, Hongming

    2014-06-01

    High-throughput screening (HTS) is widely used in the pharmaceutical industry to identify novel chemical starting points for drug discovery projects. The current study focuses on the relationship between molecular hit rate in recent in-house HTS and four common molecular descriptors: lipophilicity (ClogP), size (heavy atom count, HEV), fraction of sp(3)-hybridized carbons (Fsp3), and fraction of molecular framework (f(MF)). The molecular hit rate is defined as the fraction of times the molecule has been assigned as active in the HTS campaigns where it has been screened. Beta-binomial statistical models were built to model the molecular hit rate as a function of these descriptors. The advantage of the beta-binomial statistical models is that the correlation between the descriptors is taken into account. Higher degree polynomial terms of the descriptors were also added into the beta-binomial statistic model to improve the model quality. The relative influence of different molecular descriptors on molecular hit rate has been estimated, taking into account that the descriptors are correlated to each other through applying beta-binomial statistical modeling. The results show that ClogP has the largest influence on the molecular hit rate, followed by Fsp3 and HEV. f(MF) has only a minor influence besides its correlation with the other molecular descriptors. © 2013 Society for Laboratory Automation and Screening.
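
    The hit-rate definition and the beta-binomial distribution named in this abstract can be sketched in Python as follows. This is only an illustration of the two quantities: the parameter names `alpha` and `beta` are assumptions, and the regression step that links them to ClogP, HEV, Fsp3, and f(MF) is not shown because the abstract does not describe it.

```python
import math


def hit_rate(n_active: int, n_screened: int) -> float:
    """Fraction of the screened campaigns in which a molecule was active."""
    if n_screened <= 0:
        raise ValueError("molecule must have been screened at least once")
    return n_active / n_screened


def _log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Log-probability of k actives in n campaigns under a beta-binomial.

    alpha and beta are the shape parameters of the underlying Beta
    distribution (hypothetical values in the example below).
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + _log_beta(k + alpha, n - k + beta) - _log_beta(alpha, beta)


# A molecule active in 3 of the 12 campaigns in which it was screened.
print(hit_rate(3, 12))
# With alpha = beta = 1 every count from 0 to n is equally likely.
print(math.exp(beta_binomial_logpmf(2, 4, 1.0, 1.0)))
```

    Working with the log-probability keeps the computation stable when a molecule has been screened in many campaigns.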

  13. Ability of bottle cap color to facilitate accurate glaucoma patient-physician communication regarding medication identity

    PubMed Central

    Dave, Pujan; Villarreal, Guadalupe; Friedman, David S.; Kahook, Malik Y.; Ramulu, Pradeep Y.

    2015-01-01

    Objective To determine the accuracy of patient-physician communication regarding topical ophthalmic medication use based on bottle cap color, particularly amongst individuals who may have acquired color vision deficiency from glaucoma. Design Cross-sectional, clinical study. Participants Patients ≥ 18 years old with primary open-angle, primary angle-closure, pseudoexfoliation, or pigment dispersion glaucoma, bilateral visual acuity of 20/400 or better, and no concurrent conditions that may affect color vision. Methods One hundred patients provided color descriptions of 11 distinct medication bottle caps. Patient-produced color descriptors were then presented to three physicians. Each physician matched each color descriptor to the medication they thought the descriptor was describing. Main Outcome Measures Frequency of patient-physician agreement, occurring when all three physicians accurately matched the patient-produced color descriptor to the correct medication. Multivariate regression models evaluated whether patient-physician agreement decreased with degree of better-eye visual field (VF) damage, color descriptor heterogeneity, and/or color vision deficiency, as determined by Hardy-Rand-Rittler (HRR) score and the Lanthony D15 testing index (D15 CCI). Results Subjects had a mean age of 69 (±11) years, with mean VF mean deviation of −4.7 (±6.0) and −10.9 (±8.4) dB in the better- and worse-seeing eyes, respectively. Patients produced 102 unique color descriptors to describe the colors of the 11 tested bottle caps. Among individual patients, the mean number of medications demonstrating patient-physician agreement was 6.1/11 (55.5%). Agreement was less than 15% for 4 medications (prednisolone acetate [generic], betaxolol HCl [Betoptic], brinzolamide/brimonidine [Simbrinza], and latanoprost [Xalatan]). Lower HRR scores and higher D15 CCI (both indicating worse color vision) were associated with greater VF damage (p<0.001). 
Extent of color vision deficiency and color descriptor heterogeneity were the only significant predictors of patient-physician agreement in multivariate models (odds of agreement = 0.90 per 1 point decrement in HRR score, p<0.001; odds of agreement = 0.30 for medications exhibiting high heterogeneity [≥ 11 descriptors], p=0.007). Conclusions Physician understanding of patient medication usage based solely on bottle cap color is frequently incorrect, particularly in glaucoma patients who may have color vision deficiency. Errors based on communication using bottle cap color alone may be common and could lead to confusion and harm. PMID:26260280

  14. Multiple QSAR models, pharmacophore pattern and molecular docking analysis for anticancer activity of α, β-unsaturated carbonyl-based compounds, oxime and oxime ether analogues

    NASA Astrophysics Data System (ADS)

    Masand, Vijay H.; El-Sayed, Nahed N. E.; Bambole, Mukesh U.; Quazi, Syed A.

    2018-04-01

    Multiple discrete quantitative structure-activity relationship (QSAR) models were constructed for the anticancer activity of α, β-unsaturated carbonyl-based compounds, oxime and oxime ether analogues with a variety of substituents like -Br, -OH, -OMe, etc. at different positions. A big pool of descriptors was considered for QSAR model building. Genetic algorithm (GA), available in QSARINS-Chem, was executed to choose the optimum number and set of descriptors to create the multi-linear regression equations for a dataset of sixty-nine compounds. The newly developed five parametric models were subjected to exhaustive internal and external validation along with Y-scrambling using QSARINS-Chem, according to the OECD principles for QSAR model validation. The models were built using easily interpretable descriptors and accepted after confirming statistical robustness with high external predictive ability. The five parametric models were found to have R² = 0.80 to 0.86, R²ex = 0.75 to 0.84, and CCCex = 0.85 to 0.90. The models indicate that frequency of nitrogen and oxygen atoms separated by five bonds from each other and internal electronic environment of the molecule have correlation with the anticancer activity.

  15. On the Development and Use of Large Chemical Similarity Networks, Informatics Best Practices and Novel Chemical Descriptors towards Materials Quantitative Structure Property Relationships

    ERIC Educational Resources Information Center

    Krein, Michael

    2011-01-01

    After decades of development and use in a variety of application areas, Quantitative Structure Property Relationships (QSPRs) and related descriptor-based statistical learning methods have achieved a level of infamy due to their misuse. The field is rife with past examples of overtrained models, overoptimistic performance assessment, and outright…

  16. [Quantitative relationship between gas chromatographic retention time and structural parameters of alkylphenols].

    PubMed

    Ruan, Xiaofang; Zhang, Ruisheng; Yao, Xiaojun; Liu, Mancang; Fan, Botao

    2007-03-01

    Alkylphenols are a group of permanent pollutants in the environment and could adversely disturb the human endocrine system. It is therefore important to effectively separate and measure the alkylphenols. To guide the chromatographic analysis of these compounds in practice, the development of a quantitative relationship between the molecular structure and the retention time of alkylphenols becomes necessary. In this study, topological, constitutional, geometrical, electrostatic and quantum-chemical descriptors of 44 alkylphenols were calculated using the CODESSA software, and these descriptors were pre-selected using the heuristic method. As a result, a three-descriptor linear model (LM) was developed to describe the relationship between the molecular structure and the retention time of alkylphenols. Meanwhile, a non-linear regression model was also developed based on support vector machine (SVM) using the same three descriptors. The correlation coefficient (R²) for the LM and SVM was 0.98 and 0.92, and the corresponding root-mean-square error was 0.99 and 2.77, respectively. By comparing the stability and prediction ability of the two models, it was found that the linear model was a better method for describing the quantitative relationship between the retention time of alkylphenols and the molecular structure. The results obtained suggested that the linear model could be applied for the chromatographic analysis of alkylphenols with known molecular structural parameters.

  17. Metal Oxide Nanomaterial QNAR Models: Available Structural Descriptors and Understanding of Toxicity Mechanisms

    PubMed Central

    Ying, Jiali; Zhang, Ting; Tang, Meng

    2015-01-01

    Metal oxide nanomaterials are widely used in various areas; however, the divergent published toxicology data makes it difficult to determine whether there is a risk associated with exposure to metal oxide nanomaterials. The application of quantitative structure activity relationship (QSAR) modeling in metal oxide nanomaterials toxicity studies can reduce the need for time-consuming and resource-intensive nanotoxicity tests. The nanostructure and inorganic composition of metal oxide nanomaterials makes this approach different from classical QSAR study; this review lists and classifies some structural descriptors, such as size, cation charge, and band gap energy, in recent metal oxide nanomaterials quantitative nanostructure activity relationship (QNAR) studies and discusses the mechanism of metal oxide nanomaterials toxicity based on these descriptors and traditional nanotoxicity tests. PMID:28347085

  18. Improving virtual screening predictive accuracy of Human kallikrein 5 inhibitors using machine learning models.

    PubMed

    Fang, Xingang; Bagui, Sikha; Bagui, Subhash

    2017-08-01

    The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structural-activity relationship (QSAR) model. For the development of a systematic descriptor selection strategy, we need the understanding of the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Human action recognition based on point context tensor shape descriptor

    NASA Astrophysics Data System (ADS)

    Li, Jianjun; Mao, Xia; Chen, Lijiang; Wang, Lan

    2017-07-01

    Motion trajectory recognition is one of the most important means to determine the identity of a moving object. A compact and discriminative feature representation method can improve the trajectory recognition accuracy. This paper presents an efficient framework for action recognition using a three-dimensional skeleton kinematic joint model. First, we put forward a rotation-scale-translation-invariant shape descriptor based on point context (PC) and the normal vector of hypersurface to jointly characterize local motion and shape information. Meanwhile, an algorithm for extracting the key trajectory based on the confidence coefficient is proposed to reduce the randomness and computational complexity. Second, to decrease the eigenvalue decomposition time complexity, a tensor shape descriptor (TSD) based on PC that can globally capture the spatial layout and temporal order to preserve the spatial information of each frame is proposed. Then, a multilinear projection process is achieved by tensor dynamic time warping to map the TSD to a low-dimensional tensor subspace of the same size. Experimental results show that the proposed shape descriptor is effective and feasible, and the proposed approach obtains considerable performance improvement over the state-of-the-art approaches with respect to accuracy on a public action dataset.

  20. Heritabilities and genetic correlations in the same traits across different strata of herds created according to continuous genomic, genetic, and phenotypic descriptors.

    PubMed

    Yin, Tong; König, Sven

    2018-03-01

    The most common approach in dairy cattle to prove genotype by environment interactions is a multiple-trait model application, and considering the same traits in different environments as different traits. We enhanced such concepts by defining continuous phenotypic, genetic, and genomic herd descriptors, and applying random regression sire models. Traits of interest were test-day traits for milk yield, fat percentage, protein percentage, and somatic cell score, considering 267,393 records from 32,707 first-lactation Holstein cows. Cows were born in the years 2010 to 2013, and kept in 52 large-scale herds from 2 federal states of north-east Germany. The average number of genotyped cows per herd (45,613 single nucleotide polymorphism markers per cow) was 133.5 (range: 45 to 415 genotyped cows). Genomic herd descriptors were (1) the level of linkage disequilibrium (r²) within specific chromosome segments, and (2) the average allele frequency for single nucleotide polymorphisms in close distance to a functional mutation. Genetic herd descriptors were the (1) intra-herd inbreeding coefficient, and (2) the percentage of daughters from foreign sires. Phenotypic herd descriptors were (1) herd size, and (2) the herd mean for nonreturn rate. Most correlations among herd descriptors were close to 0, indicating independence of genomic, genetic, and phenotypic characteristics. Heritabilities for milk yield increased with increasing intra-herd linkage disequilibrium, inbreeding, and herd size. Genetic correlations in same traits between adjacent levels of herd descriptors were close to 1, but declined for descriptor levels in greater distance. Genetic correlation declines were more obvious for somatic cell score, compared with test-day traits with larger heritabilities (fat percentage and protein percentage). Also, for milk yield, alterations of herd descriptor levels had an obvious effect on heritabilities and genetic correlations. 
By trend, multiple trait model results (based on created discrete herd classes) confirmed the random regression estimates. Identified alterations of breeding values in dependency of herd descriptors suggest utilization of specific sires for specific herd structures, offering new possibilities to improve sire selection strategies. Regarding genomic selection designs and genetic gain transfer into commercial herds, cow herds for the utilization in cow training sets should reflect the genomic, genetic, and phenotypic pattern of the broad population. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  1. Secure access control and large scale robust representation for online multimedia event detection.

    PubMed

    Liu, Changyu; Lu, Bin; Li, Huiling

    2014-01-01

    We developed an online multimedia event detection (MED) system. However, integrating traditional event detection algorithms into the online environment raises two issues: secure access control and large-scale robust representation. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role-based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for the access control of dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from response pyramids of 1000 generic object detectors which were trained on standard annotated image datasets, such as the ImageNet dataset. A spatial bag of words tiling approach was then adopted to encode these feature vectors for bridging the gap between the objects and events. Furthermore, we performed experiments in the context of event classification on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms the state-of-the-art approaches.

  2. A Method for Automatic Surface Inspection Using a Model-Based 3D Descriptor.

    PubMed

    Madrigal, Carlos A; Branch, John W; Restrepo, Alejandro; Mery, Domingo

    2017-10-02

    Automatic visual inspection allows for the identification of surface defects in manufactured parts. Nevertheless, when defects are on a sub-millimeter scale, detection and recognition are a challenge. This is particularly true when the defect generates topological deformations that are not shown with strong contrast in the 2D image. In this paper, we present a method for recognizing surface defects in 3D point clouds. Firstly, we propose a novel 3D local descriptor called the Model Point Feature Histogram (MPFH) for defect detection. Our descriptor is inspired from earlier descriptors such as the Point Feature Histogram (PFH). To construct the MPFH descriptor, the models that best fit the local surface and their normal vectors are estimated. For each surface model, its contribution weight to the formation of the surface region is calculated and from the relative difference between models of the same region a histogram is generated representing the underlying surface changes. Secondly, through a classification stage, the points on the surface are labeled according to five types of primitives and the defect is detected. Thirdly, the connected components of primitives are projected to a plane, forming a 2D image. Finally, 2D geometrical features are extracted and by a support vector machine, the defects are recognized. The database used is composed of 3D simulated surfaces and 3D reconstructions of defects in welding, artificial teeth, indentations in materials, ceramics and 3D models of defects. The quantitative and qualitative results showed that the proposed method of description is robust to noise and the scale factor, and it is sufficiently discriminative for detecting some surface defects. The performance evaluation of the proposed method was performed for a classification task of the 3D point cloud in primitives, reporting an accuracy of 95%, which is higher than for other state-of-art descriptors. The rate of recognition of defects was close to 94%.

  3. A Method for Automatic Surface Inspection Using a Model-Based 3D Descriptor

    PubMed Central

    Branch, John W.

    2017-01-01

    Automatic visual inspection allows for the identification of surface defects in manufactured parts. Nevertheless, when defects are on a sub-millimeter scale, detection and recognition are a challenge. This is particularly true when the defect generates topological deformations that are not shown with strong contrast in the 2D image. In this paper, we present a method for recognizing surface defects in 3D point clouds. Firstly, we propose a novel 3D local descriptor called the Model Point Feature Histogram (MPFH) for defect detection. Our descriptor is inspired from earlier descriptors such as the Point Feature Histogram (PFH). To construct the MPFH descriptor, the models that best fit the local surface and their normal vectors are estimated. For each surface model, its contribution weight to the formation of the surface region is calculated and from the relative difference between models of the same region a histogram is generated representing the underlying surface changes. Secondly, through a classification stage, the points on the surface are labeled according to five types of primitives and the defect is detected. Thirdly, the connected components of primitives are projected to a plane, forming a 2D image. Finally, 2D geometrical features are extracted and by a support vector machine, the defects are recognized. The database used is composed of 3D simulated surfaces and 3D reconstructions of defects in welding, artificial teeth, indentations in materials, ceramics and 3D models of defects. The quantitative and qualitative results showed that the proposed method of description is robust to noise and the scale factor, and it is sufficiently discriminative for detecting some surface defects. The performance evaluation of the proposed method was performed for a classification task of the 3D point cloud in primitives, reporting an accuracy of 95%, which is higher than for other state-of-art descriptors. The rate of recognition of defects was close to 94%. 
PMID:28974037

  4. Stochastic Analysis and Design of Heterogeneous Microstructural Materials System

    NASA Astrophysics Data System (ADS)

    Xu, Hongyi

    Advanced materials system refers to new materials that are comprised of multiple traditional constituents but complex microstructure morphologies, which lead to superior properties over the conventional materials. To accelerate the development of new advanced materials system, the objective of this dissertation is to develop a computational design framework and the associated techniques for design automation of microstructure materials systems, with an emphasis on addressing the uncertainties associated with the heterogeneity of microstructural materials. Five key research tasks are identified: design representation, design evaluation, design synthesis, material informatics and uncertainty quantification. Design representation of microstructure includes statistical characterization and stochastic reconstruction. This dissertation develops a new descriptor-based methodology, which characterizes 2D microstructures using descriptors of composition, dispersion and geometry. Statistics of 3D descriptors are predicted based on 2D information to enable 2D-to-3D reconstruction. An efficient sequential reconstruction algorithm is developed to reconstruct statistically equivalent random 3D digital microstructures. In design evaluation, a stochastic decomposition and reassembly strategy is developed to deal with the high computational costs and uncertainties induced by material heterogeneity. The properties of Representative Volume Elements (RVE) are predicted by stochastically reassembling SVE elements with stochastic properties into a coarse representation of the RVE. In design synthesis, a new descriptor-based design framework is developed, which integrates computational methods of microstructure characterization and reconstruction, sensitivity analysis, Design of Experiments (DOE), metamodeling and optimization to enable parametric optimization of the microstructure for achieving the desired material properties. 
Material informatics is studied to efficiently reduce the dimension of microstructure design space. This dissertation develops a machine learning-based methodology to identify the key microstructure descriptors that highly impact properties of interest. In uncertainty quantification, a comparative study on data-driven random process models is conducted to provide guidance for choosing the most accurate model in statistical uncertainty quantification. Two new goodness-of-fit metrics are developed to provide quantitative measurements of random process models' accuracy. The benefits of the proposed methods are demonstrated by the example of designing the microstructure of polymer nanocomposites. This dissertation provides material-generic, intelligent modeling/design methodologies and techniques to accelerate the process of analyzing and designing new microstructural materials system.

  5. Inductive generalization with familiar categories: developmental changes in children's reliance on perceptual similarity and kind information

    PubMed Central

    Godwin, Karrie E.; Fisher, Anna V.

    2015-01-01

    Inductive generalization is ubiquitous in human cognition; however, the factors underpinning this ability early in development remain contested. The present study was designed to (1) test the predictions of the naïve theory and a similarity-based account and (2) examine the mechanism by which labels promote induction. In Experiment 1, 3- to 5-year-old children made inferences about highly familiar categories. The results were not fully consistent with either theoretical account. In contrast to the predictions of the naïve theory approach, the youngest children in the study did not ignore perceptually compelling lures in favor of category-match items; in contrast to the predictions of the similarity-based account, no group of participants favored perceptually compelling lures in the presence of dissimilar-looking category-match items. In Experiment 2 we investigated the mechanisms by which labels promote induction by examining the influence of different label types, namely category labels (e.g., the target and category-match both labeled as bird) and descriptor labels (e.g., the target and the perceptual lure both labeled as brown) on induction performance. In contrast to the predictions of the naïve theory approach, descriptor labels but not category labels affected induction in 3-year-old children. Consistent with the predictions of the similarity-based account, descriptor labels affected the performance of children in all age groups included in the study. The implications of these findings for the developmental account of induction are discussed. PMID:26217254

  6. Inductive generalization with familiar categories: developmental changes in children's reliance on perceptual similarity and kind information.

    PubMed

    Godwin, Karrie E; Fisher, Anna V

    2015-01-01

    Inductive generalization is ubiquitous in human cognition; however, the factors underpinning this ability early in development remain contested. The present study was designed to (1) test the predictions of the naïve theory and a similarity-based account and (2) examine the mechanism by which labels promote induction. In Experiment 1, 3- to 5-year-old children made inferences about highly familiar categories. The results were not fully consistent with either theoretical account. In contrast to the predictions of the naïve theory approach, the youngest children in the study did not ignore perceptually compelling lures in favor of category-match items; in contrast to the predictions of the similarity-based account, no group of participants favored perceptually compelling lures in the presence of dissimilar-looking category-match items. In Experiment 2 we investigated the mechanisms by which labels promote induction by examining the influence of different label types, namely category labels (e.g., the target and category-match both labeled as bird) and descriptor labels (e.g., the target and the perceptual lure both labeled as brown) on induction performance. In contrast to the predictions of the naïve theory approach, descriptor labels but not category labels affected induction in 3-year-old children. Consistent with the predictions of the similarity-based account, descriptor labels affected the performance of children in all age groups included in the study. The implications of these findings for the developmental account of induction are discussed.

  7. Virtual lock-and-key approach: the in silico revival of Fischer model by means of molecular descriptors.

    PubMed

    Lauria, Antonino; Tutone, Marco; Almerico, Anna Maria

    2011-09-01

    In recent years the application of computational methodologies in medicinal chemistry has developed remarkably. Efforts have focused on the search for new leads with high affinity for a specific biological target, and different molecular modeling approaches have been employed to simulate molecular behavior toward that target. Despite the increasing reliability of computational methodologies, the designed leads, once synthesized and screened, are not always suitable for the chosen biological target. To give these compounds another chance, this work revives the old concept of Fischer's lock-and-key model. The same can be done for the "re-purposing" of old drugs. Indeed, drugs are known to have many physiological targets, so it may be useful to identify them. This aspect, called "polypharmacology", is known to be therapeutically essential in different treatments. The proposed protocol, the virtual lock-and-key approach (VLKA), consists in the "virtualization" of biological targets through their respective known inhibitors. To release a real lock, the key must fit the pins of the lock. The molecular descriptors can be considered as pins: a tested compound can be considered a potential inhibitor of a biological target if the values of its molecular descriptors fall within the range of values calculated for the set of known inhibitors. The protocol thus transforms a biological target into a "lock model" starting from its known inhibitors. To release a real lock all pins must fit. In the proposed protocol, it was assumed that the higher the number of fitting pins, the higher the affinity for the considered biological target. Therefore, each biological target was converted into a sequence of "weighted" molecular descriptor range values (locks) by using the structural features of the known inhibitors. Each biological target lock was tested by performing a molecular descriptor "fitting" on known inhibitors not used in the model construction (keys or test set). The results showed a good predictive capability of the protocol (confidence level 80%). This method gives interesting and convenient results because the user can choose the descriptors and biological targets used in the process of discovering new inhibitors. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
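
    The range-based "fitting" described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the range values or weights are calculated, so the sketch takes the plain minimum and maximum over the known inhibitors and uses equal weights by default. The function names and descriptor names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Descriptors = Dict[str, float]
Lock = Dict[str, Tuple[float, float]]


def build_lock(known_inhibitors: List[Descriptors]) -> Lock:
    """Build a 'lock': one (min, max) range per descriptor.

    Assumption: the range is the plain min/max over the known inhibitors;
    the abstract only speaks of "calculated range values".
    """
    names = known_inhibitors[0].keys()
    return {
        name: (
            min(mol[name] for mol in known_inhibitors),
            max(mol[name] for mol in known_inhibitors),
        )
        for name in names
    }


def fit_score(lock: Lock, key: Descriptors,
              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted fraction of pins (descriptors) that the key fits.

    A pin fits when the key's descriptor value lies inside the lock range.
    Assumption: equal weights unless the caller supplies others.
    """
    if weights is None:
        weights = {name: 1.0 for name in lock}
    total = sum(weights[name] for name in lock)
    fitted = sum(
        weights[name]
        for name, (low, high) in lock.items()
        if low <= key[name] <= high
    )
    return fitted / total
```

    Under these assumptions a compound that fits every pin scores 1.0, and the score falls as more descriptor values land outside the inhibitor ranges.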

  8. Nanoparticle surface characterization and clustering through concentration-dependent surface adsorption modeling.

    PubMed

    Chen, Ran; Zhang, Yuntao; Sahneh, Faryad Darabi; Scoglio, Caterina M; Wohlleben, Wendel; Haase, Andrea; Monteiro-Riviere, Nancy A; Riviere, Jim E

    2014-09-23

    Quantitative characterization of nanoparticle interactions with their surrounding environment is vital for safe nanotechnological development and standardization. A recent quantitative measure, the biological surface adsorption index (BSAI), has demonstrated promising applications in nanomaterial surface characterization and biological/environmental prediction. This paper further advances the approach beyond the application of five descriptors in the original BSAI to address the concentration dependence of the descriptors, enabling better prediction of the adsorption profile and more accurate categorization of nanomaterials based on their surface properties. Statistical analysis on the obtained adsorption data was performed based on three different models: the original BSAI, a concentration-dependent polynomial model, and an infinite dilution model. These advancements in BSAI modeling showed a promising development in the application of quantitative predictive modeling in biological applications, nanomedicine, and environmental safety assessment of nanomaterials.

  9. A systematic approach to prioritize drug targets using machine learning, a molecular descriptor-based classification model, and high-throughput screening of plant derived molecules: a case study in oral cancer.

    PubMed

    Randhawa, Vinay; Kumar Singh, Anil; Acharya, Vishal

    2015-12-01

    Systems-biology inspired identification of drug targets and machine learning-based screening of small molecules which modulate their activity have the potential to revolutionize modern drug discovery by complementing conventional methods. To utilize the effectiveness of such pipelines, we first analyzed the dysregulated gene pairs between control and tumor samples and then implemented an ensemble-based feature selection approach to prioritize targets in oral squamous cell carcinoma (OSCC) for therapeutic exploration. Based on the structural information of known inhibitors of CXCR4 (one of the best targets identified in this study), feature selection was implemented to identify the optimal structural features (molecular descriptors) on which a classification model was built. Furthermore, the CXCR4-centered descriptor-based classification model was utilized to screen a repository of plant-derived small molecules to obtain potential inhibitors. The application of our methodology may assist effective selection of the best targets, which may have previously been overlooked, and in turn lead to the development of new oral cancer medications. The small molecules identified in this study can be ideal candidates for trials as potential novel anti-oral cancer agents. Importantly, distinct steps of this whole study may provide reference for the analysis of other complex human diseases.

  10. The importance of molecular structures, endpoints' values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders.

    PubMed

    Li, Jiazhong; Gramatica, Paola

    2010-11-01

    Quantitative structure-activity relationship (QSAR) methodology aims to explore the relationship between molecular structures and experimental endpoints, producing a model for the prediction of new data; the predictive performance of the model must be checked by external validation. Clearly, the quality of the chemical structure information and of the experimental endpoints, as well as the statistical parameters used to verify the external predictivity, have a strong influence on QSAR model reliability. Here, we emphasize the importance of these three aspects by analyzing our models on estrogen receptor binders (Endocrine disruptor knowledge base (EDKB) database). Endocrine disrupting chemicals, which mimic or antagonize endogenous hormones such as estrogens, are a hot topic in environmental and toxicological sciences. QSAR shows great value in predicting the estrogenic activity and exploring the interactions between the estrogen receptor and ligands. We have verified our previously published model by additional external validation on new EDKB chemicals. Having found some errors in the 3D molecular conformations used, we developed a new model using the same data set with corrected structures, the same method (ordinary least-squares regression, OLS) and DRAGON descriptors. The new model, based on some different descriptors, is more predictive on external prediction sets. Three different formulas to calculate the correlation coefficient for the external prediction set (Q²EXT) were compared, and the results indicated that the new proposal of Consonni et al. gave more reasonable results, consistent with the conclusions from the regression line, Williams plot and root mean square error (RMSE) values. Finally, the importance of reliable endpoint values has been highlighted by comparing the classification assignments of EDKB with those of another estrogen receptor binders database (METI): we found that 16.1% of the assignments for the common compounds were opposite (20 among 124 common compounds). In order to verify the real assignments for these inconsistent compounds, we predicted these samples, as a blind external set, by our regression models and compared the results with the two databases. The results indicated that most of the predictions were consistent with METI. Furthermore, we built a kNN classification model using the 104 consistent compounds to predict the inconsistent ones, and most of the predictions were also in agreement with the METI database.
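
    The abstract compares three formulas for the external predictivity parameter without stating them. As a hedged sketch, the three variants are written out below as they are commonly defined in the QSAR literature (often labelled F1, F2 and F3, the last being the proposal of Consonni et al.). Mapping these onto the "three different formulas" of the abstract is an assumption, and the function names are illustrative only.

```python
from statistics import fmean
from typing import Sequence


def _press(y_ext: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared prediction errors on the external set."""
    return sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))


def q2_f1(y_train: Sequence[float], y_ext: Sequence[float],
          y_pred: Sequence[float]) -> float:
    """F1: external deviations are taken from the training-set mean."""
    mean_tr = fmean(y_train)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_tr) ** 2 for y in y_ext)


def q2_f2(y_ext: Sequence[float], y_pred: Sequence[float]) -> float:
    """F2: external deviations are taken from the external-set mean."""
    mean_ext = fmean(y_ext)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_ext) ** 2 for y in y_ext)


def q2_f3(y_train: Sequence[float], y_ext: Sequence[float],
          y_pred: Sequence[float]) -> float:
    """F3: mean squared external error over the training-set variance,
    each normalised by its own number of compounds."""
    mean_tr = fmean(y_train)
    tss_train = sum((y - mean_tr) ** 2 for y in y_train)
    mse_ext = _press(y_ext, y_pred) / len(y_ext)
    return 1.0 - mse_ext / (tss_train / len(y_train))
```

    Because F3 scales the external error by the training-set variance, its value does not depend on how the external compounds happen to be distributed, which is one commonly cited reason for preferring it. The database comparison in the same abstract is simple arithmetic: 20 of 124 common compounds is 16.1%.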

  11. Model of twelve properties of a set of organic solvents with graph-theoretical and/or experimental parameters.

    PubMed

    Pogliani, Lionello

    2010-01-30

    Twelve properties of a highly heterogeneous class of organic solvents have been modeled with a modified graph-theoretical molecular connectivity (MC) method, which allows the core electrons and the hydrogen atoms to be encoded. The graph-theoretical method uses the concepts of simple, general, and complete graphs, where the last of these are used to encode the core electrons. The hydrogen atoms have been encoded with the aid of a graph-theoretical perturbation parameter, which contributes to the definition of the valence delta, delta(v), a key parameter in molecular connectivity studies. The model of the twelve properties, obtained with a stepwise search algorithm, is always satisfactory, and it allows the influence of the hydrogen content of the solvent molecules on the choice of the type of descriptor to be checked. A similar argument holds for the influence of the halogen atoms on the type of core electron representation. In some cases the molar mass and, to a lesser extent, special "ad hoc" parameters have been used to improve the model. A very good model of the surface tension could be obtained with the aid of five experimental parameters. A mixed model method based on experimental parameters plus molecular connectivity indices, instead, consistently improved the model quality of five properties. The importance of the boiling point temperatures as descriptors in these last two model methodologies should be underlined. Copyright 2009 Wiley Periodicals, Inc.

  12. Impact of metal ionic characteristics on adsorption potential of Ficus carica leaves using QSPR modeling.

    PubMed

    Batool, Fozia; Iqbal, Shahid; Akbar, Jamshed

    2018-04-03

    The present study describes Quantitative Structure Property Relationship (QSPR) modeling to relate the characteristics of metal ions to the adsorption potential of Ficus carica leaves for 13 selected metal ions (Ca²⁺, Cr³⁺, Co²⁺, Cu²⁺, Cd²⁺, K⁺, Mg²⁺, Mn²⁺, Na⁺, Ni²⁺, Pb²⁺, Zn²⁺, and Fe²⁺). A set of 21 characteristic descriptors was selected, and the relationship of these metal characteristics with the adsorptive behavior of the metal ions was investigated. Stepwise Multiple Linear Regression (SMLR) analysis and Artificial Neural Network (ANN) were applied for descriptor selection and model generation. Langmuir and Freundlich isotherms were also applied to the adsorption data to generate proper correlation for experimental findings. The generated model indicated the covalent index as the most significant descriptor, which is responsible for more than 90% predictive adsorption (α = 0.05). Internal validation of the model was performed by measuring [Formula: see text] (0.98). The results indicate that the present model is a useful tool for prediction of the adsorptive behavior of different metal ions based on their ionic characteristics.

  13. Calculation of the octanol-water partition coefficient of armchair polyhex BN nanotubes

    NASA Astrophysics Data System (ADS)

    Mohammadinasab, E.; Pérez-Sánchez, H.; Goodarzi, M.

    2017-12-01

    A predictive model for determining the partition coefficient (log P) of armchair polyhex BN nanotubes using simple descriptors was built. The relationship between the octanol-water log P and the quantum chemical descriptors, electric moments, and topological indices of some armchair polyhex BN nanotubes with various lengths and fixed circumference is presented. The electric moments and physico-chemical properties of these nanotubes are calculated based on density functional theory.

  14. Computer-assisted shape descriptors for skull morphology in craniosynostosis.

    PubMed

    Shim, Kyu Won; Lee, Min Jin; Lee, Myung Chul; Park, Eun Kyung; Kim, Dong Seok; Hong, Helen; Kim, Yong Oock

    2016-03-01

    Our aim was to develop a novel method for characterizing common skull deformities with high sensitivity and specificity, based on two-dimensional (2D) shape descriptors in computed tomography (CT) images. Between 2003 and 2014, 44 normal subjects and 39 infants with craniosynostosis (sagittal, 29; bicoronal, 10) enrolled for analysis. Mean age overall was 16 months (range, 1-120 months), with a male:female ratio of 56:29. Two reference planes, sagittal (S-plane: through top of lateral ventricle) and coronal (C-plane: at maximum dimension of fourth ventricle), were utilized to formulate three 2D shape descriptors (cranial index [CI], cranial radius index [CR], and cranial extreme spot index [CES]), which were then applied to S- and C-plane target images of both groups. In infants with sagittal craniosynostosis, CI in S-plane (S-CI) usually was <1.0 (mean, 0.78; range, 0.67-0.95), with CR consistently at 3 and a characteristic CES pattern of two discrete hot spots oriented diagonally. In the bicoronal craniosynostosis subset, CI was >1.0 (mean 1.11; range, 1.04-1.25), with CR at -3 and a CES pattern of four discrete diagonally oriented hot spots. Scatter plots underscored the highly intuitive joint performance of CI and CES in distinguishing normal and deformed states. Altogether, these novel 2D shape descriptors enabled effective discrimination of sagittal and bicoronal skull deformities. Newly developed 2D shape descriptors for cranial CT imaging enabled recognition of common skull deformities with statistical significance, perhaps providing impetus for automated CT-based diagnosis of craniosynostosis.

  15. Solubility of organic compounds in octanol: Improved predictions based on the geometrical fragment approach.

    PubMed

    Mathieu, Didier

    2017-09-01

    Two new models are introduced to predict the solubility of chemicals in octanol (S_oct), taking advantage of the extensive character of log(S_oct) through a decomposition of molecules into so-called geometrical fragments (GF). They are extensively validated and their compliance with regulatory requirements is demonstrated. The first model requires just a molecular formula as input. Despite an extreme simplicity, it performs as well as an advanced random forest model involving 86 descriptors, with a root mean square error (RMSE) of 0.64 log units for an external test set of 100 molecules. For the second one, which requires the melting point T_m as input, introducing GF descriptors reduces the RMSE from about 0.7 to <0.5 log units, a performance that could previously be obtained only through the use of Abraham descriptors. A script is provided for easy application of the models, taking into account the limits of their applicability domains. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Random forest models to predict aqueous solubility.

    PubMed

    Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O

    2007-01-01

    Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.

  17. The contribution of the hydrogen bond acidity on the lipophilicity of drugs estimated from chromatographic measurements.

    PubMed

    Pallicer, Juan M; Pascual, Rosalia; Port, Adriana; Rosés, Martí; Ràfols, Clara; Bosch, Elisabeth

    2013-02-14

    This work studied the influence of the hydrogen bond acidity when the 1-octanol/water partition coefficient (log P(o/w)) of drugs is determined from chromatographic measurements. This influence was first evaluated by comparing the Abraham solvation parameter model as applied to 1-octanol/water partitioning with the same model as applied to chromatographic retention, expressed as the solute polarity p. Then, several hydrogen bond acidity descriptors were compared in order to determine properly the log P(o/w) of drugs. These descriptors were obtained from different software and comprise two-dimensional parameters such as the calculated Abraham hydrogen bond acidity A and three-dimensional descriptors like HDCA-2 from the CODESSA program or the WO1 and DRDODO descriptors calculated with the Volsurf+ software. The additional HOMO-LUMO polarizability descriptor should be added when the three-dimensional descriptors are used to complement the chromatographic retention. The models generated using these descriptors were compared by studying the correlations between the determined log P(o/w) values and the reference ones. The comparison showed that there was no significant difference between the tested models and that each of them was able to determine the log P(o/w) of drugs from a single chromatographic measurement and the corresponding molecular descriptor terms. However, the model that involved the calculated A descriptor was simpler and is thus recommended for practical use. Copyright © 2012 Elsevier B.V. All rights reserved.

  18. Structure-activity relationships between sterols and their thermal stability in oil matrix.

    PubMed

    Hu, Yinzhou; Xu, Junli; Huang, Weisu; Zhao, Yajing; Li, Maiquan; Wang, Mengmeng; Zheng, Lufei; Lu, Baiyi

    2018-08-30

    Structure-activity relationships between 20 sterols and their thermal stabilities were studied in a model oil system. All sterol degradations were found to be consistent with a first-order kinetic model, with coefficients of determination (R²) higher than 0.9444. The number of double bonds in the sterol structure was negatively correlated with the thermal stability of the sterol, whereas the length of the branch chain was positively correlated with it. A quantitative structure-activity relationship (QSAR) model to predict the thermal stability of sterols was developed by using partial least squares regression (PLSR) combined with a genetic algorithm (GA). A regression model was built with an R² of 0.806. Almost all sterol degradation constants can be predicted accurately, with a cross-validation R² of 0.680. Four important variables were selected in the optimal QSAR model, and the selected variables were observed to be related to information indices, RDF descriptors, and 3D-MoRSE descriptors. Copyright © 2018 Elsevier Ltd. All rights reserved.
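
    As an illustration of the first-order kinetic model mentioned in this abstract, the sketch below estimates a degradation constant from concentration-time data. It assumes the standard first-order form C(t) = C0·exp(−k·t) and fits a least-squares line to ln C against t. The abstract does not describe the authors' fitting procedure, so the procedure, the function name and any data passed to it are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_first_order(times: Sequence[float],
                    concentrations: Sequence[float]) -> Tuple[float, float, float]:
    """Fit ln C = ln C0 - k*t by ordinary least squares.

    Returns (k, c0, r2), where r2 is the coefficient of determination
    of the linearised fit.
    """
    log_c = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_c) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_c))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, log_c))
    ss_tot = sum((y - y_mean) ** 2 for y in log_c)
    r2 = 1.0 - ss_res / ss_tot
    return -slope, math.exp(intercept), r2
```

    An R² close to 1 for the linearised data is the usual sign that degradation follows first-order kinetics. The fitted k values would then serve as the endpoint for a QSAR model of the kind the abstract describes.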

  19. Modeling the tendency for music to induce movement in humans: first correlations with low-level audio descriptors across music genres.

    PubMed

    Madison, Guy; Gouyon, Fabien; Ullén, Fredrik; Hörnström, Kalle

    2011-10-01

    Groove is often described as the experience of music that makes people tap their feet and want to dance. A high degree of consistency in ratings of groove across listeners indicates that physical properties of the sound signal contribute to groove (Madison, 2006). Here, correlations were assessed between listeners' ratings and a number of quantitative descriptors of rhythmic properties for one hundred music examples from five distinct traditional music genres. Groove was related to several different rhythmic properties, some of which were genre-specific and some of which were general across genres. Two descriptors corresponding to the density of events between beats and the salience of the beat, respectively, were strongly correlated with groove across domains. In contrast, systematic deviations from strict positions on the metrical grid, so-called microtiming, did not play any significant role. The results are discussed from a functional perspective of rhythmic music to enable and facilitate entrainment and precise synchronization among individuals.

  20. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
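
    The classification models in this abstract are summarised by accuracy and the Matthews correlation coefficient. The sketch below shows how both follow from the four cells of a binary confusion matrix. It is not the ToxiM implementation, and the function names are chosen here for illustration.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, the usual convention
    for the undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

    For a hypothetical balanced set with 45 true positives, 45 true negatives, 5 false positives and 5 false negatives, accuracy is 0.90 and the coefficient is 0.80. Unlike accuracy, the coefficient drops toward zero when a classifier does well on only one class, which makes it the more informative figure for imbalanced toxin and non-toxin sets.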

  1. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    PubMed Central

    Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969

  2. Genome-wide association analysis to identify genotype × environment interaction for milk protein yield and level of somatic cell score as environmental descriptors in German Holsteins.

    PubMed

    Streit, M; Reinhardt, F; Thaller, G; Bennewitz, J

    2013-01-01

    Genotype by environment interaction (G × E) has been widely reported in dairy cattle. If the environment can be measured on a continuous scale, reaction norms can be applied to study G × E. The average herd milk production level has frequently been used as an environmental descriptor because it is influenced by the level of feeding or the feeding regimen. Another important environmental factor is the level of udder health and hygiene, for which the average herd somatic cell count might be a descriptor. In the present study, we conducted a genome-wide association analysis to identify single nucleotide polymorphisms (SNP) that affect intercept and slope of milk protein yield reaction norms when using the average herd test-day solution for somatic cell score as an environmental descriptor. Sire estimates for intercept and slope of the reaction norms were calculated from around 12 million daughter records, using linear reaction norm models. Sires were genotyped for ~54,000 SNP. The sire estimates were used as observations in the association analysis, using 1,797 sires. Significant SNP were confirmed in an independent validation set consisting of 500 sires. A known major gene affecting protein yield was included as a covariable in the statistical model. Sixty (21) SNP were confirmed for intercept with P ≤ 0.01 (P ≤ 0.001) in the validation set, and 28 and 11 SNP, respectively, were confirmed for slope. Most but not all SNP affecting slope also affected intercept. Comparison with an earlier study revealed that SNP affecting slope were, in general, also significant for slope when the environment was modeled by the average herd milk production level, although the two environmental descriptors were poorly correlated. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  3. Descriptor selection for banana accessions based on univariate and multivariate analysis.

    PubMed

    Brandão, L P; Souza, C P F; Pereira, V M; Silva, S O; Santos-Serejo, J A; Ledo, C A S; Amorim, E P

    2013-05-14

    Our objective was to establish a minimum number of morphological descriptors for the characterization of banana germplasm and evaluate the efficiency of removal of redundant characters, based on univariate and multivariate statistical analyses. Phenotypic characterization was made of 77 accessions from Bahia, Brazil, using 92 descriptors. The selection of the descriptors was carried out by principal components analysis (quantitative) and by entropy (multi-category). Efficiency of elimination was analyzed by a comparative study between the clusters formed, taking into consideration all 92 descriptors and smaller groups. The selected descriptors were analyzed with the Ward-MLM procedure and a combined matrix formed by the Gower algorithm. We were able to reduce the number of descriptors used for characterizing the banana germplasm (42%). The correlation between the matrices considering the 92 descriptors and the selected ones was 0.82, showing that the reduction in the number of descriptors did not influence estimation of genetic variability between the banana accessions. We conclude that removing these descriptors caused no loss of information, considering the groups formed from pre-established criteria, including subgroup/subspecies.
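
    The abstract says that multi-category descriptors were selected by entropy but gives no formula. A rough sketch follows, assuming the Shannon entropy of the observed category frequencies is meant. Descriptors whose states barely vary across accessions get low entropy and are the natural candidates for removal. The function names and descriptor labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def shannon_entropy(states: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of the category frequencies.

    Assumption: this is the entropy measure intended by the abstract.
    """
    n = len(states)
    counts = Counter(states)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def rank_by_entropy(table: Dict[str, Sequence[Hashable]]) -> List[Tuple[str, float]]:
    """Rank descriptors from most to least informative (highest entropy first)."""
    scored = [(name, shannon_entropy(states)) for name, states in table.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

    A descriptor recorded with the same state in every accession has entropy 0 and does not help separate accessions, whereas one split evenly between two states has entropy ln 2.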

  4. A Global Covariance Descriptor for Nuclear Atypia Scoring in Breast Histopathology Images.

    PubMed

    Khan, Adnan Mujahid; Sirinukunwattana, Korsuk; Rajpoot, Nasir

    2015-09-01

    Nuclear atypia scoring is a diagnostic measure commonly used to assess tumor grade of various cancers, including breast cancer. It provides a quantitative measure of deviation in visual appearance of cell nuclei from those in normal epithelial cells. In this paper, we present a novel image-level descriptor for nuclear atypia scoring in breast cancer histopathology images. The method is based on the region covariance descriptor that has recently become a popular method in various computer vision applications. The descriptor in its original form is not suitable for classification of histopathology images as cancerous histopathology images tend to possess diversely heterogeneous regions in a single field of view. Our proposed image-level descriptor, which we term as the geodesic mean of region covariance descriptors, possesses all the attractive properties of covariance descriptors lending itself to tractable geodesic-distance-based k-nearest neighbor classification using efficient kernels. The experimental results suggest that the proposed image descriptor yields high classification accuracy compared to a variety of widely used image-level descriptors.

  5. Alignment-independent comparison of binding sites based on DrugScore potential fields encoded by 3D Zernike descriptors.

    PubMed

    Nisius, Britta; Gohlke, Holger

    2012-09-24

    Analyzing protein binding sites provides detailed insights into the biological processes proteins are involved in, e.g., into drug-target interactions, and so is of crucial importance in drug discovery. Herein, we present novel alignment-independent binding site descriptors based on DrugScore potential fields. The potential fields are transformed to a set of information-rich descriptors using a series expansion in 3D Zernike polynomials. The resulting Zernike descriptors show a promising performance in detecting similarities among proteins with low pairwise sequence identities that bind identical ligands, as well as within subfamilies of one target class. Furthermore, the Zernike descriptors are robust against structural variations among protein binding sites. Finally, the Zernike descriptors show a high data compression power, and computing similarities between binding sites based on these descriptors is highly efficient. Consequently, the Zernike descriptors are a useful tool for computational binding site analysis, e.g., to predict the function of novel proteins, off-targets for drug candidates, or novel targets for known drugs.

  6. Ability of Bottle Cap Color to Facilitate Accurate Patient-Physician Communication Regarding Medication Identity in Patients with Glaucoma.

    PubMed

    Dave, Pujan; Villarreal, Guadalupe; Friedman, David S; Kahook, Malik Y; Ramulu, Pradeep Y

    2015-12-01

    To determine the accuracy of patient-physician communication regarding topical ophthalmic medication use based on bottle cap color, particularly among individuals who may have acquired color vision deficiency from glaucoma. Cross-sectional, clinical study. Patients aged ≥18 years with primary open-angle, primary angle-closure, pseudoexfoliation, or pigment dispersion glaucoma, bilateral visual acuity of ≥20/400, and no concurrent conditions that may affect color vision. A total of 100 patients provided color descriptions of 11 distinct medication bottle caps. Color descriptors were then presented to 3 physicians. Physicians matched each color descriptor to the medication they thought the descriptor was describing. Frequency of patient-physician agreement, occurring when all 3 physicians accurately matched the color descriptor to the correct medication. Multivariate regression models evaluated whether patient-physician agreement decreased with degree of better-eye visual field (VF) damage, color descriptor heterogeneity, or color vision deficiency, as determined by the Hardy-Rand-Rittler (HRR) score and Lanthony D15 color confusion index (D15 CCI). Subjects had a mean age of 69 (±11) years, with VF mean deviation of -4.7 (±6.0) and -10.9 (±8.4) decibels (dB) in the better- and worse-seeing eyes, respectively. Patients produced 102 unique color descriptors to describe the colors of the 11 bottle caps. Among individual patients, the mean number of medications demonstrating agreement was 6.1/11 (55.5%). Agreement was less than 15% for 4 medications (prednisolone acetate [generic], betaxolol HCl [Betoptic; Alcon Laboratories Inc., Fort Worth, TX], brinzolamide/brimonidine [Simbrinza; Alcon Laboratories Inc.], and latanoprost [Xalatan; Pfizer, Inc., New York, NY]). Lower HRR scores and higher D15 CCI (both indicating worse color vision) were associated with greater VF damage (P < 0.001). Extent of color vision deficiency and color descriptor heterogeneity significantly predicted agreement in multivariate models (odds of agreement = 0.90 per 1 point decrement in HRR score, P < 0.001; odds of agreement = 0.30 for medications exhibiting high heterogeneity [≥11 descriptors], P = 0.007). Physician understanding of patient medication use based solely on bottle cap color is frequently incorrect, particularly in patients with glaucoma who may have color vision deficiency. Errors based on communication using bottle cap color alone may be common and could lead to confusion and harm. Copyright © 2015 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  7. Secure Access Control and Large Scale Robust Representation for Online Multimedia Event Detection

    PubMed Central

    Liu, Changyu; Li, Huiling

    2014-01-01

    We developed an online multimedia event detection (MED) system. However, there are a secure access control issue and a large scale robust representation issue when we want to integrate traditional event detection algorithms into the online environment. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for the access control of dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from response pyramids of 1000 generic object detectors which were trained on standard annotated image datasets, such as the ImageNet dataset. A spatial bag of words tiling approach was then adopted to encode these feature vectors for bridging the gap between the objects and events. Furthermore, we performed experiments in the context of event classification on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms the state-of-the-art approaches. PMID:25147840

  8. QSPR models for various physical properties of carbohydrates based on molecular mechanics and quantum chemical calculations.

    PubMed

    Dyekjaer, Jane Dannow; Jónsdóttir, Svava Osk

    2004-01-22

    Quantitative Structure-Property Relationships (QSPR) have been developed for a series of monosaccharides, including the physical properties of partial molar heat capacity, heat of solution, melting point, heat of fusion, glass-transition temperature, and solid state density. The models were based on molecular descriptors obtained from molecular mechanics and quantum chemical calculations, combined with other types of descriptors. Saccharides exhibit a large degree of conformational flexibility, therefore a methodology for selecting the energetically most favorable conformers has been developed, and was used for the development of the QSPR models. In most cases good correlations were obtained for monosaccharides. For five of the properties predictions were made for disaccharides, and the predicted values for the partial molar heat capacities were in excellent agreement with experimental values.

  9. Developing Enhanced Blood–Brain Barrier Permeability Models: Integrating External Bio-Assay Data in QSAR Modeling

    PubMed Central

    Wang, Wenyi; Kim, Marlene T.; Sedykh, Alexander

    2015-01-01

    Purpose: Experimental Blood–Brain Barrier (BBB) permeability models for drug molecules are expensive and time-consuming. As alternative methods, several traditional Quantitative Structure-Activity Relationship (QSAR) models have been developed previously. In this study, we aimed to improve the predictivity of traditional QSAR BBB permeability models by employing relevant public bio-assay data in the modeling process. Methods: We compiled a BBB permeability database consisting of 439 unique compounds from various resources. The database was split into a modeling set of 341 compounds and a validation set of 98 compounds. Consensus QSAR modeling workflow was employed on the modeling set to develop various QSAR models. A five-fold cross-validation approach was used to validate the developed models, and the resulting models were used to predict the external validation set compounds. Furthermore, we used previously published membrane transporter models to generate relevant transporter profiles for target compounds. The transporter profiles were used as additional biological descriptors to develop hybrid QSAR BBB models. Results: The consensus QSAR models have R² = 0.638 for five-fold cross-validation and R² = 0.504 for external validation. The consensus model developed by pooling chemical and transporter descriptors showed better predictivity (R² = 0.646 for five-fold cross-validation and R² = 0.526 for external validation). Moreover, several external bio-assays that correlate with BBB permeability were identified using our automatic profiling tool. Conclusions: The BBB permeability models developed in this study can be useful for early evaluation of new compounds (e.g., new drug candidates). The combination of chemical and biological descriptors shows a promising direction to improve the current traditional QSAR models. PMID:25862462
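
    As an illustration of how a five-fold cross-validated R² of the kind reported in this record is typically computed, the following minimal sketch may help. It assumes a single hypothetical descriptor and an ordinary least-squares line in place of the consensus QSAR workflow used in the study; the function names, the fold assignment, and the toy data are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=5):
    """R² computed from out-of-fold predictions (each point is predicted
    by a model that never saw it during fitting)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        # Hypothetical fold assignment: every k-th sample goes to one fold.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        for i in test_idx:
            preds[i] = intercept + slope * xs[i]
    mean_y = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data (not from the study): a perfectly linear relationship.
descriptor = [float(i) for i in range(10)]
response = [2.0 * x + 1.0 for x in descriptor]
r2_cv = cross_validated_r2(descriptor, response, k=5)
```

    In practice the fold assignment would be randomized and the model would use many descriptors; the sketch only shows where the cross-validated statistic comes from.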

  10. The QSAR study of flavonoid-metal complexes scavenging •OH free radical

    NASA Astrophysics Data System (ADS)

    Wang, Bo-chu; Qian, Jun-zhen; Fan, Ying; Tan, Jun

    2014-10-01

    Flavonoid-metal complexes have antioxidant activities. However, quantitative structure-activity relationships (QSAR) of flavonoid-metal complexes and their antioxidant activities have still not been tackled. On the basis of 21 structures of flavonoid-metal complexes and their antioxidant activities for scavenging the •OH free radical, we optimised their structures using the Gaussian 03 software package and subsequently calculated and chose 18 quantum chemistry descriptors such as dipole, charge and energy. Then, through stepwise linear regression, we chose several quantum chemistry descriptors that are very important to the IC50 of flavonoid-metal complexes for scavenging the •OH free radical. Meanwhile, we obtained 4 new variables through principal component analysis. Finally, we built the QSAR models using an Artificial Neural Network (ANN), with those important quantum chemistry descriptors and the 4 new variables as the independent variables and the IC50 as the dependent variable, and we validated the two models using experimental data. These results show that the two models in this paper are reliable and predictive.

  11. A “loop” shape descriptor and its application to automated segmentation of airways from CT scans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pu, Jiantao; Jin, Chenwang, E-mail: jcw76@163.com; Yu, Nan

    2015-06-15

    Purpose: A novel shape descriptor is presented to aid an automated identification of the airways depicted on computed tomography (CT) images. Methods: Instead of simplifying the tubular characteristic of the airways as an ideal mathematical cylindrical or circular shape, the proposed “loop” shape descriptor exploits the fact that the cross sections of any tubular structure (regardless of its regularity) always appear as a loop. In implementation, the authors first reconstruct the anatomical structures in volumetric CT as a three-dimensional surface model using the classical marching cubes algorithm. Then, the loop descriptor is applied to locate the airways with a concave loop cross section. To deal with the variation of the airway walls in density as depicted on CT images, a multiple threshold strategy is proposed. A publicly available chest CT database consisting of 20 CT scans, which was designed specifically for evaluating an airway segmentation algorithm, was used for quantitative performance assessment. Measures, including length, branch count, and generations, were computed under the aid of a skeletonization operation. Results: For the test dataset, the airway length ranged from 64.6 to 429.8 cm, the generation ranged from 7 to 11, and the branch number ranged from 48 to 312. These results were comparable to the performance of the state-of-the-art algorithms validated on the same dataset. Conclusions: The authors’ quantitative experiment demonstrated the feasibility and reliability of the developed shape descriptor in identifying lung airways.

  12. Detecting reactive islands using Lagrangian descriptors and the relevance to transition path sampling.

    PubMed

    Patra, Sarbani; Keshavamurthy, Srihari

    2018-02-14

    It has been known for some time now that isomerization reactions, classically, are mediated by phase space structures called reactive islands (RI). RIs provide one possible route to correct for the nonstatistical effects in the reaction dynamics. In this work, we map out the reactive islands for the two-dimensional Müller-Brown model potential and show that the reactive islands are intimately linked to the issue of rare event sampling. In particular, we establish the sensitivity of the so-called committor probabilities, useful quantities in the transition path sampling technique, to the hierarchical RI structures. Mapping out the RI structure for high-dimensional systems, however, is a challenging task. Here, we show that the technique of Lagrangian descriptors is able to effectively identify the RI hierarchy in the model system. Based on our results, we suggest that the Lagrangian descriptors can be useful for detecting RIs in high-dimensional systems.

  13. Comparative study of solvation parameter models accounting the effects of mobile phase composition in reversed-phase liquid chromatography.

    PubMed

    Torres-Lapasió, J R; Ruiz-Angel, M J; García-Alvarez-Coque, M C

    2007-09-28

    Solvation parameter models relate linearly compound properties with five fundamental solute descriptors (excess molar refraction, dipolarity/polarizability, effective hydrogen-bond acidity and basicity, and McGowan volume). These models are widely used, due to the availability of protocols to obtain the descriptors, good performance, and general applicability. Several approaches to predict retention in reversed-phase liquid chromatography (RPLC) as a function of these descriptors and mobile phase composition are compared, assaying the performance with a set of 146 organic compounds of diverse nature, eluted with acetonitrile and methanol. The approaches are classified in two groups: those that only allow predictions of retention for the mobile phases used to build the models, and those valid at any other mobile phase composition. The first group includes the use of ratios between the regressed coefficients of the solvation models that are assumed to be characteristic for a column/solvent system, and the application of offsets to transfer the retention from a reference mobile phase to any other. Maximal accuracy in predictions corresponded, however, to the approaches in the second group, which were based on models that describe the retention as a function of mobile phase composition (expressed as the solvent volume fraction or a normalised polarity measurement), where the coefficients were made dependent on the solvent descriptors. The study revealed the properties that influence the retention and distinguish the particular behaviour of acetonitrile and methanol in RPLC.

  14. Odor-color associations differ with verbal descriptors for odors: A comparison of three linguistically diverse groups.

    PubMed

    de Valk, Josje M; Wnuk, Ewelina; Huisman, John L A; Majid, Asifa

    2017-08-01

    People appear to have systematic associations between odors and colors. Previous research has emphasized the perceptual nature of these associations, but little attention has been paid to what role language might play. It is possible odor-color associations arise through a process of labeling; that is, participants select a descriptor for an odor and then choose a color accordingly (e.g., banana odor → "banana" label → yellow). If correct, this would predict odor-color associations would differ as odor descriptions differ. We compared speakers of Dutch (who overwhelmingly describe odors by referring to the source; e.g., smells like banana) with speakers of Maniq and Thai (who also describe odors with dedicated, abstract smell vocabulary; e.g., musty), and tested whether the type of descriptor mattered for odor-color associations. Participants were asked to select a color that they associated with an odor on two separate occasions (to test for consistency), and finally to label the odors. We found the hunter-gatherer Maniq showed few, if any, consistent or accurate odor-color associations. More importantly, we found the types of descriptors used to name the smells were related to the odor-color associations. When people used abstract smell terms to describe odors, they were less likely to choose a color match, but when they described an odor with a source-based term, their color choices more accurately reflected the odor source, particularly when the odor source was named correctly (e.g., banana odor → yellow). This suggests language is an important factor in odor-color cross-modal associations.

  15. Prediction of passive blood-brain partitioning: straightforward and effective classification models based on in silico derived physicochemical descriptors

    PubMed Central

    Vilar, Santiago; Chakrabarti, Mayukh; Costanzi, Stefano

    2010-01-01

    The distribution of compounds between blood and brain is a very important consideration for new candidate drug molecules. In this paper, we describe the derivation of two linear discriminant analysis (LDA) models for the prediction of passive blood-brain partitioning, expressed in terms of log BB values. The models are based on computationally derived physicochemical descriptors, namely the octanol/water partition coefficient (log P), the topological polar surface area (TPSA) and the total number of acidic and basic atoms, and were obtained using a homogeneous training set of 307 compounds, for all of which the published experimental log BB data had been determined in vivo. In particular, since molecules with log BB > 0.3 cross the blood-brain barrier (BBB) readily while molecules with log BB < −1 are poorly distributed to the brain, on the basis of these thresholds we derived two distinct models, both of which show a percentage of good classification of about 80%. Notably, the predictive power of our models was confirmed by the analysis of a large external dataset of compounds with reported activity on the central nervous system (CNS) or lack thereof. The calculation of straightforward physicochemical descriptors is the only requirement for the prediction of the log BB of novel compounds through our models, which can be conveniently applied in conjunction with drug design and virtual screenings. PMID:20427217

  16. Prediction of passive blood-brain partitioning: straightforward and effective classification models based on in silico derived physicochemical descriptors.

    PubMed

    Vilar, Santiago; Chakrabarti, Mayukh; Costanzi, Stefano

    2010-06-01

    The distribution of compounds between blood and brain is a very important consideration for new candidate drug molecules. In this paper, we describe the derivation of two linear discriminant analysis (LDA) models for the prediction of passive blood-brain partitioning, expressed in terms of logBB values. The models are based on computationally derived physicochemical descriptors, namely the octanol/water partition coefficient (logP), the topological polar surface area (TPSA) and the total number of acidic and basic atoms, and were obtained using a homogeneous training set of 307 compounds, for all of which the published experimental logBB data had been determined in vivo. In particular, since molecules with logBB>0.3 cross the blood-brain barrier (BBB) readily while molecules with logBB<-1 are poorly distributed to the brain, on the basis of these thresholds we derived two distinct models, both of which show a percentage of good classification of about 80%. Notably, the predictive power of our models was confirmed by the analysis of a large external dataset of compounds with reported activity on the central nervous system (CNS) or lack thereof. The calculation of straightforward physicochemical descriptors is the only requirement for the prediction of the logBB of novel compounds through our models, which can be conveniently applied in conjunction with drug design and virtual screenings. Published by Elsevier Inc.

  17. Compositional descriptor-based recommender system for the materials discovery

    NASA Astrophysics Data System (ADS)

    Seko, Atsuto; Hayashi, Hiroyuki; Tanaka, Isao

    2018-06-01

    Structures and properties of many inorganic compounds have been collected historically. However, this collection covers only a very small portion of possible inorganic crystals, which implies the presence of numerous currently unknown compounds. A powerful machine-learning strategy is mandatory to discover new inorganic compounds from all chemical combinations. Herein we propose a descriptor-based recommender-system approach to estimate the relevance of chemical compositions where crystals can be formed [i.e., chemically relevant compositions (CRCs)]. In addition to the data-driven compositional similarity used in the literature, the use of compositional descriptors as prior knowledge is helpful for the discovery of new compounds. We validate our recommender systems in two ways. First, one database is used to construct a model, while another is used for the validation. Second, we estimate the phase stability for compounds at expected CRCs using density functional theory calculations.

  18. SAR image segmentation using skeleton-based fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Cao, Yun Yi; Chen, Yan Qiu

    2003-06-01

    SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. The mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces appearing in SAR images automatically without any user input.

  19. A new biodegradation prediction model specific to petroleum hydrocarbons.

    PubMed

    Howard, Philip; Meylan, William; Aronson, Dallas; Stiteler, William; Tunkel, Jay; Comber, Michael; Parkerton, Thomas F

    2005-08-01

    A new predictive model for determining quantitative primary biodegradation half-lives of individual petroleum hydrocarbons has been developed. This model uses a fragment-based approach similar to that of several other biodegradation models, such as those within the Biodegradation Probability Program (BIOWIN) estimation program. In the present study, a half-life in days is estimated using multiple linear regression against counts of 31 distinct molecular fragments. The model was developed using a data set consisting of 175 compounds with environmentally relevant experimental data that was divided into training and validation sets. The original fragments from the Ministry of International Trade and Industry BIOWIN model were used initially as structural descriptors and additional fragments were then added to better describe the ring systems found in petroleum hydrocarbons and to adjust for nonlinearity within the experimental data. The training and validation sets had r² values of 0.91 and 0.81, respectively.

  20. Hybrid Histogram Descriptor: A Fusion Feature Representation for Image Retrieval.

    PubMed

    Feng, Qinghe; Hao, Qiaohong; Chen, Yuqi; Yi, Yugen; Wei, Ying; Dai, Jiangyan

    2018-06-15

    Currently, visual sensors are becoming increasingly affordable and fashionable, accelerating the growth in the volume of image data. Image retrieval has attracted increasing interest due to space exploration, industrial, and biomedical applications. Nevertheless, designing effective feature representation is acknowledged as a hard yet fundamental issue. This paper presents a fusion feature representation called a hybrid histogram descriptor (HHD) for image retrieval. The proposed descriptor comprises two histograms jointly: a perceptually uniform histogram which is extracted by exploiting the color and edge orientation information in perceptually uniform regions; and a motif co-occurrence histogram which is acquired by calculating the probability of a pair of motif patterns. To evaluate the performance, we benchmarked the proposed descriptor on RSSCN7, AID, Outex-00013, Outex-00014 and ETHZ-53 datasets. Experimental results suggest that the proposed descriptor is more effective and robust than ten recent fusion-based descriptors under the content-based image retrieval framework. The computational complexity was also analyzed to give an in-depth evaluation. Furthermore, compared with the state-of-the-art convolutional neural network (CNN)-based descriptors, the proposed descriptor also achieves comparable performance, but does not require any training process.

  1. Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign

    NASA Astrophysics Data System (ADS)

    Sliwoski, Gregory; Mendenhall, Jeffrey; Meiler, Jens

    2016-03-01

    Quantitative structure-activity relationship (QSAR) is a branch of computer aided drug discovery that relates chemical structures to biological activity. Two well established and related QSAR descriptors are two- and three-dimensional autocorrelation (2DA and 3DA). These descriptors encode the relative position of atoms or atom properties by calculating the separation between atom pairs in terms of number of bonds (2DA) or Euclidean distance (3DA). The sums of all values computed for a given small molecule are collected in a histogram. Atom properties can be added with a coefficient that is the product of atom properties for each pair. This procedure can lead to information loss when signed atom properties are considered such as partial charge. For example, the product of two positive charges is indistinguishable from the product of two equivalent negative charges. In this paper, we present variations of 2DA and 3DA called 2DA_Sign and 3DA_Sign that avoid information loss by splitting unique sign pairs into individual histograms. We evaluate these variations with models trained on nine datasets spanning a range of drug target classes. Both 2DA_Sign and 3DA_Sign significantly increase model performance across all datasets when compared with traditional 2DA and 3DA. Lastly, we find that limiting 3DA_Sign to maximum atom pair distances of 6 Å instead of 12 Å further increases model performance, suggesting that conformational flexibility may hinder performance with longer 3DA descriptors. Consistent with this finding, limiting the number of bonds in 2DA_Sign from 11 to 5 fails to improve performance.
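
    The information loss described in this record, and the sign-splitting idea behind 2DA_Sign, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes unordered atom pairs at bond distances 1 to max_bonds, uses the absolute product of the two atom properties as the weight, and splits pairs into three hypothetical histograms ("++", "--", "+-"). All function names and the toy molecules are assumptions.

```python
from collections import deque


def bond_distances(n_atoms, bonds):
    """All-pairs topological distances (number of bonds) via BFS."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def autocorr_2d(props, bonds, max_bonds):
    """Plain 2D autocorrelation: sum of property products per bond distance."""
    n = len(props)
    dist = bond_distances(n, bonds)
    hist = [0.0] * max_bonds
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            if d is not None and d <= max_bonds:
                hist[d - 1] += props[i] * props[j]
    return hist


def autocorr_2d_sign(props, bonds, max_bonds):
    """Sign-split variant: one histogram per sign combination of the pair."""
    n = len(props)
    dist = bond_distances(n, bonds)
    hists = {key: [0.0] * max_bonds for key in ("++", "--", "+-")}
    for i in range(n):
        for j in range(i + 1, n):
            d = dist[i][j]
            if d is None or d > max_bonds:
                continue
            if props[i] >= 0 and props[j] >= 0:
                key = "++"
            elif props[i] < 0 and props[j] < 0:
                key = "--"
            else:
                key = "+-"
            hists[key][d - 1] += abs(props[i] * props[j])
    return hists


# Two toy diatomic "molecules" with hypothetical partial charges.
bonds = [(0, 1)]
positive = [0.4, 0.4]
negative = [-0.4, -0.4]
plain_pos = autocorr_2d(positive, bonds, max_bonds=2)
plain_neg = autocorr_2d(negative, bonds, max_bonds=2)
signed_pos = autocorr_2d_sign(positive, bonds, max_bonds=2)
signed_neg = autocorr_2d_sign(negative, bonds, max_bonds=2)
```

    With these toy inputs the plain histograms coincide, whereas the sign-split histograms place the same weight in different bins, which is the distinction the abstract describes.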

  2. TOXICO-CHEMINFORMATICS AND QSAR MODELING OF ...

    EPA Pesticide Factsheets

    This abstract concludes that QSAR approaches combined with toxico-chemoinformatics descriptors can enhance predictive toxicology models.

  3. Food product design: emerging evidence for food policy.

    PubMed

    Al-Hamdani, Mohammed; Smith, Steven

    2017-03-01

    The research on the impact of specific brand elements such as food descriptors and package colors is underexplored. We tested whether a "light" color and a "low-calorie" descriptor on food packages gain favorable consumer perception ratings as compared with regular packages. Our online experiment recruited 406 adults in a 3 (product type: Chips versus Juice versus Yoghurt) × 2 (descriptor type: regular versus low-calorie) × 2 (color type: regular versus light) mixed design. Dependent variables were sensory (evaluations of the product's nutritional value and quality), product-based (evaluations of the product's physical appeal), and consumer-based (evaluations of the potential consumers of the product) scales. "Low-calorie" descriptors were found to increase sensory ratings as compared with regular descriptors and light-colored packages received higher product-based ratings as compared with their regular-colored counterparts. Food package color and descriptors present a promising venue for understanding preventative measures against obesity.

  4. Data Base Descriptors for Electro-Optical Sensor Simulation. Final Report, May 1977 through June 1978.

    ERIC Educational Resources Information Center

    Zimmerlin, Timothy A.; And Others

    An effort to construct a model of the thermal properties of materials based on theoretical thermo-electromagnetic models, to construct a data base of the dense cultural hospital scene according to Defense Mapping Agency Aerospace Center (DMAAC) specifications, and to design and implement a program to evaluate the tonal model and generate imagery…

  5. Predicting pKa values from EEM atomic charges

    PubMed Central

    2013-01-01

    The acid dissociation constant pKa is a very important molecular property, and there is a strong interest in the development of reliable and fast methods for pKa prediction. We have evaluated the pKa prediction capabilities of QSPR models based on empirical atomic charges calculated by the Electronegativity Equalization Method (EEM). Specifically, we collected 18 EEM parameter sets created for 8 different quantum mechanical (QM) charge calculation schemes. Afterwards, we prepared a training set of 74 substituted phenols. Additionally, for each molecule we generated its dissociated form by removing the phenolic hydrogen. For all the molecules in the training set, we then calculated EEM charges using the 18 parameter sets, and the QM charges using the 8 above-mentioned charge calculation schemes. For each type of QM and EEM charges, we created one QSPR model employing charges from the non-dissociated molecules (three-descriptor QSPR models), and one QSPR model based on charges from both dissociated and non-dissociated molecules (QSPR models with five descriptors). Afterwards, we calculated the quality criteria and evaluated all the QSPR models obtained. We found that QSPR models employing the EEM charges proved to be a good approach for the prediction of pKa (63% of these models had R² > 0.9, while the best had R² = 0.924). As expected, QM QSPR models provided more accurate pKa predictions than the EEM QSPR models but the differences were not significant. Furthermore, a big advantage of the EEM QSPR models is that their descriptors (i.e., EEM atomic charges) can be calculated markedly faster than the QM charge descriptors. Moreover, we found that the EEM QSPR models are not so strongly influenced by the selection of the charge calculation approach as the QM QSPR models. The robustness of the EEM QSPR models was subsequently confirmed by cross-validation. The applicability of EEM QSPR models for other chemical classes was illustrated by a case study focused on carboxylic acids. In summary, EEM QSPR models constitute a fast and accurate pKa prediction approach that can be used in virtual screening. PMID:23574978

  6. Position Estimation and Local Mapping Using Omnidirectional Images and Global Appearance Descriptors

    PubMed Central

    Berenguer, Yerai; Payá, Luis; Ballesta, Mónica; Reinoso, Oscar

    2015-01-01

    This work presents some methods to create local maps and to estimate the position of a mobile robot, using the global appearance of omnidirectional images. We use a robot that carries an omnidirectional vision system on it. Every omnidirectional image acquired by the robot is described only with one global appearance descriptor, based on the Radon transform. In the work presented in this paper, two different possibilities have been considered. In the first one, we assume the existence of a map previously built composed of omnidirectional images that have been captured from previously-known positions. The purpose in this case consists of estimating the nearest position of the map to the current position of the robot, making use of the visual information acquired by the robot from its current (unknown) position. In the second one, we assume that we have a model of the environment composed of omnidirectional images, but with no information about the location of where the images were acquired. The purpose in this case consists of building a local map and estimating the position of the robot within this map. Both methods are tested with different databases (including virtual and real images) taking into consideration the changes of the position of different objects in the environment, different lighting conditions and occlusions. The results show the effectiveness and the robustness of both methods. PMID:26501289

  7. Fast human pose estimation using 3D Zernike descriptors

    NASA Astrophysics Data System (ADS)

    Berjón, Daniel; Morán, Francisco

    2012-03-01

    Markerless video-based human pose estimation algorithms face a high-dimensional problem that is frequently broken down into several lower-dimensional ones by estimating the pose of each limb separately. However, in order to do so they need to reliably locate the torso, for which they typically rely on time coherence and tracking algorithms. Their losing track usually results in catastrophic failure of the process, requiring human intervention and thus precluding their usage in real-time applications. We propose a very fast rough pose estimation scheme based on global shape descriptors built on 3D Zernike moments. Using an articulated model that we configure in many poses, a large database of descriptor/pose pairs can be computed off-line. Thus, the only steps that must be done on-line are the extraction of the descriptors for each input volume and a search against the database to get the most likely poses. While the result of such process is not a fine pose estimation, it can be useful to help more sophisticated algorithms to regain track or make more educated guesses when creating new particles in particle-filter-based tracking schemes. We have achieved a performance of about ten fps on a single computer using a database of about one million entries.
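
    The on-line step described in this record, a search of a precomputed descriptor/pose database for the most likely poses, can be sketched as a brute-force nearest-neighbour lookup. This is only an illustration under stated assumptions: the descriptors are short hypothetical vectors rather than real 3D Zernike descriptors, Euclidean distance is assumed as the similarity measure, and a database of about one million entries would in practice call for an indexed search rather than a full sort.

```python
import math


def nearest_poses(query, database, k=1):
    """Return the k poses whose stored descriptors lie closest to `query`.

    `database` is a list of (descriptor, pose) pairs computed off-line.
    """
    ranked = sorted(database, key=lambda entry: math.dist(query, entry[0]))
    return [pose for _, pose in ranked[:k]]


# Hypothetical off-line database: descriptor vector -> pose label.
database = [
    ((1.0, 0.0, 0.0), "arms_down"),
    ((0.0, 1.0, 0.0), "arms_raised"),
    ((0.0, 0.0, 1.0), "crouching"),
]

# Descriptor extracted on-line from an input volume (made-up values).
query = (0.1, 0.9, 0.0)
best = nearest_poses(query, database, k=1)
top_two = nearest_poses(query, database, k=2)
```

    Returning several candidates instead of one matches the abstract's suggestion that the rough estimates can seed a more sophisticated tracker.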

  8. CADASTER QSPR Models for Predictions of Melting and Boiling Points of Perfluorinated Chemicals.

    PubMed

    Bhhatarai, Barun; Teetz, Wolfram; Liu, Tao; Öberg, Tomas; Jeliazkova, Nina; Kochev, Nikolay; Pukalov, Ognyan; Tetko, Igor V; Kovarich, Simona; Papa, Ester; Gramatica, Paola

    2011-03-14

    Quantitative structure-property relationship (QSPR) studies of the melting point (MP) and boiling point (BP) of per- and polyfluorinated chemicals (PFCs) are presented. The training and prediction chemicals used for developing and validating the models were selected from the Syracuse PhysProp database and the literature. The available experimental data sets were split in two different ways: a) random selection on response value, and b) structural similarity verified by self-organizing map (SOM), in order to propose reliable predictive models, developed only on the training sets and externally verified on the prediction sets. Individual models based on linear and non-linear approaches, developed by different CADASTER partners on 0D-2D Dragon descriptors, E-state descriptors and fragment-based descriptors, as well as a consensus model and their predictions, are presented. In addition, the predictive performance of the developed models was verified on a blind external validation set (EV-set) prepared using the PERFORCE database on 15 MP and 25 BP data respectively. This database contains only long-chain perfluoro-alkylated chemicals, particularly monitored by regulatory agencies like US-EPA and EU-REACH. QSPR models with internal and external validation on two different external prediction/validation sets and a study of the applicability domain, highlighting the robustness and high accuracy of the models, are discussed. Finally, MPs for an additional 303 PFCs and BPs for 271 PFCs, for which experimental measurements are unknown, were predicted. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Low-Dimensional Statistics of Anatomical Variability via Compact Representation of Image Deformations.

    PubMed

    Zhang, Miaomiao; Wells, William M; Golland, Polina

    2016-10-01

    Using image-based descriptors to investigate clinical hypotheses and therapeutic implications is challenging due to the notorious "curse of dimensionality" coupled with a small sample size. In this paper, we present a low-dimensional analysis of anatomical shape variability in the space of diffeomorphisms and demonstrate its benefits for clinical studies. To combat the high dimensionality of the deformation descriptors, we develop a probabilistic model of principal geodesic analysis in a bandlimited low-dimensional space that still captures the underlying variability of image data. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than models based on the high-dimensional state-of-the-art approaches such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA).

  10. NMF-Based Image Quality Assessment Using Extreme Learning Machine.

    PubMed

    Wang, Shuigen; Deng, Chenwei; Lin, Weisi; Huang, Guang-Bin; Zhao, Baojun

    2017-01-01

    Numerous state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage process: distortion description followed by distortion effects pooling. As for the first stage, the distortion descriptors or measurements are expected to be effective representatives of human visual variations, while the second stage should well express the relationship among quality descriptors and the perceptual visual quality. However, most of the existing quality descriptors (e.g., luminance, contrast, and gradient) do not seem to be consistent with human perception, and the effects pooling is often done in ad-hoc ways. In this paper, we propose a novel full-reference IQA metric. It applies non-negative matrix factorization (NMF) to measure image degradations by making use of the parts-based representation of NMF. On the other hand, a new machine learning technique [extreme learning machine (ELM)] is employed to address the limitations of the existing pooling techniques. Compared with neural networks and support vector regression, ELM can achieve higher learning accuracy with faster learning speed. Extensive experimental results demonstrate that the proposed metric has better performance and lower computational complexity in comparison with the relevant state-of-the-art approaches.

  11. Prediction of Partition Coefficients of Organic Compounds between SPME/PDMS and Aqueous Solution

    PubMed Central

    Chao, Keh-Ping; Lu, Yu-Ting; Yang, Hsiu-Wen

    2014-01-01

    Polydimethylsiloxane (PDMS) is commonly used as the coated polymer in the solid phase microextraction (SPME) technique. In this study, the partition coefficients of organic compounds between SPME/PDMS and the aqueous solution were compiled from the literature sources. The correlation analysis for partition coefficients was conducted to interpret the effect of their physicochemical properties and descriptors on the partitioning process. The PDMS-water partition coefficients were significantly correlated to the polarizability of organic compounds (r = 0.977, p < 0.05). An empirical model, consisting of the polarizability, the molecular connectivity index, and an indicator variable, was developed to appropriately predict the partition coefficients of 61 organic compounds for the training set. The predictive ability of the empirical model was demonstrated by using it on a test set of 26 chemicals not included in the training set. The empirical model, applying the straightforward calculated molecular descriptors, for estimating the PDMS-water partition coefficient will contribute to the practical applications of the SPME technique. PMID:24534804

  12. A penalized quantitative structure-property relationship study on melting point of energetic carbocyclic nitroaromatic compounds using adaptive bridge penalty.

    PubMed

    Al-Fakih, A M; Algamal, Z Y; Lee, M H; Aziz, M

    2018-05-01

    A penalized quantitative structure-property relationship (QSPR) model with adaptive bridge penalty for predicting the melting points of 92 energetic carbocyclic nitroaromatic compounds is proposed. To ensure the consistency of the descriptor selection of the proposed penalized adaptive bridge (PBridge), we proposed a ridge estimator ([Formula: see text]) as an initial weight in the adaptive bridge penalty. The Bayesian information criterion was applied to ensure the accurate selection of the tuning parameter ([Formula: see text]). The PBridge based model was internally and externally validated based on [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], the Y-randomization test, [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text] and the applicability domain. The validation results indicate that the model is robust and not due to chance correlation. The descriptor selection and prediction performance of PBridge for the training dataset outperforms the other methods used. PBridge shows the highest [Formula: see text] of 0.959, [Formula: see text] of 0.953, [Formula: see text] of 0.949 and [Formula: see text] of 0.959, and the lowest [Formula: see text] and [Formula: see text]. For the test dataset, PBridge shows a higher [Formula: see text] of 0.945 and [Formula: see text] of 0.948, and a lower [Formula: see text] and [Formula: see text], indicating its better prediction performance. The results clearly reveal that the proposed PBridge is useful for constructing reliable and robust QSPRs for predicting melting points prior to synthesizing new organic compounds.

  13. The evaluation of distributed damage in concrete based on sinusoidal modeling of the ultrasonic response.

    PubMed

    Sepehrinezhad, Alireza; Toufigh, Vahab

    2018-05-25

    Ultrasonic wave attenuation is an effective descriptor of distributed damage in inhomogeneous materials. Methods developed to measure wave attenuation have the potential to provide an in situ evaluation of existing concrete structures insofar as they are accurate and time-efficient. In this study, material classification and distributed damage evaluation were investigated based on the sinusoidal modeling of the response from the through-transmission ultrasonic tests on polymer concrete specimens. The response signal was modeled as a single damped sinusoid or a sum of damped sinusoids. Due to the inhomogeneous nature of concrete materials, model parameters may vary from one specimen to another. Therefore, these parameters are not known in advance and should be estimated while the response signal is being received. The modeling procedure used in this study involves a data-adaptive algorithm to estimate the parameters online. Data-adaptive algorithms are used due to a lack of knowledge of the model parameters. The damping factor was estimated as a descriptor of the distributed damage. The results were compared in two different cases as follows: (1) constant excitation frequency with varying concrete mixtures and (2) constant mixture with varying excitation frequencies. The specimens were also loaded up to their ultimate compressive strength to investigate the effect of distributed damage in the response signal. The results of the estimation indicated that the damping was highly sensitive to the change in material inhomogeneity, even in comparable mixtures. In addition to the proposed method, three methods were employed to compare the results based on their accuracy in the classification of materials and the evaluation of the distributed damage. It is shown that the estimated damping factor is not only sensitive to damage in the final stages of loading, but it is also applicable in evaluating micro damages in the earlier stages providing a reliable descriptor of damage. In addition, the modified amplitude ratio method is introduced as an improvement of the classical method. The proposed methods were validated to be effective descriptors of distributed damage. The presented models were also in good agreement with the experimental data. Copyright © 2018 Elsevier B.V. All rights reserved.
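    The signal model named in this abstract (a single sinusoid or a sum of sinusoids with decaying amplitude) is commonly written in the form below. The abstract does not give the equation, so the symbols are assumptions: K is the number of components, and A_k, alpha_k, f_k and phi_k are the amplitude, damping factor, frequency and phase of component k. Under this reading, alpha_k is the quantity estimated as the damage descriptor.

```latex
x(t) = \sum_{k=1}^{K} A_k \, e^{-\alpha_k t} \cos\!\left(2\pi f_k t + \phi_k\right), \qquad \alpha_k \ge 0
```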

  14. Prediction of olive oil sensory descriptors using instrumental data fusion and partial least squares (PLS) regression.

    PubMed

    Borràs, Eva; Ferré, Joan; Boqué, Ricard; Mestres, Montserrat; Aceña, Laura; Calvo, Angels; Busto, Olga

    2016-08-01

    Headspace-Mass Spectrometry (HS-MS), Fourier Transform Mid-Infrared spectroscopy (FT-MIR) and UV-Visible spectrophotometry (UV-vis) instrumental responses have been combined to predict virgin olive oil sensory descriptors. 343 olive oil samples analyzed during four consecutive harvests (2010-2014) were used to build multivariate calibration models using partial least squares (PLS) regression. The reference values of the sensory attributes were provided by expert assessors from an official taste panel. The instrumental data were modeled individually and also using data fusion approaches. The use of fused data with both low- and mid-level of abstraction improved PLS predictions for all the olive oil descriptors. The best PLS models were obtained for two positive attributes (fruity and bitter) and two defective descriptors (fusty and musty), all of them using data fusion of MS and MIR spectral fingerprints. Although good predictions were not obtained for some sensory descriptors, the results are encouraging, especially considering that the legal categorization of virgin olive oils only requires the determination of fruity and defective descriptors. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Explanatory Power of Multi-scale Physical Descriptors in Modeling Benthic Indices Across Nested Ecoregions of the Pacific Northwest

    NASA Astrophysics Data System (ADS)

    Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.

    2005-05-01

    Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.

  16. Fingerprint Identification Using SIFT-Based Minutia Descriptors and Improved All Descriptor-Pair Matching

    PubMed Central

    Zhou, Ru; Zhong, Dexing; Han, Jiuqiang

    2013-01-01

    The performance of conventional minutiae-based fingerprint authentication algorithms degrades significantly when dealing with low quality fingerprints with lots of cuts or scratches. A similar degradation of the minutiae-based algorithms is observed when small overlapping areas appear because of the quite narrow width of the sensors. Based on the detection of minutiae, Scale Invariant Feature Transformation (SIFT) descriptors are employed to fulfill verification tasks in the above difficult scenarios. However, the original SIFT algorithm is not suitable for fingerprint because of: (1) the similar patterns of parallel ridges; and (2) high computational resource consumption. To enhance the efficiency and effectiveness of the algorithm for fingerprint verification, we propose a SIFT-based Minutia Descriptor (SMD) to improve the SIFT algorithm through image processing, descriptor extraction and matcher. A two-step fast matcher, named improved All Descriptor-Pair Matching (iADM), is also proposed to implement the 1:N verifications in real-time. Fingerprint Identification using SMD and iADM (FISiA) achieved a significant improvement with respect to accuracy in representative databases compared with the conventional minutiae-based method. The speed of FISiA also can meet real-time requirements. PMID:23467056

  17. RGB-D SLAM Combining Visual Odometry and Extended Information Filter

    PubMed Central

    Zhang, Heng; Liu, Yanli; Tan, Jindong; Xiong, Naixue

    2015-01-01

    In this paper, we present a novel RGB-D SLAM system based on visual odometry and an extended information filter, which does not require any other sensors or odometry. In contrast to the graph optimization approaches, this is more suitable for online applications. A visual dead reckoning algorithm based on visual residuals is devised, which is used to estimate motion control input. In addition, we use a novel descriptor called binary robust appearance and normals descriptor (BRAND) to extract features from the RGB-D frame and use them as landmarks. Furthermore, considering both the 3D positions and the BRAND descriptors of the landmarks, our observation model avoids explicit data association between the observations and the map by marginalizing the observation likelihood over all possible associations. Experimental validation is provided, which compares the proposed RGB-D SLAM algorithm with just RGB-D visual odometry and a graph-based RGB-D SLAM algorithm using the publicly-available RGB-D dataset. The results of the experiments demonstrate that our system is quicker than the graph-based RGB-D SLAM algorithm. PMID:26263990

  18. Use of in Vitro HTS-Derived Concentration–Response Data as Biological Descriptors Improves the Accuracy of QSAR Models of in Vivo Toxicity

    PubMed Central

    Sedykh, Alexander; Zhu, Hao; Tang, Hao; Zhang, Liying; Richard, Ann; Rusyn, Ivan; Tropsha, Alexander

    2011-01-01

    Background: Quantitative high-throughput screening (qHTS) assays are increasingly being used to inform chemical hazard identification. Hundreds of chemicals have been tested in dozens of cell lines across extensive concentration ranges by the National Toxicology Program in collaboration with the National Institutes of Health Chemical Genomics Center. Objectives: Our goal was to test a hypothesis that dose–response data points of the qHTS assays can serve as biological descriptors of assayed chemicals and, when combined with conventional chemical descriptors, improve the accuracy of quantitative structure–activity relationship (QSAR) models applied to prediction of in vivo toxicity end points. Methods: We obtained cell viability qHTS concentration–response data for 1,408 substances assayed in 13 cell lines from PubChem; for a subset of these compounds, rodent acute toxicity half-maximal lethal dose (LD50) data were also available. We used the k nearest neighbor classification and random forest QSAR methods to model LD50 data using chemical descriptors either alone (conventional models) or combined with biological descriptors derived from the concentration–response qHTS data (hybrid models). Critical to our approach was the use of a novel noise-filtering algorithm to treat qHTS data. Results: Both the external classification accuracy and coverage (i.e., fraction of compounds in the external set that fall within the applicability domain) of the hybrid QSAR models were superior to conventional models. Conclusions: Concentration–response qHTS data may serve as informative biological descriptors of molecules that, when combined with conventional chemical descriptors, may considerably improve the accuracy and utility of computational approaches for predicting in vivo animal toxicity end points. PMID:20980217

  19. Mobile visual object identification: from SIFT-BoF-RANSAC to Sketchprint

    NASA Astrophysics Data System (ADS)

    Voloshynovskiy, Sviatoslav; Diephuis, Maurits; Holotyak, Taras

    2015-03-01

    Mobile object identification based on visual features finds many applications in the interaction with physical objects and in security. Discriminative and robust content representation plays a central role in object and content identification. Complex post-processing methods are used to compress descriptors and their geometrical information, aggregate them into more compact and discriminative representations and finally re-rank the results based on the similarity geometries of descriptors. Unfortunately, most of the existing descriptors are not very robust and discriminative once applied to varied content such as real images, text or noise-like microstructures, and they require at least 500-1'000 descriptors per image for reliable identification. At the same time, the geometric re-ranking procedures are still too complex to be applied to the numerous candidates obtained from the feature similarity based search only. This restricts the list of candidates to fewer than 1'000, which obviously causes a higher probability of a miss. In addition, the security and privacy of content representation has become a hot research topic in multimedia and security communities. In this paper, we introduce a new framework for non-local content representation based on SketchPrint descriptors. It extends the properties of local descriptors to a more informative and discriminative, yet geometrically invariant content representation. In particular it allows images to be compactly represented by 100 SketchPrint descriptors without being fully dependent on re-ranking methods. We consider several use cases, applying SketchPrint descriptors to natural images, text documents, packages and micro-structures and compare them with the traditional local descriptors.

  20. Landmark-free statistical analysis of the shape of plant leaves.

    PubMed

    Laga, Hamid; Kurtek, Sebastian; Srivastava, Anuj; Miklavcic, Stanley J

    2014-12-21

    The shapes of plant leaves are important features to biologists, as they can help in distinguishing plant species, measuring their health, analyzing their growth patterns, and understanding relations between various species. Most of the methods that have been developed in the past focus on comparing the shape of individual leaves using either descriptors or finite sets of landmarks. However, descriptor-based representations are not invertible and thus it is often hard to map descriptor variability into shape variability. On the other hand, landmark-based techniques require automatic detection and registration of the landmarks, which is very challenging in the case of plant leaves that exhibit high variability within and across species. In this paper, we propose a statistical model based on the Squared Root Velocity Function (SRVF) representation and the Riemannian elastic metric of Srivastava et al. (2011) to model the observed continuous variability in the shape of plant leaves. We treat plant species as random variables on a non-linear shape manifold and thus statistical summaries, such as means and covariances, can be computed. One can then study the principal modes of variations and characterize the observed shapes using probability density models, such as Gaussians or Mixture of Gaussians. We demonstrate the usage of such statistical model for (1) efficient classification of individual leaves, (2) the exploration of the space of plant leaf shapes, which is important in the study of population-specific variations, and (3) comparing entire plant species, which is fundamental to the study of evolutionary relationships in plants. Our approach does not require descriptors or landmarks but automatically solves for the optimal registration that aligns a pair of shapes. We evaluate the performance of the proposed framework on publicly available benchmarks such as the Flavia, the Swedish, and the ImageCLEF2011 plant leaf datasets. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Quantification of ultrasonic texture intra-heterogeneity via volumetric stochastic modeling for tissue characterization.

    PubMed

    Al-Kadi, Omar S; Chung, Daniel Y F; Carlisle, Robert C; Coussios, Constantin C; Noble, J Alison

    2015-04-01

    Intensity variations in image texture can provide powerful quantitative information about physical properties of biological tissue. However, tissue patterns can vary according to the utilized imaging system and are intrinsically correlated to the scale of analysis. In the case of ultrasound, the Nakagami distribution is a general model of the ultrasonic backscattering envelope under various scattering conditions and densities where it can be employed for characterizing image texture, but the subtle intra-heterogeneities within a given mass are difficult to capture via this model as it works at a single spatial scale. This paper proposes a locally adaptive 3D multi-resolution Nakagami-based fractal feature descriptor that extends Nakagami-based texture analysis to accommodate subtle speckle spatial frequency tissue intensity variability in volumetric scans. Local textural fractal descriptors - which are invariant to affine intensity changes - are extracted from volumetric patches at different spatial resolutions from voxel lattice-based generated shape and scale Nakagami parameters. Using ultrasound radio-frequency datasets we found that after applying an adaptive fractal decomposition label transfer approach on top of the generated Nakagami voxels, tissue characterization results were superior to the state of the art. Experimental results on real 3D ultrasonic pre-clinical and clinical datasets suggest that describing tumor intra-heterogeneity via this descriptor may facilitate improved prediction of therapy response and disease characterization. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.

  2. Considering ionic state in modeling sorption of pharmaceuticals to sewage sludge.

    PubMed

    Rybacka, Aleksandra; Andersson, Patrik L

    2016-12-01

    Information on the partitioning of chemicals between particulate matter and water in sewage treatment plants (STPs) can be used to predict their subsequent environmental fate. However, this information can be challenging to acquire, especially for pharmaceuticals that are frequently present in ionized forms. This study investigated the relationship between the ionization state of active pharmaceutical ingredients (APIs) and their partitioning between water and sludge in STPs. We also investigated the underlying mechanisms of sludge sorption by using chemical descriptors based on ionized structures, and evaluated the usefulness of these descriptors in quantitative structure-property relationship (QSPR) modeling. K_D values were collected for 110 APIs, which were classified as neutral, positive, or negative at pH 7. The models with the highest performance had R²Y and Q² values of above 0.75 and 0.65, respectively. We found that the dominant intermolecular forces governing the interactions of neutral and positively charged APIs with sludge are hydrophobic, pi-pi, and dipole-dipole interactions, whereas the interactions of negatively charged APIs with sludge were mainly governed by covalent bonding as well as ion-ion, ion-dipole, and dipole-dipole interactions; hydrophobicity-driven interactions were rather unimportant. Including charge-related descriptors improved the models' performance by 5-10%, underlining the importance of electrostatic interactions. The use of descriptors calculated for ionized structures did not improve the model statistics for positive and negative APIs, but slightly increased model performance for neutral APIs. We attribute this to a better description of neutral zwitterions. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Receptive fields selection for binary feature description.

    PubMed

    Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal

    2014-06-01

    Feature description for local image patches is widely used in computer vision. While the conventional way to design a local descriptor is based on expert experience and knowledge, learning-based methods for designing local descriptors are becoming more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing a binary feature descriptor, which we call the receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling area and Gaussian pooling area) for selection, we obtain two binary descriptors, RFDR and RFDG, accordingly. Image matching experiments on the well-known patch data set and Oxford data set demonstrate that RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.
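    The construction described above, where each receptive-field response is thresholded into one bit, can be sketched as follows, together with the Hamming distance that is typically used to compare binary descriptors. This is a hedged illustration only: the response values and thresholds are invented, the function names are hypothetical, and the greedy selection of receptive fields that the paper describes is not reproduced.

```python
def binary_descriptor(responses, thresholds):
    """Turn receptive-field responses into bits: 1 if response > threshold."""
    return [1 if r > t else 0 for r, t in zip(responses, thresholds)]


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up responses of four (hypothetical) receptive fields for two patches.
thresholds = [0.5, 0.5, 0.5, 0.5]
d1 = binary_descriptor([0.9, 0.1, 0.7, 0.4], thresholds)
d2 = binary_descriptor([0.8, 0.6, 0.2, 0.3], thresholds)

distance = hamming(d1, d2)
print(d1, d2, distance)
```

    Comparing bit strings this way is much cheaper than computing distances between float-valued vectors, which is consistent with the processing-time advantage the abstract reports.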

  4. Circular blurred shape model for multiclass symbol recognition.

    PubMed

    Escalera, Sergio; Fornés, Alicia; Pujol, Oriol; Lladós, Josep; Radeva, Petia

    2011-04-01

    In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations.

  5. A new adaptive L1-norm for optimal descriptor selection of high-dimensional QSAR classification model for anti-hepatitis C virus activity of thiourea derivatives.

    PubMed

    Algamal, Z Y; Lee, M H

    2017-01-01

    A high-dimensional quantitative structure-activity relationship (QSAR) classification model typically contains a large number of irrelevant and redundant descriptors. In this paper, a new design of descriptor selection for the QSAR classification model estimation method is proposed by adding a new weight inside L1-norm. The experimental results of classifying the anti-hepatitis C virus activity of thiourea derivatives demonstrate that the proposed descriptor selection method in the QSAR classification model performs effectively and competitively compared with other existing penalized methods in terms of classification performance on both the training and the testing datasets. Moreover, it is noteworthy that the results obtained in terms of stability test and applicability domain provide a robust QSAR classification model. It is evident from the results that the developed QSAR classification model could conceivably be employed for further high-dimensional QSAR classification studies.
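    For orientation, a weighted L1-norm penalty of the kind this abstract refers to is usually written in the generic adaptive-lasso form below. This is a textbook form given under stated assumptions, not the specific weight proposed by the authors, which the abstract does not define. Here ℓ(β) is the log-likelihood of the classification model, λ ≥ 0 is the tuning parameter, p is the number of descriptors, and w_j is a non-negative weight attached to descriptor j.

```latex
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j \, \lvert \beta_j \rvert \right\}
```

    Descriptors with large weights are penalized more heavily and are more likely to have their coefficients shrunk exactly to zero, which is how such a penalty performs descriptor selection.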

  6. A Quantum Chemical and Statistical Study of Phenolic Schiff Bases with Antioxidant Activity against DPPH Free Radical

    PubMed Central

    Anouar, El Hassane

    2014-01-01

    Phenolic Schiff bases are known as powerful antioxidants. To select the electronic, 2D and 3D descriptors responsible for the free radical scavenging ability of a series of 30 phenolic Schiff bases, a set of molecular descriptors was calculated by using B3P86 (Becke’s three parameter hybrid functional with Perdew 86 correlation functional) combined with the 6-31+G(d,p) basis set (i.e., at the B3P86/6-31+G(d,p) level of theory). The chemometric methods, simple and multiple linear regressions (SLR and MLR), principal component analysis (PCA) and hierarchical cluster analysis (HCA) were employed to reduce the dimensionality and to investigate the relationship between the calculated descriptors and the antioxidant activity. The results showed that the antioxidant activity mainly depends on the first and second bond dissociation enthalpies of phenolic hydroxyl groups, the dipole moment and the hydrophobicity descriptors. The antioxidant activity is inversely proportional to the main descriptors. The selected descriptors discriminate the Schiff bases into active and inactive antioxidants. PMID:26784873

  7. Automated selection of BI-RADS lesion descriptors for reporting calcifications in mammograms

    NASA Astrophysics Data System (ADS)

    Paquerault, Sophie; Jiang, Yulei; Nishikawa, Robert M.; Schmidt, Robert A.; D'Orsi, Carl J.; Vyborny, Carl J.; Newstead, Gillian M.

    2003-05-01

    We are developing an automated computer technique to describe calcifications in mammograms according to the BI-RADS lexicon. We evaluated this technique by its agreement with radiologists' description of the same lesions. Three expert mammographers reviewed our database of 90 cases of digitized mammograms containing clustered microcalcifications and described the calcifications according to BI-RADS. In our study, the radiologists used only 4 of the 5 calcification distribution descriptors and 5 of the 14 calcification morphology descriptors contained in BI-RADS. Our computer technique was therefore designed specifically for these 4 calcification distribution descriptors and 5 calcification morphology descriptors. For calcification distribution, 4 linear discriminant analysis (LDA) classifiers were developed using 5 computer-extracted features to produce scores of how well each descriptor describes a cluster. Similarly, for calcification morphology, 5 LDAs were designed using 10 computer-extracted features. We trained the LDAs using only the BI-RADS data reported by the first radiologist and compared the computer output to the descriptor data reported by all 3 radiologists (for the first radiologist, the leave-one-out method was used). The computer output consisted of the best calcification distribution descriptor and the best 2 calcification morphology descriptors. The results of the comparison with the data from each radiologist, respectively, were: for calcification distribution, percent agreement, 74%, 66%, and 73%, kappa value, 0.44, 0.36, and 0.46; for calcification morphology, percent agreement, 83%, 77%, and 57%, kappa value, 0.78, 0.70, and 0.44. These results indicate that the proposed computer technique can select BI-RADS descriptors in good agreement with radiologists.
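
    The two agreement measures reported above can be sketched as follows. This is a minimal illustration that assumes the reported kappa is Cohen's kappa for a pair of raters (the abstract does not say which variant was used); the function names and the ten single-letter labels are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of cases on which two raters chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical distribution labels for ten clusters
# (C = clustered, L = linear, S = segmental, R = regional).
computer = list("CLCSCRCLCC")
radiologist = list("CLCCCRLLCS")

print(f"agreement = {percent_agreement(computer, radiologist):.2f}")
print(f"kappa     = {cohen_kappa(computer, radiologist):.2f}")
```

    Kappa is lower than raw agreement whenever one label dominates, which is why the abstract reports both figures.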

  8. QSPR for predicting chloroform formation in drinking water disinfection.

    PubMed

    Luilo, G B; Cabaniss, S E

    2011-01-01

    Chlorination is the most widely used technique for water disinfection, but may lead to the formation of chloroform (trichloromethane; TCM) and other by-products. This article reports the first quantitative structure-property relationship (QSPR) for predicting the formation of TCM in chlorinated drinking water. Model compounds (n = 117) drawn from 10 literature sources were divided into training data (n = 90, analysed by five-way leave-many-out internal cross-validation) and external validation data (n = 27). QSPR internal cross-validation had Q² = 0.94 and root mean square error (RMSE) of 0.09 moles TCM per mole compound, consistent with external validation Q² of 0.94 and RMSE of 0.08 moles TCM per mole compound, and met criteria for high predictive power and robustness. In contrast, log TCM QSPR performed poorly and did not meet the criteria for predictive power. The QSPR predictions were consistent with experimental values for TCM formation from tannic acid and for model fulvic acid structures. The descriptors used are consistent with a relatively small number of important TCM precursor structures based upon 1,3-dicarbonyls or 1,3-diphenols.
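
    The validation statistics quoted above can be computed as in the sketch below. It assumes the common definition Q² = 1 − PRESS/SS_tot, where PRESS is the sum of squared prediction errors on held-out compounds; the abstract does not state which definition the authors used, and several variants exist for external validation. The function names and the five observed/predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def q_squared(observed, predicted, reference_mean=None):
    """Q^2 = 1 - PRESS / SS_tot.

    `predicted` must hold out-of-sample predictions (cross-validated or
    external). `reference_mean` defaults to the mean of `observed`; for an
    external set, some definitions use the training-set mean instead.
    """
    if reference_mean is None:
        reference_mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - press / ss_tot


# Hypothetical yields in moles TCM per mole compound (not from the paper).
observed = [0.1, 0.4, 0.5, 0.9, 1.1]
predicted = [0.2, 0.3, 0.5, 1.0, 1.0]

print(f"Q2   = {q_squared(observed, predicted):.4f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
```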

  9. In silico prediction of nematic transition temperature for liquid crystals using quantitative structure-property relationship approaches.

    PubMed

    Fatemi, Mohammad Hossein; Ghorbanzad'e, Mehdi

    2009-11-01

    Quantitative structure-property relationship models for the prediction of the nematic transition temperature (T_N) were developed by using multilinear regression analysis and a feedforward artificial neural network (ANN). A collection of 42 thermotropic liquid crystals was chosen as the data set. The data set was divided into three sets: a training set, an internal test set, and an external test set. Training and internal test sets were used for ANN model development, and the external test set was used for evaluation of the predictive power of the model. In order to build the models, a set of six descriptors was selected by the best multilinear regression procedure of the CODESSA program. These descriptors were: atomic charge weighted partial negatively charged surface area, relative negative charged surface area, polarity parameter/square distance, minimum most negative atomic partial charge, molecular volume, and the A component of moment of inertia, which encode geometrical and electronic characteristics of molecules. These descriptors were used as inputs to the ANN. The optimized ANN model had a 6:6:1 topology. The standard errors in the calculation of T_N for the training, internal, and external test sets using the ANN model were 1.012, 4.910, and 4.070, respectively. To further evaluate the ANN model, a cross-validation test was performed, which produced the statistic Q² = 0.9796 and a standard deviation of 2.67 based on the predicted residual sum of squares. Also, the diversity test was performed to ensure the model's stability and prove its predictive capability. The obtained results reveal the suitability of ANN for the prediction of T_N for liquid crystals using molecular structural descriptors.

  10. Application of a Cloud Model-Set Pair Analysis in Hazard Assessment for Biomass Gasification Stations.

    PubMed

    Yan, Fang; Xu, Kaili

    2017-01-01

    Because a biomass gasification station includes various hazard factors, hazard assessment is needed and significant. In this article, the cloud model (CM) is employed to improve set pair analysis (SPA), and a novel hazard assessment method for a biomass gasification station is proposed based on the cloud model-set pair analysis (CM-SPA). In this method, cloud weight is proposed to be the weight of index. In contrast to the index weight of other methods, cloud weight is shown by cloud descriptors; hence, the randomness and fuzziness of cloud weight will make it effective to reflect the linguistic variables of experts. Then, the cloud connection degree (CCD) is proposed to replace the connection degree (CD); the calculation algorithm of CCD is also worked out. By utilizing the CCD, the hazard assessment results are shown by some normal clouds, and the normal clouds are reflected by cloud descriptors; meanwhile, the hazard grade is confirmed by analyzing the cloud descriptors. After that, two biomass gasification stations undergo hazard assessment via CM-SPA and AHP based SPA, respectively. The comparison of assessment results illustrates that the CM-SPA is suitable and effective for the hazard assessment of a biomass gasification station and that CM-SPA will make the assessment results more reasonable and scientific.

  11. Application of a Cloud Model-Set Pair Analysis in Hazard Assessment for Biomass Gasification Stations

    PubMed Central

    Yan, Fang; Xu, Kaili

    2017-01-01

    Because a biomass gasification station includes various hazard factors, hazard assessment is needed and significant. In this article, the cloud model (CM) is employed to improve set pair analysis (SPA), and a novel hazard assessment method for a biomass gasification station is proposed based on the cloud model-set pair analysis (CM-SPA). In this method, cloud weight is proposed to be the weight of index. In contrast to the index weight of other methods, cloud weight is shown by cloud descriptors; hence, the randomness and fuzziness of cloud weight will make it effective to reflect the linguistic variables of experts. Then, the cloud connection degree (CCD) is proposed to replace the connection degree (CD); the calculation algorithm of CCD is also worked out. By utilizing the CCD, the hazard assessment results are shown by some normal clouds, and the normal clouds are reflected by cloud descriptors; meanwhile, the hazard grade is confirmed by analyzing the cloud descriptors. After that, two biomass gasification stations undergo hazard assessment via CM-SPA and AHP based SPA, respectively. The comparison of assessment results illustrates that the CM-SPA is suitable and effective for the hazard assessment of a biomass gasification station and that CM-SPA will make the assessment results more reasonable and scientific. PMID:28076440

  12. Three-Dimensional Biologically Relevant Spectrum (BRS-3D): Shape Similarity Profile Based on PDB Ligands as Molecular Descriptors.

    PubMed

    Hu, Ben; Kuang, Zheng-Kun; Feng, Shi-Yu; Wang, Dong; He, Song-Bing; Kong, De-Xin

    2016-11-17

    The crystallized ligands in the Protein Data Bank (PDB) can be treated as the inverse shapes of the active sites of corresponding proteins. Therefore, the shape similarity between a molecule and PDB ligands indicated the possibility of the molecule to bind with the targets. In this paper, we proposed a shape similarity profile that can be used as a molecular descriptor for ligand-based virtual screening. First, through three-dimensional (3D) structural clustering, 300 diverse ligands were extracted from the druggable protein-ligand database, sc-PDB. Then, each of the molecules under scrutiny was flexibly superimposed onto the 300 ligands. Superimpositions were scored by shape overlap and property similarity, producing a 300 dimensional similarity array termed the "Three-Dimensional Biologically Relevant Spectrum (BRS-3D)". Finally, quantitative or discriminant models were developed with the 300 dimensional descriptor using machine learning methods (support vector machine). The effectiveness of this approach was evaluated using 42 benchmark data sets from the G protein-coupled receptor (GPCR) ligand library and the GPCR decoy database (GLL/GDD). We compared the performance of BRS-3D with other 2D and 3D state-of-the-art molecular descriptors. The results showed that models built with BRS-3D performed best for most GLL/GDD data sets. We also applied BRS-3D in histone deacetylase 1 inhibitors screening and GPCR subtype selectivity prediction. The advantages and disadvantages of this approach are discussed.

  13. In silico design of anti-atherogenic biomaterials.

    PubMed

    Lewis, Daniel R; Kholodovych, Vladyslav; Tomasini, Michael D; Abdelhamid, Dalia; Petersen, Latrisha K; Welsh, William J; Uhrich, Kathryn E; Moghe, Prabhas V

    2013-10-01

    Atherogenesis, the uncontrolled deposition of modified lipoproteins in inflamed arteries, serves as a focal trigger of cardiovascular disease (CVD). Polymeric biomaterials have been envisioned to counteract atherogenesis based on their ability to repress scavenger mediated uptake of oxidized lipoprotein (oxLDL) in macrophages. Following the conceptualization in our laboratories of a new library of amphiphilic macromolecules (AMs), assembled from sugar backbones, aliphatic chains and poly(ethylene glycol) tails, a more rational approach is necessary to parse the diverse features such as charge, hydrophobicity, sugar composition and stereochemistry. In this study, we advance a computational biomaterials design approach to screen and elucidate anti-atherogenic biomaterials with high efficacy. AMs were quantified in terms of not only 1D (molecular formula) and 2D (molecular connectivity) descriptors, but also new 3D (molecular geometry) descriptors of AMs modeled by coarse-grained molecular dynamics (MD) followed by all-atom MD simulations. Quantitative structure-activity relationship (QSAR) models for anti-atherogenic activity were then constructed by screening a total of 1164 descriptors against the corresponding, experimentally measured potency of AM inhibition of oxLDL uptake in human monocyte-derived macrophages. Five key descriptors were identified to provide a strong linear correlation between the predicted and observed anti-atherogenic activity values, and were then used to correctly forecast the efficacy of three newly designed AMs. Thus, a new ligand-based drug design framework was successfully adapted to computationally screen and design biomaterials with cardiovascular therapeutic properties. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Application of the QSPR approach to the boiling points of azeotropes.

    PubMed

    Katritzky, Alan R; Stoyanova-Slavova, Iva B; Tämm, Kaido; Tamm, Tarmo; Karelson, Mati

    2011-04-21

    CODESSA Pro derivative descriptors were calculated for a data set of 426 azeotropic mixtures by the centroid approximation and the weighted-contribution-factor approximation. The two approximations produced almost identical four-descriptor QSPR models relating the structural characteristic of the individual components of azeotropes to the azeotropic boiling points. These models were supported by internal and external validations. The descriptors contributing to the QSPR models are directly related to the three components of the enthalpy (heat) of vaporization.

  15. Spherical harmonics coefficients for ligand-based virtual screening of cyclooxygenase inhibitors.

    PubMed

    Wang, Quan; Birod, Kerstin; Angioni, Carlo; Grösch, Sabine; Geppert, Tim; Schneider, Petra; Rupp, Matthias; Schneider, Gisbert

    2011-01-01

    Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort.

  16. Global structure–activity relationship model for nonmutagenic carcinogens using virtual ligand-protein interactions as model descriptors

    PubMed Central

    Cunningham, Albert R.; Trent, John O.

    2012-01-01

    Structure–activity relationship (SAR) models are powerful tools to investigate the mechanisms of action of chemical carcinogens and to predict the potential carcinogenicity of untested compounds. We describe the use of a traditional fragment-based SAR approach along with a new virtual ligand-protein interaction-based approach for modeling of nonmutagenic carcinogens. The ligand-based SAR models used descriptors derived from computationally calculated ligand-binding affinities for learning set agents to 5495 proteins. Two learning sets were developed. One set was from the Carcinogenic Potency Database, where chemicals tested for rat carcinogenesis along with Salmonella mutagenicity data were provided. The second was from Malacarne et al. who developed a learning set of nonalerting compounds based on rodent cancer bioassay data and Ashby’s structural alerts. When the rat cancer models were categorized based on mutagenicity, the traditional fragment model outperformed the ligand-based model. However, when the learning sets were composed solely of nonmutagenic or nonalerting carcinogens and noncarcinogens, the fragment model demonstrated a concordance of near 50%, whereas the ligand-based models demonstrated a concordance of 71% for nonmutagenic carcinogens and 74% for nonalerting carcinogens. Overall, these findings suggest that expert system analysis of virtual chemical protein interactions may be useful for developing predictive SAR models for nonmutagenic carcinogens. Moreover, a more practical approach for developing SAR models for carcinogenesis may include fragment-based models for chemicals testing positive for mutagenicity and ligand-based models for chemicals devoid of DNA reactivity. PMID:22678118

  17. Global structure-activity relationship model for nonmutagenic carcinogens using virtual ligand-protein interactions as model descriptors.

    PubMed

    Cunningham, Albert R; Carrasquer, C Alex; Qamar, Shahid; Maguire, Jon M; Cunningham, Suzanne L; Trent, John O

    2012-10-01

    Structure-activity relationship (SAR) models are powerful tools to investigate the mechanisms of action of chemical carcinogens and to predict the potential carcinogenicity of untested compounds. We describe the use of a traditional fragment-based SAR approach along with a new virtual ligand-protein interaction-based approach for modeling of nonmutagenic carcinogens. The ligand-based SAR models used descriptors derived from computationally calculated ligand-binding affinities for learning set agents to 5495 proteins. Two learning sets were developed. One set was from the Carcinogenic Potency Database, where chemicals tested for rat carcinogenesis along with Salmonella mutagenicity data were provided. The second was from Malacarne et al. who developed a learning set of nonalerting compounds based on rodent cancer bioassay data and Ashby's structural alerts. When the rat cancer models were categorized based on mutagenicity, the traditional fragment model outperformed the ligand-based model. However, when the learning sets were composed solely of nonmutagenic or nonalerting carcinogens and noncarcinogens, the fragment model demonstrated a concordance of near 50%, whereas the ligand-based models demonstrated a concordance of 71% for nonmutagenic carcinogens and 74% for nonalerting carcinogens. Overall, these findings suggest that expert system analysis of virtual chemical protein interactions may be useful for developing predictive SAR models for nonmutagenic carcinogens. Moreover, a more practical approach for developing SAR models for carcinogenesis may include fragment-based models for chemicals testing positive for mutagenicity and ligand-based models for chemicals devoid of DNA reactivity.

  18. QSAR Study and Molecular Design of Open-Chain Enaminones as Anticonvulsant Agents

    PubMed Central

    Garro Martinez, Juan C.; Duchowicz, Pablo R.; Estrada, Mario R.; Zamarbide, Graciela N.; Castro, Eduardo A.

    2011-01-01

    Present work employs the QSAR formalism to predict the ED50 anticonvulsant activity of ringed-enaminones, in order to apply these relationships for the prediction of unknown open-chain compounds containing the same types of functional groups in their molecular structure. Two different modeling approaches are applied with the purpose of comparing the consistency of our results: (a) the search of molecular descriptors via multivariable linear regressions; and (b) the calculation of flexible descriptors with the CORAL (CORrelation And Logic) program. Among the results found, we propose some potent candidate open-chain enaminones having ED50 values lower than 10 mg·kg−1 for corresponding pharmacological studies. These compounds are classified as Class 1 and Class 2 according to the Anticonvulsant Selection Project. PMID:22272137

  19. Mobile Visual Search Based on Histogram Matching and Zone Weight Learning

    NASA Astrophysics Data System (ADS)

    Zhu, Chuang; Tao, Li; Yang, Fan; Lu, Tao; Jia, Huizhu; Xie, Xiaodong

    2018-01-01

    In this paper, we propose a novel image retrieval algorithm for mobile visual search. At first, a short visual codebook is generated based on the descriptor database to represent the statistical information of the dataset. Then, an accurate local descriptor similarity score is computed by merging the tf-idf weighted histogram matching and the weighting strategy in compact descriptors for visual search (CDVS). At last, both the global descriptor matching score and the local descriptor similarity score are summed up to rerank the retrieval results according to the learned zone weights. The results show that the proposed approach outperforms the state-of-the-art image retrieval method in CDVS.

  20. 3D QSAR studies on binding affinities of coumarin natural products for glycosomal GAPDH of Trypanosoma cruzi

    NASA Astrophysics Data System (ADS)

    Menezes, Irwin R. A.; Lopes, Julio C. D.; Montanari, Carlos A.; Oliva, Glaucius; Pavão, Fernando; Castilho, Marcelo S.; Vieira, Paulo C.; Pupo, Mônica T.

    2003-05-01

    Drug design strategies based on Comparative Molecular Field Analysis (CoMFA) have been used to predict the activity of new compounds. The major advantage of this approach is that it permits the analysis of a large number of quantitative descriptors and uses chemometric methods such as partial least squares (PLS) to correlate changes in bioactivity with changes in chemical structure. Because it is often difficult to rationalize all variables affecting the binding affinity of compounds using CoMFA solely, the program GRID was used to describe ligands in terms of their molecular interaction fields, MIFs. The program VolSurf that is able to compress the relevant information present in 3D maps into a few descriptors can treat these GRID fields. The binding affinities of a new set of compounds consisting of 13 coumarins, for one of which the three-dimensional ligand-enzyme bound structure is known, were studied. A final model based on the mentioned programs was independently validated by synthesizing and testing new coumarin derivatives. By relying on our knowledge of the real physical data (i.e., combining crystallographic and binding affinity results), it is also shown that ligand-based design agrees with structure-based design. The compound with the highest binding affinity was the coumarin chalepin, isolated from Rutaceae species, with an IC50 value of 55.5 μM towards the enzyme glyceraldehyde-3-phosphate dehydrogenase (gGAPDH) from glycosomes of the parasite Trypanosoma cruzi, the causative agent of Chagas' disease. The proposed models from GRID MIFs have revealed the importance of lipophilic interactions in modulating the inhibition, but without excluding the dependence on stereo-electronic properties as found from CoMFA fields.

  1. A Novel Two-Step Hierarchical Quantitative Structure–Activity Relationship Modeling Work Flow for Predicting Acute Toxicity of Chemicals in Rodents

    PubMed Central

    Zhu, Hao; Ye, Lin; Richard, Ann; Golbraikh, Alexander; Wright, Fred A.; Rusyn, Ivan; Tropsha, Alexander

    2009-01-01

    Background: Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public–private consortia have been formed with the goal of improving chemical safety assessment by the means of high-throughput screening. Objective: A wealth of available biological data requires new computational approaches to link chemical structure, in vitro data, and potential adverse health effects. Methods and results: A database containing experimental cytotoxicity values for in vitro half-maximal inhibitory concentration (IC50) and in vivo rodent median lethal dose (LD50) for more than 300 chemicals was compiled by Zentralstelle zur Erfassung und Bewertung von Ersatz- und Ergaenzungsmethoden zum Tierversuch (ZEBET; National Center for Documentation and Evaluation of Alternative Methods to Animal Experiments). The application of conventional quantitative structure–activity relationship (QSAR) modeling approaches to predict mouse or rat acute LD50 values from chemical descriptors of ZEBET compounds yielded no statistically significant models. The analysis of these data showed no significant correlation between IC50 and LD50. However, a linear IC50 versus LD50 correlation could be established for a fraction of compounds. To capitalize on this observation, we developed a novel two-step modeling approach as follows. First, all chemicals are partitioned into two groups based on the relationship between IC50 and LD50 values: One group comprises compounds with linear IC50 versus LD50 relationships, and another group comprises the remaining compounds. Second, we built conventional binary classification QSAR models to predict the group affiliation based on chemical descriptors only. Third, we developed k-nearest neighbor continuous QSAR models for each subclass to predict LD50 values from chemical descriptors. All models were extensively validated using special protocols. Conclusions: The novelty of this modeling approach is that it uses the relationships between in vivo and in vitro data only to inform the initial construction of the hierarchical two-step QSAR models. Models resulting from this approach employ chemical descriptors only for external prediction of acute rodent toxicity. PMID:19672406

  2. A novel two-step hierarchical quantitative structure-activity relationship modeling work flow for predicting acute toxicity of chemicals in rodents.

    PubMed

    Zhu, Hao; Ye, Lin; Richard, Ann; Golbraikh, Alexander; Wright, Fred A; Rusyn, Ivan; Tropsha, Alexander

    2009-08-01

    Accurate prediction of in vivo toxicity from in vitro testing is a challenging problem. Large public-private consortia have been formed with the goal of improving chemical safety assessment by the means of high-throughput screening. A wealth of available biological data requires new computational approaches to link chemical structure, in vitro data, and potential adverse health effects. A database containing experimental cytotoxicity values for in vitro half-maximal inhibitory concentration (IC50) and in vivo rodent median lethal dose (LD50) for more than 300 chemicals was compiled by Zentralstelle zur Erfassung und Bewertung von Ersatz- und Ergaenzungsmethoden zum Tierversuch (ZEBET; National Center for Documentation and Evaluation of Alternative Methods to Animal Experiments). The application of conventional quantitative structure-activity relationship (QSAR) modeling approaches to predict mouse or rat acute LD50 values from chemical descriptors of ZEBET compounds yielded no statistically significant models. The analysis of these data showed no significant correlation between IC50 and LD50. However, a linear IC50 versus LD50 correlation could be established for a fraction of compounds. To capitalize on this observation, we developed a novel two-step modeling approach as follows. First, all chemicals are partitioned into two groups based on the relationship between IC50 and LD50 values: One group comprises compounds with linear IC50 versus LD50 relationships, and another group comprises the remaining compounds. Second, we built conventional binary classification QSAR models to predict the group affiliation based on chemical descriptors only. Third, we developed k-nearest neighbor continuous QSAR models for each subclass to predict LD50 values from chemical descriptors. All models were extensively validated using special protocols. The novelty of this modeling approach is that it uses the relationships between in vivo and in vitro data only to inform the initial construction of the hierarchical two-step QSAR models. Models resulting from this approach employ chemical descriptors only for external prediction of acute rodent toxicity.
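
    The hierarchical workflow described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the partition rule (a fixed tolerance around a given IC50 versus LD50 line), the use of plain k-nearest-neighbour voting for the classification step, and all function names and toy values are hypothetical; the abstract does not specify how the authors partitioned the compounds or which classifiers they used.

```python
import math


def nearest(train_x, query, k):
    """Indices of the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return order[:k]


def partition_by_linearity(ic50, ld50, slope, intercept, tol):
    """Group 1 if LD50 lies within `tol` of the line slope*IC50 + intercept, else group 0.

    Uses in vitro and in vivo data; needed only while building the model.
    """
    return [1 if abs(l - (slope * i + intercept)) <= tol else 0
            for i, l in zip(ic50, ld50)]


def predict_group(train_x, groups, query, k=3):
    """Classification step: majority vote of the k nearest neighbours, descriptors only."""
    idx = nearest(train_x, query, k)
    return 1 if 2 * sum(groups[i] for i in idx) > len(idx) else 0


def predict_ld50(train_x, ld50, groups, query, k=3):
    """Regression step: mean LD50 of the nearest neighbours inside the predicted group."""
    g = predict_group(train_x, groups, query, k)
    members = [i for i, gi in enumerate(groups) if gi == g]
    sub_x = [train_x[i] for i in members]
    idx = nearest(sub_x, query, min(k, len(members)))
    return g, sum(ld50[members[i]] for i in idx) / len(idx)


# Toy data: two descriptors per compound, log-scaled IC50 and LD50 (hypothetical).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
ic50 = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
ld50 = [1.1, 2.0, 2.9, 4.0, 0.5, 6.0]

groups = partition_by_linearity(ic50, ld50, slope=1.0, intercept=0.0, tol=0.2)
group, estimate = predict_ld50(train_x, ld50, groups, query=(0.05, 0.05))
print(groups, group, round(estimate, 2))
```

    Note that `predict_ld50` never sees IC50 values: as in the abstract, in vitro data shape only the training-time partition, and a new compound is predicted from its descriptors alone.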

  3. Quantitative structure-retention relationships applied to development of liquid chromatography gradient-elution method for the separation of sartans.

    PubMed

    Golubović, Jelena; Protić, Ana; Otašević, Biljana; Zečević, Mira

    2016-04-01

    QSRR are mathematically derived relationships between the chromatographic parameters determined for a representative series of analytes in given separation systems and the molecular descriptors accounting for the structural differences among the investigated analytes. An artificial neural network (ANN) is a data analysis technique that sets out to emulate the human brain's way of working. The aim of the present work was to optimize separation of six angiotensin receptor antagonists, so-called sartans: losartan, valsartan, irbesartan, telmisartan, candesartan cilexetil and eprosartan in a gradient-elution HPLC method. For this purpose, ANN as a mathematical tool was used for establishing a QSRR model based on molecular descriptors of sartans and varied instrumental conditions. The optimized model can be further used for prediction of an external congener of sartans and analysis of the influence of the analyte structure, represented through molecular descriptors, on retention behaviour. Molecular descriptors included in modelling were electrostatic, geometrical and quantum-chemical descriptors: Connolly solvent excluded volume, non-1,4 van der Waals energy, octanol/water distribution coefficient, polarizability, number of proton-donor sites and number of proton-acceptor sites. Varied instrumental conditions were gradient time, buffer pH and buffer molarity. High prediction ability of the optimized network enabled complete separation of the analytes within the run time of 15.5 min under the following conditions: gradient time of 12.5 min, buffer pH of 3.95 and buffer molarity of 25 mM. Applied methodology showed the potential to predict retention behaviour of an external analyte with the properties within the training space. Connolly solvent excluded volume, polarizability and number of proton-acceptor sites appeared to be the most influential parameters on retention behaviour of the sartans. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. The effect of geometrical presentation of multimodal cation-exchange ligands on selective recognition of hydrophobic regions on protein surfaces.

    PubMed

    Woo, James; Parimal, Siddharth; Brown, Matthew R; Heden, Ryan; Cramer, Steven M

    2015-09-18

    The effects of spatial organization of hydrophobic and charged moieties on multimodal (MM) cation-exchange ligands were examined by studying protein retention behavior on two commercial chromatographic media, Capto™ MMC and Nuvia™ cPrime™. Proteins with extended regions of surface-exposed aliphatic residues were found to have enhanced retention on the Capto MMC system as compared to the Nuvia cPrime resin. The results further indicated that while the Nuvia cPrime ligand had a strong preference for interactions with aromatic groups, the Capto MMC ligand appeared to interact with both aliphatic and aromatic clusters on the protein surfaces. These observations were formalized into a new set of protein surface property descriptors, which quantified the local distribution of electrostatic and hydrophobic potentials as well as distinguishing between aromatic and aliphatic properties. Using these descriptors, high-performing quantitative structure-activity relationship (QSAR) models (R² > 0.88) were generated for both the Capto MMC and Nuvia cPrime datasets at pH 5 and pH 6. Descriptors of electrostatic properties were generally common across the four models; however, both Capto MMC models included descriptors that quantified regions of aliphatic-based hydrophobicity in addition to aromatic descriptors. Retention was generally reduced by lowering the ligand densities on both MM resins. Notably, elution order was largely unaffected by the change in surface density, but smaller and more aliphatic proteins tended to be more affected by this drop in ligand density. This suggests that modulating the exposure, shape and density of the hydrophobic moieties in multimodal chromatographic systems can alter the preference for surface exposed aliphatic or aromatic residues, thus providing an additional dimension for modulating the selectivity of MM protein separation systems. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Determination of solute descriptors by chromatographic methods.

    PubMed

    Poole, Colin F; Atapattu, Sanka N; Poole, Salwa K; Bell, Andrea K

    2009-10-12

    The solvation parameter model is now well established as a useful tool for obtaining quantitative structure-property relationships for chemical, biomedical and environmental processes. The model correlates a free-energy related property of a system to six free-energy derived descriptors describing molecular properties. These molecular descriptors are defined as L (gas-liquid partition coefficient on hexadecane at 298K), V (McGowan's characteristic volume), E (excess molar refraction), S (dipolarity/polarizability), A (hydrogen-bond acidity), and B (hydrogen-bond basicity). McGowan's characteristic volume is trivially calculated from structure and the excess molar refraction can be calculated for liquids from their refractive index and easily estimated for solids. The remaining four descriptors are derived by experiment using (largely) two-phase partitioning, chromatography, and solubility measurements. In this article, the use of gas chromatography, reversed-phase liquid chromatography, micellar electrokinetic chromatography, and two-phase partitioning for determining solute descriptors is described. A large database of experimental retention factors and partition coefficients is constructed after first applying selection tools to remove unreliable experimental values and an optimized collection of varied compounds with descriptor values suitable for calibrating chromatographic systems is presented. These optimized descriptors are demonstrated to be robust and more suitable than other groups of descriptors characterizing the separation properties of chromatographic systems.
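
    The abstract names the six descriptors but does not write out the model equation. A minimal sketch, assuming the commonly used linear form of the solvation parameter model (log SP = c + eE + sS + aA + bB + vV for transfer between condensed phases, with lL replacing vV for gas-to-condensed-phase transfer). The function name and dictionary layout are illustrative, and the system constants must come from a calibrated chromatographic system:

```python
def predict_log_sp(solute, system):
    """Evaluate the linear solvation parameter model for one solute.

    solute: dict of descriptors E, S, A, B plus V (or L).
    system: dict of system constants c, e, s, a, b plus v (or l).
    The dictionary layout is an assumption made for this sketch.
    """
    value = system["c"]
    for constant, descriptor in (("e", "E"), ("s", "S"), ("a", "A"), ("b", "B")):
        value += system[constant] * solute[descriptor]
    if "v" in system:
        # Condensed-phase form: cavity term uses McGowan's characteristic volume V.
        value += system["v"] * solute["V"]
    else:
        # Gas-to-condensed-phase form: cavity term uses the hexadecane coefficient L.
        value += system["l"] * solute["L"]
    return value
```

    Fitting the system constants is an ordinary multiple linear regression of measured retention factors or partition coefficients on the descriptor values of the calibration compounds.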

  6. A computer program for creating keyword indexes to textual data files

    USGS Publications Warehouse

    Moody, David W.

    1972-01-01

    A keyword-in-context (KWIC) or out-of-context (KWOC) index is a convenient means of organizing information. This keyword index program can be used to create either KWIC or KWOC indexes of bibliographic references or other types of information punched on cards, typed on optical scanner sheets, or retrieved from various Department of Interior data bases using the Generalized Information Processing System (GIPSY). The index consists of a 'bibliographic' section and a keyword section based on the permutation of document titles, project titles, environmental impact statement titles, maps, etc. or lists of descriptors. The program can also create a back-of-the-book index to documents from a list of descriptors. By providing the user with a wide range of input and output options, the program provides the researcher, manager, or librarian with a means of maintaining a list and index to documents in a small library, reprint collection, or office file.
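
    The permutation idea behind a KWIC index can be illustrated briefly. The sketch below is not the 1972 program. It is a modern illustration that rotates each title so every non-trivial word in turn leads the entry; the stop-word list and the "/" marker for the original start of the title are assumptions made here:

```python
def kwic_index(titles, stop_words=frozenset({"a", "an", "and", "for", "of", "the", "to"})):
    """Return (keyword, rotated title) pairs sorted by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()").lower()
            if not key or key in stop_words:
                continue
            if i == 0:
                rotated = " ".join(words)
            else:
                # Rotate so the keyword leads; "/" marks where the title originally began.
                rotated = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((key, rotated))
    return sorted(entries)
```

    A KWOC index would instead print the keyword in a separate column beside the unrotated title.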

  7. Probabilistic modeling of anatomical variability using a low dimensional parameterization of diffeomorphisms.

    PubMed

    Zhang, Miaomiao; Wells, William M; Golland, Polina

    2017-10-01

    We present an efficient probabilistic model of anatomical variability in a linear space of initial velocities of diffeomorphic transformations and demonstrate its benefits in clinical studies of brain anatomy. To overcome the computational challenges of the high dimensional deformation-based descriptors, we develop a latent variable model for principal geodesic analysis (PGA) based on a low dimensional shape descriptor that effectively captures the intrinsic variability in a population. We define a novel shape prior that explicitly represents principal modes as a multivariate complex Gaussian distribution on the initial velocities in a bandlimited space. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than state-of-the-art methods such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA) that operate in the high dimensional image space. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Eyeglasses Lens Contour Extraction from Facial Images Using an Efficient Shape Description

    PubMed Central

    Borza, Diana; Darabant, Adrian Sergiu; Danescu, Radu

    2013-01-01

    This paper presents a system that automatically extracts the position of the eyeglasses and the accurate shape and size of the frame lenses in facial images. The novelty brought by this paper consists in three key contributions. The first one is an original model for representing the shape of the eyeglasses lens, using Fourier descriptors. The second one is a method for generating the search space starting from a finite, relatively small number of representative lens shapes based on Fourier morphing. Finally, we propose an accurate lens contour extraction algorithm using a multi-stage Monte Carlo sampling technique. Multiple experiments demonstrate the effectiveness of our approach. PMID:24152926

  9. Toxicity prediction of PHDDs and phenols in the light of nucleic acid bases and DNA base pair interaction.

    PubMed

    Mondal Roy, Sutapa; Roy, Debesh R; Sahoo, Suban K

    2015-11-01

    The applicability of Density Functional Theory (DFT) based descriptors for the development of quantitative structure-toxicity relationships (QSTR) is assessed for two different series of toxic aromatic compounds, viz., polyhalogenated dibenzo-p-dioxins (PHDDs) and phenols (PHs). A series of 20 compounds each for PHDDs and PHs with their experimental toxicities (IC50 and IGC50) is chosen in the present study to develop DFT based efficient quantum chemical parameters (QCPs) for explaining the toxin potential of the considered compounds. A systematic analysis to find out the electron donation/acceptance nature of these selected compounds with the considered model biosystems, viz., nucleic acid (NA) bases and DNA base pairs, is performed to identify potential QCPs. Accordingly, PHDDs are found to be electron acceptors, whereas phenols act as donors, during their interaction with biosystems. A two-parameter regression model is built comprising the global charge transfer (ΔN) and the local Fukui function for nucleophilic attack (fk(+)) for PHDDs, and the same for electrophilic attack (fk(-)) in the case of PHs. It is heartening to note that our chosen descriptors, viz., charge transfer (ΔN) and Fukui function (fk(±)), play a crucial role by explaining more than 90% of the observed toxic behavior (in terms of correlation coefficient, R) of PHDDs and PHs. The developed QCPs, viz., ΔN and fk(±), can be added as new descriptors in the QSTR parlance. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Rate constants of hydroxyl radical oxidation of polychlorinated biphenyls in the gas phase: A single-descriptor based QSAR and DFT study.

    PubMed

    Yang, Zhihui; Luo, Shuang; Wei, Zongsu; Ye, Tiantian; Spinney, Richard; Chen, Dong; Xiao, Ruiyang

    2016-04-01

    The second-order rate constants (k) of hydroxyl radical (·OH) with polychlorinated biphenyls (PCBs) in the gas phase are of scientific and regulatory importance for assessing their global distribution and fate in the atmosphere. Due to the limited number of measured k values, there is a need to model the k values for unknown PCB congeners. In the present study, we developed a quantitative structure-activity relationship (QSAR) model with quantum chemical descriptors using a sequential approach, including correlation analysis, principal component analysis, multi-linear regression, validation, and estimation of applicability domain. The result indicates that the single descriptor, polarizability (α), plays an important role in determining the reactivity with a global standardized function of ln k = -0.054 × α - 19.49 at 298 K. In order to validate the QSAR predicted k values and expand the current k value database for PCB congeners, an independent method, density functional theory (DFT), was employed to calculate the kinetics and thermodynamics of the gas-phase ·OH oxidation of 2,4',5-trichlorobiphenyl (PCB31), 2,2',4,4'-tetrachlorobiphenyl (PCB47), 2,3,4,5,6-pentachlorobiphenyl (PCB116), 3,3',4,4',5,5'-hexachlorobiphenyl (PCB169), and 2,3,3',4,5,5',6-heptachlorobiphenyl (PCB192) at 298 K at B3LYP/6-311++G**//B3LYP/6-31 + G** level of theory. The QSAR predicted and DFT calculated k values for ·OH oxidation of these PCB congeners exhibit excellent agreement with the experimental k values, indicating the robustness and predictive power of the single-descriptor based QSAR model we developed. Copyright © 2015 Elsevier Ltd. All rights reserved.
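
    The single-descriptor regression quoted in this abstract can be evaluated directly. A minimal sketch, assuming α is supplied in the units used by the authors (the abstract does not state them) and that k is returned in the units of the original fit; the function names are illustrative:

```python
import math

def ln_k_from_polarizability(alpha):
    """ln k at 298 K from the regression quoted in the abstract."""
    return -0.054 * alpha - 19.49

def k_from_polarizability(alpha):
    """Rate constant k at 298 K, in the (unstated) units of the original fit."""
    return math.exp(ln_k_from_polarizability(alpha))
```

    Because the slope is negative, the predicted rate constant decreases monotonically as polarizability increases.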

  11. Resting State fMRI Functional Connectivity-Based Classification Using a Convolutional Neural Network Architecture

    PubMed Central

    Meszlényi, Regina J.; Buza, Krisztian; Vidnyánszky, Zoltán

    2017-01-01

    Machine learning techniques have become increasingly popular in the field of resting state fMRI (functional magnetic resonance imaging) network based classification. However, the application of convolutional networks has been proposed only very recently and has remained largely unexplored. In this paper we describe a convolutional neural network architecture for functional connectome classification called connectome-convolutional neural network (CCNN). Our results on simulated datasets and a publicly available dataset for amnestic mild cognitive impairment classification demonstrate that our CCNN model can efficiently distinguish between subject groups. We also show that the connectome-convolutional network is capable of combining information from diverse functional connectivity metrics and that models using a combination of different connectivity descriptors are able to outperform classifiers using only one metric. It follows from this flexibility that our proposed CCNN model can be easily adapted to a wide range of connectome based classification or regression tasks, by varying which connectivity descriptor combinations are used to train the network. PMID:29089883

  12. Resting State fMRI Functional Connectivity-Based Classification Using a Convolutional Neural Network Architecture.

    PubMed

    Meszlényi, Regina J; Buza, Krisztian; Vidnyánszky, Zoltán

    2017-01-01

    Machine learning techniques have become increasingly popular in the field of resting state fMRI (functional magnetic resonance imaging) network based classification. However, the application of convolutional networks has been proposed only very recently and has remained largely unexplored. In this paper we describe a convolutional neural network architecture for functional connectome classification called connectome-convolutional neural network (CCNN). Our results on simulated datasets and a publicly available dataset for amnestic mild cognitive impairment classification demonstrate that our CCNN model can efficiently distinguish between subject groups. We also show that the connectome-convolutional network is capable of combining information from diverse functional connectivity metrics and that models using a combination of different connectivity descriptors are able to outperform classifiers using only one metric. It follows from this flexibility that our proposed CCNN model can be easily adapted to a wide range of connectome based classification or regression tasks, by varying which connectivity descriptor combinations are used to train the network.

  13. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets.

    PubMed

    Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-08-01

    Lesions detected at mammography are described with a highly standardized terminology: the breast imaging-reporting and data system (BI-RADS) lexicon. Up to now, no validated semantic computer assisted classification algorithm exists to interactively link combinations of morphological descriptors from the lexicon to a probabilistic risk estimate of malignancy. The authors therefore aim at the external validation of the mammographic mass diagnosis (MMassDx) algorithm. A classification algorithm like MMassDx must perform well in a variety of clinical circumstances and in datasets that were not used to generate the algorithm in order to ultimately become accepted in clinical routine. The MMassDx algorithm uses a naïve Bayes network and calculates post-test probabilities of malignancy based on two distinct sets of variables, (a) BI-RADS descriptors and age ("descriptor model") and (b) BI-RADS descriptors, age, and BI-RADS assessment categories ("inclusive model"). The authors evaluate both the MMassDx (descriptor) and MMassDx (inclusive) models using two large publicly available datasets of mammographic mass lesions: the digital database for screening mammography (DDSM) dataset, which contains two subsets from the same examinations (a medio-lateral oblique (MLO) view and a cranio-caudal (CC) view dataset), and the mammographic mass (MM) dataset. The DDSM contains 1220 mass lesions and the MM dataset contains 961 mass lesions. The authors evaluate discriminative performance using area under the receiver-operating-characteristic curve (AUC) and compare this to the BI-RADS assessment categories alone (i.e., the clinical performance) using the DeLong method. The authors also evaluate whether assigned probabilistic risk estimates reflect the lesions' true risk of malignancy using calibration curves. The authors demonstrate that the MMassDx algorithms show good discriminatory performance. AUC for the MMassDx (descriptor) model in the DDSM data is 0.876/0.895 (MLO/CC view) and AUC for the MMassDx (inclusive) model in the DDSM data is 0.891/0.900 (MLO/CC view). AUC for the MMassDx (descriptor) model in the MM data is 0.862 and AUC for the MMassDx (inclusive) model in the MM data is 0.900. In all scenarios, MMassDx performs significantly better than clinical performance, P < 0.05 each. The authors furthermore demonstrate that the MMassDx algorithm systematically underestimates the risk of malignancy in the DDSM and MM datasets, especially when low probabilities of malignancy are assigned. The authors' results reveal that the MMassDx algorithms have good discriminatory performance but less accurate calibration when tested on two independent validation datasets. Improvement in calibration and testing in a prospective clinical population will be important steps in the pursuit of translation of these algorithms to the clinic.
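
    How a naïve Bayes model turns descriptors into a post-test probability can be shown in a few lines. This is a generic sketch of the naïve Bayes update (pre-test odds multiplied by one likelihood ratio per observed descriptor, assuming conditional independence), not the MMassDx implementation. The function name is hypothetical and no likelihood ratios from the paper are reproduced:

```python
def post_test_probability(pre_test_probability, likelihood_ratios):
    """Naive Bayes update: posterior odds = prior odds x product of likelihood ratios."""
    odds = pre_test_probability / (1.0 - pre_test_probability)
    for ratio in likelihood_ratios:
        # Each observed descriptor contributes one ratio
        # P(descriptor | malignant) / P(descriptor | benign).
        odds *= ratio
    return odds / (1.0 + odds)
```

    As a purely arithmetic illustration with arbitrary inputs, a pre-test probability of 0.5 and likelihood ratios of 2 and 3 give posterior odds of 6, i.e. a probability of 6/7.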

  14. Numerical study of wall shear stress-based descriptors in the human left coronary artery.

    PubMed

    Pinto, S I S; Campos, J B L M

    2016-10-01

    The present work is about the application of wall shear stress descriptors - time averaged wall shear stress (TAWSS), oscillating shear index (OSI) and relative residence time (RRT) - to the study of blood flow in the left coronary artery (LCA). These descriptors aid the prediction of disturbed flow conditions in the vessels and play a significant role in the detection of potential zones of atherosclerosis development. Hemodynamic descriptor data were obtained, numerically, through ANSYS® software, for the LCA of a patient-specific geometry and for a 3D idealized model. Comparing both cases, the results are coherent, in terms of location and magnitude. Low TAWSS, high OSI and high RRT values are observed in the bifurcation - potential zone of atherosclerosis appearance. The dissimilarities observed in the TAWSS values, considering blood as a Newtonian or non-Newtonian fluid, reveal the importance of a correct rheological characterization of blood. Moreover, for a higher Reynolds number, the TAWSS values decrease in the bifurcation and along the LAD branch, increasing the probability of plaque deposition. Furthermore, for a stenotic LCA model, very low TAWSS and high RRT values in front of and behind the stenosis are observed, indicating the probable extension, in the flow direction, of the lesion.
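
    The three descriptors can be computed from a time series of wall shear stress vectors at one wall point. The abstract does not give the formulas, so the sketch below assumes the definitions commonly used in the hemodynamics literature: TAWSS is the cycle average of |τ|, OSI = 0.5 (1 - |mean τ| / mean |τ|), and RRT = 1 / ((1 - 2 OSI) TAWSS). It also assumes samples uniformly spaced over one cardiac cycle; the function name is illustrative:

```python
import math

def wss_descriptors(wss_vectors):
    """Return (TAWSS, OSI, RRT) from uniformly spaced (x, y, z) WSS samples over one cycle."""
    n = len(wss_vectors)
    # Cycle average of the WSS magnitude.
    tawss = sum(math.sqrt(x * x + y * y + z * z) for x, y, z in wss_vectors) / n
    if tawss == 0.0:
        raise ValueError("WSS is zero over the whole cycle; OSI and RRT are undefined")
    # Magnitude of the cycle-averaged WSS vector.
    mean_vector = [sum(v[i] for v in wss_vectors) / n for i in range(3)]
    magnitude_of_mean = math.sqrt(sum(c * c for c in mean_vector))
    osi = 0.5 * (1.0 - magnitude_of_mean / tawss)
    # RRT diverges for purely oscillatory flow (OSI = 0.5).
    rrt = 1.0 / ((1.0 - 2.0 * osi) * tawss) if osi < 0.5 else math.inf
    return tawss, osi, rrt
```

    Under these definitions, unidirectional flow gives OSI = 0, while flow that fully reverses with equal magnitude gives OSI = 0.5 and an unbounded RRT, which matches the low TAWSS / high OSI / high RRT pattern the abstract associates with the bifurcation.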

  15. Modeling Geometric-Temporal Context With Directional Pyramid Co-Occurrence for Action Recognition.

    PubMed

    Yuan, Chunfeng; Li, Xi; Hu, Weiming; Ling, Haibin; Maybank, Stephen J

    2014-02-01

    In this paper, we present a new geometric-temporal representation for visual action recognition based on local spatio-temporal features. First, we propose a modified covariance descriptor under the log-Euclidean Riemannian metric to represent the spatio-temporal cuboids detected in the video sequences. Compared with previously proposed covariance descriptors, our descriptor can be measured and clustered in Euclidean space. Second, to capture the geometric-temporal contextual information, we construct a directional pyramid co-occurrence matrix (DPCM) to describe the spatio-temporal distribution of the vector-quantized local feature descriptors extracted from a video. DPCM characterizes the co-occurrence statistics of local features as well as the spatio-temporal positional relationships among the concurrent features. These statistics provide strong descriptive power for action recognition. To use DPCM for action recognition, we propose a directional pyramid co-occurrence matching kernel to measure the similarity of videos. The proposed method achieves state-of-the-art performance and improves on the recognition performance of the bag-of-visual-words (BOVWs) models by a large margin on six public data sets. For example, on the KTH data set, it achieves 98.78% accuracy while the BOVW approach only achieves 88.06%. On both Weizmann and UCF CIL data sets, the highest possible accuracy of 100% is achieved.

  16. QSAR, QSPR and QSRR in Terms of 3-D-MoRSE Descriptors for In Silico Screening of Clofibric Acid Analogues.

    PubMed

    Di Tullio, Maurizio; Maccallini, Cristina; Ammazzalorso, Alessandra; Giampietro, Letizia; Amoroso, Rosa; De Filippis, Barbara; Fantacuzzi, Marialuigia; Wiczling, Paweł; Kaliszan, Roman

    2012-07-01

    A series of 27 analogues of clofibric acid, mostly heteroarylalkanoic derivatives, have been analyzed by a novel high-throughput reversed-phase HPLC method employing a combined gradient of the eluent's pH and organic modifier content. The hydrophobicity (lipophilicity) parameters, log kw, and acidity constants, pKa, thus determined were subjected to multiple regression analysis to get a QSRR (Quantitative Structure-Retention Relationships) and a QSPR (Quantitative Structure-Property Relationships) equation, respectively, describing these pharmacokinetics-determining physicochemical parameters in terms of the calculation chemistry-derived structural descriptors. The previously determined in vitro log EC50 values - transactivation activity towards PPARα (human Peroxisome Proliferator-Activated Receptor α) - have also been described in a QSAR (Quantitative Structure-Activity Relationships) equation in terms of the 3-D-MoRSE descriptors (3D-Molecule Representation of Structures based on Electron diffraction descriptors). The QSAR model derived can serve for an a priori prediction of bioactivity in vitro of any designed analogue, whereas the QSRR and the QSPR models can be used to evaluate lipophilicity and acidity, respectively, of the compounds, and hence to rationally guide the selection of structures with proper pharmacokinetics. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Shuffling cross-validation-bee algorithm as a new descriptor selection method for retention studies of pesticides in biopartitioning micellar chromatography.

    PubMed

    Zarei, Kobra; Atabati, Morteza; Ahmadi, Monire

    2017-05-04

    Bee algorithm (BA) is an optimization algorithm inspired by the natural foraging behaviour of honey bees to find the optimal solution, and it can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the best descriptors that could describe the retention factor (log k) in the biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were obtained using BA and then the selected descriptors were applied for model development using multiple linear regression (MLR). The descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods and MLR was applied to model development and then the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. Support vector machine (SVM) was also applied for model development using the six descriptors selected by BA. The statistical results obtained using SVM were better than those obtained using MLR: the root mean square error (RMSE) and correlation coefficient (R) for the whole data set (training and test) were 0.1863 and 0.9426, respectively, using shuffling CV-BA-MLR, while the corresponding values for the shuffling CV-BA-SVM method were 0.0704 and 0.9922, respectively.
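
    The two statistics used to compare the MLR and SVM models are simple to compute. A minimal sketch using the standard definitions of root mean square error and the Pearson correlation coefficient between observed and predicted log k; the function names are illustrative, and the abstract does not say which variant of R the authors report:

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

    Note that R alone cannot detect a constant offset between predictions and observations, which is why it is reported together with RMSE.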

  18. Chemically Aware Model Builder (camb): an R package for property and bioactivity modelling of small molecules.

    PubMed

    Murrell, Daniel S; Cortes-Ciriano, Isidro; van Westen, Gerard J P; Stott, Ian P; Bender, Andreas; Malliavin, Thérèse E; Glen, Robert C

    2015-01-01

    In silico predictive models have proved to be valuable for the optimisation of compound potency, selectivity and safety profiles in the drug discovery process. camb is an R package that provides an environment for the rapid generation of quantitative Structure-Property and Structure-Activity models for small molecules (including QSAR, QSPR, QSAM, PCM) and is aimed at both advanced and beginner R users. camb's capabilities include the standardisation of chemical structure representation, computation of 905 one-dimensional and 14 fingerprint type descriptors for small molecules, 8 types of amino acid descriptors, 13 whole protein sequence descriptors, filtering methods for feature selection, generation of predictive models (using an interface to the R package caret), as well as techniques to create model ensembles (using the R package caretEnsemble). Results can be visualised through high-quality, customisable plots (R package ggplot2). Overall, camb constitutes an open-source framework to perform the following steps: (1) compound standardisation, (2) molecular and protein descriptor calculation, (3) descriptor pre-processing and model training, visualisation and validation, and (4) bioactivity/property prediction for new molecules. camb aims to speed model generation, in order to provide reproducibility and tests of robustness. QSPR and proteochemometric case studies are included which demonstrate camb's application. Graphical abstract: From compounds and data to models: a complete model building workflow in one package.

  19. Prediction of pesticide acute toxicity using two-dimensional chemical descriptors and target species classification

    EPA Science Inventory

    Previous modelling of the median lethal dose (oral rat LD50) has indicated that local class-based models yield better correlations than global models. We evaluated the hypothesis that dividing the dataset by pesticidal mechanisms would improve prediction accuracy. A linear discri...

  20. Visual feature discrimination versus compression ratio for polygonal shape descriptors

    NASA Astrophysics Data System (ADS)

    Heuer, Joerg; Sanahuja, Francesc; Kaup, Andre

    2000-10-01

    In the last decade several methods for low level indexing of visual features appeared. Most often these were evaluated with respect to their discrimination power using measures like precision and recall. Accordingly, the targeted application was indexing of visual data within databases. During the standardization process of MPEG-7 the view on indexing of visual data changed, taking also communication aspects into account where coding efficiency is important. Even if the descriptors used for indexing are small compared to the size of images, it is recognized that there can be several descriptors linked to an image, characterizing different features and regions. Besides the importance of a small memory footprint for the transmission of the descriptor and the memory footprint in a database, eventually the search and filtering can be sped up by reducing the dimensionality of the descriptor if the metric of the matching can be adjusted. Based on a polygon shape descriptor presented for MPEG-7 this paper compares the discrimination power versus memory consumption of the descriptor. Different methods based on quantization are presented and their effect on the retrieval performance is measured. Finally an optimized computation of the descriptor is presented.

  1. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes.

    PubMed

    Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán

    2017-04-24

    We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type (as opposed to single-type) descriptors, we obtain more relevant features for machine learning. Following the principle of "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels (more than one kernel function for a set of the input descriptors), MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r² = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.
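
    The two kernels named in this abstract have simple closed forms. The sketch below shows a Tanimoto kernel on binary descriptors and a linear kernel on nonbinary descriptors. How MultiDK actually combines them is not stated in the abstract, so the weighted sum and the `weight` parameter are assumptions made for illustration, as is the convention of returning 1.0 for two all-zero fingerprints:

```python
def tanimoto_kernel(a, b):
    """Tanimoto similarity of two equal-length 0/1 vectors."""
    common = sum(x & y for x, y in zip(a, b))
    union = sum(a) + sum(b) - common
    # Convention chosen here: two empty fingerprints count as identical.
    return common / union if union else 1.0

def linear_kernel(a, b):
    """Plain dot product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))

def combined_kernel(binary_a, real_a, binary_b, real_b, weight=0.5):
    """Hypothetical combination: weighted sum of the two kernels."""
    return (weight * tanimoto_kernel(binary_a, binary_b)
            + (1.0 - weight) * linear_kernel(real_a, real_b))
```

    A nonnegative weighted sum of valid kernels is itself a valid kernel, which is why it serves as the stand-in combination rule here.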

  2. Adjacent bin stability evaluating for feature description

    NASA Astrophysics Data System (ADS)

    Nie, Dongdong; Ma, Qinyong

    2018-04-01

    A recent study improved descriptor performance by accumulating stability votes for all scale pairs to compose the local descriptor. We argue that the stability of a bin depends on the differences across adjacent pairs more than the differences across all scale pairs, and a new local descriptor is composed based on this hypothesis. First, a series of SIFT descriptors is extracted from multiple scales. Then the difference value of each bin across adjacent scales is calculated, and the stability value of the bin is calculated from it and accumulated to compose the final descriptor. The performance of the proposed method is evaluated with two popular matching datasets, and compared with other state-of-the-art works. Experimental results show that the proposed method performs satisfactorily.

  3. Modeling the Tendency for Music to Induce Movement in Humans: First Correlations with Low-Level Audio Descriptors across Music Genres

    ERIC Educational Resources Information Center

    Madison, Guy; Gouyon, Fabien; Ullen, Fredrik; Hornstrom, Kalle

    2011-01-01

    "Groove" is often described as the experience of music that makes people tap their feet and want to dance. A high degree of consistency in ratings of groove across listeners indicates that physical properties of the sound signal contribute to groove (Madison, 2006). Here, correlations were assessed between listeners' ratings and a number…

  4. A comparison between space-time video descriptors

    NASA Astrophysics Data System (ADS)

    Costantini, Luca; Capodiferro, Licia; Neri, Alessandro

    2013-02-01

    The description of space-time patches is a fundamental task in many applications such as video retrieval or classification. Each space-time patch can be described by using a set of orthogonal functions that represent a subspace, for example a sphere or a cylinder, within the patch. In this work, our aim is to investigate the differences between the spherical descriptors and the cylindrical descriptors. In order to compute the descriptors, the 3D spherical and cylindrical Zernike polynomials are employed. This is important because both the functions are based on the same family of polynomials, and only the symmetry is different. Our experimental results show that the cylindrical descriptor outperforms the spherical descriptor. However, the performances of the two descriptors are similar.

  5. MIND: modality independent neighbourhood descriptor for multi-modal deformable registration.

    PubMed

    Heinrich, Mattias P; Jenkinson, Mark; Bhushan, Manav; Matin, Tahreema; Gleeson, Fergus V; Brady, Sir Michael; Schnabel, Julia A

    2012-10-01

    Deformable registration of images obtained from different modalities remains a challenging task in medical image analysis. This paper addresses this important problem and proposes a modality independent neighbourhood descriptor (MIND) for both linear and deformable multi-modal registration. Based on the similarity of small image patches within one image, it aims to extract the distinctive structure in a local neighbourhood, which is preserved across modalities. The descriptor is based on the concept of image self-similarity, which has been introduced for non-local means filtering for image denoising. It is able to distinguish between different types of features such as corners, edges and homogeneously textured regions. MIND is robust to the most considerable differences between modalities: non-functional intensity relations, image noise and non-uniform bias fields. The multi-dimensional descriptor can be efficiently computed in a dense fashion across the whole image and provides point-wise local similarity across modalities based on the absolute or squared difference between descriptors, making it applicable for a wide range of transformation models and optimisation algorithms. We use the sum of squared differences of the MIND representations of the images as a similarity metric within a symmetric non-parametric Gauss-Newton registration framework. In principle, MIND would be applicable to the registration of arbitrary modalities. In this work, we apply and validate it for the registration of clinical 3D thoracic CT scans between inhale and exhale as well as the alignment of 3D CT and MRI scans. Experimental results show the advantages of MIND over state-of-the-art techniques such as conditional mutual information and entropy images, with respect to clinically annotated landmark locations. Copyright © 2012 Elsevier B.V. All rights reserved.

  6. Computational modeling of high performance steel fiber reinforced concrete using a micromorphic approach

    NASA Astrophysics Data System (ADS)

    Huespe, A. E.; Oliver, J.; Mora, D. F.

    2013-12-01

    A finite element methodology for simulating the failure of high performance fiber reinforced concrete composites (HPFRC), with arbitrarily oriented short fibers, is presented. The composite material model is based on a micromorphic approach. Using the framework provided by this theory, the body configuration space is described through two kinematical descriptors. At the structural level, the displacement field represents the standard kinematical descriptor. Additionally, a morphological kinematical descriptor, the micromorphic field, is introduced. It describes the fiber-matrix relative displacement, or slipping mechanism of the bond, observed at the mesoscale level. In the first part of this paper, we summarize the model formulation of the micromorphic approach presented in a previous work by the authors. In the second part, and as the main contribution of the paper, we address specific issues related to the numerical aspects involved in the computational implementation of the model. The developed numerical procedure is based on a mixed finite element technique. The number of dofs per node changes according to the number of fiber bundles simulated in the composite. Then, a specific solution scheme is proposed to solve the variable number of unknowns in the discrete model. The HPFRC composite model takes into account the important effects produced by concrete fracture. A procedure for simulating quasi-brittle fracture is introduced into the model and is described in the paper. The present numerical methodology is assessed by simulating a selected set of experimental tests which proves its viability and accuracy to capture a number of mechanical phenomena interacting at the macro- and mesoscale and leading to failure of HPFRC composites.

  7. Bond-based linear indices of the non-stochastic and stochastic edge-adjacency matrix. 1. Theory and modeling of ChemPhys properties of organic molecules.

    PubMed

    Marrero-Ponce, Yovani; Martínez-Albelo, Eugenio R; Casañola-Martín, Gerardo M; Castillo-Garit, Juan A; Echevería-Díaz, Yunaimy; Zaldivar, Vicente Romero; Tygat, Jan; Borges, José E Rodriguez; García-Domenech, Ramón; Torrens, Francisco; Pérez-Giménez, Facundo

    2010-11-01

    Novel bond-level molecular descriptors are proposed, based on linear maps similar to the ones defined in algebra theory. The kth edge-adjacency matrix (E(k)) denotes the matrix of bond linear indices (non-stochastic) with regard to the canonical basis set. The kth stochastic edge-adjacency matrix, ES(k), is here proposed as a new molecular representation easily calculated from E(k). Then, the kth stochastic bond linear indices are calculated using ES(k) as operators of linear transformations. In both cases, the bond-type formalism is developed. The kth non-stochastic and stochastic total linear indices are calculated by adding the kth non-stochastic and stochastic bond linear indices, respectively, of all bonds in the molecule. First, the new bond-based molecular descriptors (MDs) are tested for suitability for QSPRs by analyzing regressions of the novel indices for selected physicochemical properties of octane isomers (first round). General performance of the new descriptors in these QSPR studies is evaluated with regard to the well-known sets of 2D/3D MDs. From the analysis, we can conclude that the non-stochastic and stochastic bond-based linear indices have an overall good modeling capability, proving their usefulness in QSPR studies. Later, the novel bond-level MDs are also used for the description and prediction of the boiling point of 28 alkyl-alcohols (second round), and for the modeling of the specific rate constant (log k), partition coefficient (log P), as well as the antibacterial activity of 34 derivatives of 2-furylethylenes (third round). The comparison with other approaches (edge- and vertices-based connectivity indices, total and local spectral moments, and quantum chemical descriptors as well as E-state/biomolecular encounter parameters) shows a good behavior of our method in these QSPR studies. Finally, the approach described in this study appears to be a very promising structural invariant, useful not only for QSPR studies but also for similarity/diversity analysis and drug discovery protocols.

  8. Novel dimer based descriptors with solvational computation for QSAR study of oxadiazoylbenzoyl-ureas as novel insect-growth regulators.

    PubMed

    Fan, Feng; Cheng, Jiagao; Li, Zhong; Xu, Xiaoyong; Qian, Xuhong

    2010-02-01

    The molecular aggregation state of bioactive compounds plays a key role in their bio-interactive procedure. In this article, based on the structure information of dimers, the simplest model of molecular aggregation state, and combined with solvational computation, a total of four descriptors (DeltaV, MR2, DeltaE(1), and DeltaE(2)) were calculated for a QSAR study of a novel insect-growth regulator, N-(5-phenyl-1,3,4-oxadiazol-2-yl)-N'-benzoyl urea. Two QSAR models were constructed with r(2) = 0.671, q(2) = 0.516 and r(2) = 0.816, q(2) = 0.695, respectively. This implies that the bioactivity may strongly depend on the characteristics of the molecular aggregation state, especially on the dimeric transport ability from oil phase to water phase. Copyright 2009 Wiley Periodicals, Inc.

  9. Assessment of Beer Quality Based on a Robotic Pourer, Computer Vision, and Machine Learning Algorithms Using Commercial Beers.

    PubMed

    Gonzalez Viejo, Claudia; Fuentes, Sigfredo; Torrico, Damir D; Howell, Kate; Dunshea, Frank R

    2018-05-01

    Sensory attributes of beer are directly linked to perceived foam-related parameters and beer color. The aim of this study was to develop an objective predictive model using machine learning modeling to assess the intensity levels of sensory descriptors in beer using the physical measurements of color and foam-related parameters. A robotic pourer (RoboBEER), was used to obtain 15 color and foam-related parameters from 22 different commercial beer samples. A sensory session using quantitative descriptive analysis (QDA ® ) with trained panelists was conducted to assess the intensity of 10 beer descriptors. Results showed that the principal component analysis explained 64% of data variability with correlations found between foam-related descriptors from sensory and RoboBEER such as the positive and significant correlation between carbon dioxide and carbonation mouthfeel (R = 0.62), correlation of viscosity to sensory, and maximum volume of foam and total lifetime of foam (R = 0.75, R = 0.77, respectively). Using the RoboBEER parameters as inputs, an artificial neural network (ANN) regression model showed high correlation (R = 0.91) to predict the intensity levels of 10 related sensory descriptors such as yeast, grains and hops aromas, hops flavor, bitter, sour and sweet tastes, viscosity, carbonation, and astringency. This paper is a novel approach for food science using machine modeling techniques that could contribute significantly to rapid screenings of food and brewage products for the food industry and the implementation of Artificial Intelligence (AI). The use of RoboBEER to assess beer quality showed to be a reliable, objective, accurate, and less time-consuming method to predict sensory descriptors compared to trained sensory panels. Hence, this method could be useful as a rapid screening procedure to evaluate beer quality at the end of the production line for industry applications. © 2018 Institute of Food Technologists®.

  10. A chirality-based metrics for free-energy calculations in biomolecular systems.

    PubMed

    Pietropaolo, Adriana; Branduardi, Davide; Bonomi, Massimiliano; Parrinello, Michele

    2011-09-01

    In this work, we exploit the chirality index introduced in (Pietropaolo et al., Proteins 2008, 70, 667) as an effective descriptor of the secondary structure of proteins to explore their complex free-energy landscape. We use the chirality index as an alternative metrics in the path collective variables (PCVs) framework and we show in the prototypical case of the C-terminal domain of immunoglobulin binding protein GB1 that relevant configurations can be efficiently sampled in combination with well-tempered metadynamics. While the projections of the configurations found onto a variety of different descriptors are fully consistent with previously reported calculations, this approach provides a unifying perspective of the folding mechanism which was not possible using metadynamics with the previous formulation of PCVs. Copyright © 2011 Wiley Periodicals, Inc.

  11. Novel Uses of In Vitro Data to Develop Quantitative Biological Activity Relationship Models for in Vivo Carcinogenicity Prediction.

    PubMed

    Pradeep, Prachi; Povinelli, Richard J; Merrill, Stephen J; Bozdag, Serdar; Sem, Daniel S

    2015-04-01

    The availability of large in vitro datasets enables better insight into the mode of action of chemicals and better identification of potential mechanism(s) of toxicity. Several studies have shown that not all in vitro assays can contribute as equal predictors of in vivo carcinogenicity for development of hybrid Quantitative Structure Activity Relationship (QSAR) models. We propose two novel approaches for the use of mechanistically relevant in vitro assay data in the identification of relevant biological descriptors and development of Quantitative Biological Activity Relationship (QBAR) models for carcinogenicity prediction. We demonstrate that in vitro assay data can be used to develop QBAR models for in vivo carcinogenicity prediction via two case studies corroborated with firm scientific rationale. The case studies demonstrate the similarities between QBAR and QSAR modeling in: (i) the selection of relevant descriptors to be used in the machine learning algorithm, and (ii) the development of a computational model that maps chemical or biological descriptors to a toxic endpoint. The results of both the case studies show: (i) improved accuracy and sensitivity which is especially desirable under regulatory requirements, and (ii) overall adherence with the OECD/REACH guidelines. Such mechanism based models can be used along with QSAR models for prediction of mechanistically complex toxic endpoints. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Quantitative structure carcinogenicity relationship for detecting structural alerts in nitroso-compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Helguera, Aliuska Morales; Molecular Simulation and Drug Design, Chemical Bioactive Center, Central University of Las Villas, Santa Clara, 54830, Villa Clara; Department of Chemistry, Central University of Las Villas, Santa Clara, 54830, Villa Clara

    2008-09-01

    In this work, Quantitative Structure-Activity Relationship (QSAR) modelling was used as a tool for predicting the carcinogenic potency of a set of 39 nitroso-compounds, which have been bioassayed in male rats by using the oral route of administration. The optimum QSAR model provided evidence of good fit and predictive performance from the training set. It was able to account for about 84% of the variance in the experimental activity and exhibited high values of the determination coefficients of cross validations, leave one out and bootstrapping (q²_LOO = 78.53 and q²_Boot = 74.97). Such a model was based on spectral moments weighted with Gasteiger-Marsili atomic charges, polarizability and hydrophobicity, as well as with Abraham indexes, specifically the summation solute hydrogen bond basicity and the combined dipolarity/polarizability. This is the first study to have explored the possibility of combining Abraham solute descriptors with spectral moments. A reasonable interpretation of these molecular descriptors from a toxicological point of view was achieved by means of taking into account bond contributions. The set of relationships so derived revealed the importance of the length of the alkyl chains for determining carcinogenic potential of the chemicals analysed, and were able to explain the difference between mono-substituted and di-substituted nitrosoureas as well as to discriminate between isomeric structures with hydroxyl-alkyl and alkyl substituents in different positions. Moreover, they allowed the recognition of structural alerts in classical structures of two potent nitrosamines, consistent with their biotransformation. These results indicate that this new approach has the potential for improving carcinogenicity predictions based on the identification of structural alerts.
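
    The leave-one-out cross-validated coefficient mentioned above can be sketched as follows. This is a minimal illustration, assuming a single descriptor, an ordinary least-squares line, and the common definition q² = 1 - PRESS / SS_total (some authors use a slightly different denominator). The function names are hypothetical and are not taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_leave_one_out(xs, ys):
    """q2 = 1 - PRESS / SS_total, where every prediction comes from a model
    fitted without the compound being predicted."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit on all compounds except the i-th one.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

    A q² close to the fitted r² suggests the model is not merely memorising the training compounds; a large gap between the two is a usual warning sign of overfitting.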

  13. Structure-activity relationships for serotonin transporter and dopamine receptor selectivity.

    PubMed

    Agatonovic-Kustrin, Snezana; Davies, Paul; Turner, Joseph V

    2009-05-01

    Antipsychotic medications have a diverse pharmacology with affinity for serotonergic, dopaminergic, adrenergic, histaminergic and cholinergic receptors. Their clinical use now also includes the treatment of mood disorders, thought to be mediated by serotonergic receptor activity. The aim of our study was to characterise the molecular properties of antipsychotic agents, and to develop a model that would indicate molecular specificity for the dopamine (D(2)) receptor and the serotonin (5-HT) transporter. Back-propagation artificial neural networks (ANNs) were trained on a dataset of 47 ligands categorically assigned antidepressant or antipsychotic utility. The structure of each compound was encoded with 63 calculated molecular descriptors. ANN parameters including hidden neurons and input descriptors were optimised based on sensitivity analyses, with optimum models containing between four and 14 descriptors. Predicted binding preferences were in excellent agreement with clinical antipsychotic or antidepressant utility. Validated models were further tested by use of an external prediction set of five drugs with unknown mechanism of action. The SAR models developed revealed the importance of simple molecular characteristics for differential binding to the D(2) receptor and the 5-HT transporter. These included molecular size and shape, solubility parameters, hydrogen donating potential, electrostatic parameters, stereochemistry and presence of nitrogen. The developed models and techniques employed are expected to be useful in the rational design of future therapeutic agents.

  14. In silico quantitative structure-toxicity relationship study of aromatic nitro compounds.

    PubMed

    Pasha, Farhan Ahmad; Neaz, Mohammad Morshed; Cho, Seung Joo; Ansari, Mohiuddin; Mishra, Sunil Kumar; Tiwari, Sharvan

    2009-05-01

    Small molecules often have toxicities that are a function of molecular structural features. Minor variations in structural features can make large difference in such toxicity. Consequently, in silico techniques may be used to correlate such molecular toxicities with their structural features. Relative to nine different sets of aromatic nitro compounds having known observed toxicities against different targets, we developed ligand-based 2D quantitative structure-toxicity relationship models using 20 selected topological descriptors. The topological descriptors have several advantages such as conformational independency, facile and less time-consuming computation to yield good results. Multiple linear regression analysis was used to correlate variations of toxicity with molecular properties. The information index on molecular size, lopping centric index and Kier flexibility index were identified as fundamental descriptors for different kinds of toxicity, and further showed that molecular size, branching and molecular flexibility might be particularly important factors in quantitative structure-toxicity relationship analysis. This study revealed that topological descriptor-guided quantitative structure-toxicity relationship provided a very useful, cost and time-efficient, in silico tool for describing small-molecule toxicities.

  15. Temperature and relative humidity influence the ripening descriptors of Camembert-type cheeses throughout ripening.

    PubMed

    Leclercq-Perlat, M-N; Sicard, M; Perrot, N; Trelea, I C; Picque, D; Corrieu, G

    2015-02-01

    Ripening descriptors are the main factors that determine consumers' preferences of soft cheeses. Six descriptors were defined to represent the sensory changes in Camembert cheeses: Penicillium camemberti appearance, cheese odor and rind color, creamy underrind thickness and consistency, and core hardness. To evaluate the effects of the main process parameters on these descriptors, Camembert cheeses were ripened under different temperatures (8, 12, and 16°C) and relative humidity (RH; 88, 92, and 98%). The sensory descriptors were highly dependent on the temperature and RH used throughout ripening in a ripening chamber. All sensory descriptor changes could be explained by microorganism growth, pH, carbon substrate metabolism, and cheese moisture, as well as by microbial enzymatic activities. On d 40, at 8°C and 88% RH, all sensory descriptors scored the worst: the cheese was too dry, its odor and its color were similar to those of the unripe cheese, the underrind was driest, and the core was hardest. At 16°C and 98% RH, the odor was strongly ammonia and the color was dark brown, and the creamy underrind represented the entire thickness of the cheese but was completely runny, descriptors indicative of an over ripened cheese. Statistical analysis showed that the best ripening conditions to achieve an optimum balance between cheese sensory qualities and marketability were 13±1°C and 94±1% RH. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  16. Correlating subjective and objective descriptors of ultra high molecular weight wear particles from total joint prostheses.

    PubMed

    McMullin, Brian T; Leung, Ming-Ying; Shanbhag, Arun S; McNulty, Donald; Mabrey, Jay D; Agrawal, C Mauli

    2006-02-01

    A total of 750 images of individual ultra-high molecular weight polyethylene (UHMWPE) particles isolated from periprosthetic failed hip, knee, and shoulder arthroplasties were extracted from archival scanning electron micrographs. Particle size and morphology was subsequently analyzed using computerized image analysis software utilizing five descriptors found in ASTM F1877-98, a standard for quantitative description of wear debris. An online survey application was developed to display particle images, and allowed ten respondents to classify particle morphologies according to commonly used terminology as fibers, flakes, or granules. Particles were categorized based on a simple majority of responses. All descriptors were evaluated using a one-way ANOVA and Tukey-Kramer test for all-pairs comparison among each class of particles. A logistic regression model using half of the particles included in the survey was then used to develop a mathematical scheme to predict whether a given particle should be classified as a fiber, flake, or granule based on its quantitative measurements. The validity of the model was then assessed using the other half of the survey particles and compared with human responses. Comparison of the quantitative measurements of isolated particles showed that the morphologies of each particle type classified by respondents were statistically different from one another (p<0.05). The average agreement between mathematical prediction and human respondents was 83.5% (standard error 0.16%). These data suggest that computerized descriptors can be feasibly correlated with subjective terminology, thus providing a basis for a common vocabulary for particle description which can be translated into quantitative dimensions.

  17. Correlating subjective and objective descriptors of ultra high molecular weight wear particles from total joint prostheses

    PubMed Central

    McMullin, Brian T.; Leung, Ming-Ying; Shanbhag, Arun S.; McNulty, Donald; Mabrey, Jay D.; Agrawal, C. Mauli

    2014-01-01

    A total of 750 images of individual ultra-high molecular weight polyethylene (UHMWPE) particles isolated from periprosthetic failed hip, knee, and shoulder arthroplasties were extracted from archival scanning electron micrographs. Particle size and morphology was subsequently analyzed using computerized image analysis software utilizing five descriptors found in ASTM F1877-98, a standard for quantitative description of wear debris. An online survey application was developed to display particle images, and allowed ten respondents to classify particle morphologies according to commonly used terminology as fibers, flakes, or granules. Particles were categorized based on a simple majority of responses. All descriptors were evaluated using a one-way ANOVA and Tukey–Kramer test for all-pairs comparison among each class of particles. A logistic regression model using half of the particles included in the survey was then used to develop a mathematical scheme to predict whether a given particle should be classified as a fiber, flake, or granule based on its quantitative measurements. The validity of the model was then assessed using the other half of the survey particles and compared with human responses. Comparison of the quantitative measurements of isolated particles showed that the morphologies of each particle type classified by respondents were statistically different from one another (p<0.05). The average agreement between mathematical prediction and human respondents was 83.5% (standard error 0.16%). These data suggest that computerized descriptors can be feasibly correlated with subjective terminology, thus providing a basis for a common vocabulary for particle description which can be translated into quantitative dimensions. PMID:16112725

  18. An orientation sensitive approach in biomolecule interaction quantitative structure-activity relationship modeling and its application in ion-exchange chromatography.

    PubMed

    Kittelmann, Jörg; Lang, Katharina M H; Ottens, Marcel; Hubbuch, Jürgen

    2017-01-27

    Quantitative structure-activity relationship (QSAR) modeling for prediction of biomolecule parameters has become an established technique in chromatographic purification process design. Unfortunately, available descriptor sets fail to describe the orientation of biomolecules and the effects of ionic strength in the mobile phase on the interaction with the stationary phase. The literature describes several special descriptors used for chromatographic retention modeling, but none of these describes the screening of electrostatic potential by the mobile phase in use. In this work we introduce two new approaches to descriptor calculation, namely surface patches and plane projection, which capture an oriented binding to charged surfaces and steric hindrance of the interaction with chromatographic ligands with regard to electrostatic potential screening by mobile phase ions. We present the use of the developed descriptor sets for predictive modeling of Langmuir isotherms for proteins at different pH values between pH 5 and 10 and varying ionic strength in the range of 10-100 mM. The resulting model has a high correlation of calculated descriptors and experimental results, with a coefficient of determination of 0.82 and a predictive coefficient of determination of 0.92 for unknown molecular structures and conditions. The agreement of calculated molecular interaction orientations with both experimental results and molecular dynamics simulations from the literature is shown. The developed descriptors provide the means for improved QSAR models of chromatographic processes, as they reflect the complex interactions of biomolecules with chromatographic phases. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Descriptors of Oxygen-Evolution Activity for Oxides: A Statistical Evaluation

    DOE PAGES

    Hong, Wesley T.; Welsch, Roy E.; Shao-Horn, Yang

    2015-12-16

    Catalysts for oxygen electrochemical processes are critical for the commercial viability of renewable energy storage and conversion devices such as fuel cells, artificial photosynthesis, and metal-air batteries. Transition metal oxides are an excellent system for developing scalable, non-noble-metal-based catalysts, especially for the oxygen evolution reaction (OER). Central to the rational design of novel catalysts is the development of quantitative structure-activity relationships, which correlate the desired catalytic behavior to structural and/or elemental descriptors of materials. The ultimate goal is to use these relationships to guide materials design. In this study, 101 intrinsic OER activities of 51 perovskites were compiled from five studies in literature and additional measurements made for this work. We explored the behavior and performance of 14 descriptors of the metal-oxygen bond strength using a number of statistical approaches, including factor analysis and linear regression models. We found that these descriptors can be classified into five descriptor families and identify electron occupancy and metal-oxygen covalency as the dominant influences on the OER activity. However, multiple descriptors still need to be considered in order to develop strong predictive relationships, largely outperforming the use of only one or two descriptors (as conventionally done in the field). Here, we confirmed that the number of d electrons, charge-transfer energy (covalency), and optimality of e_g occupancy play the important roles, but found that structural factors such as M-O-M bond angle and tolerance factor are relevant as well. With these tools, we demonstrate how statistical learning can be used to draw novel physical insights and combined with data mining to rapidly screen OER electrocatalysts across a wide chemical space.

  20. Contemporary group estimates adjusted for climatic effects provide a finer definition of the unknown environmental challenges experienced by growing pigs.

    PubMed

    Guy, S Z Y; Li, L; Thomson, P C; Hermesch, S

    2017-12-01

    Environmental descriptors derived from mean performances of contemporary groups (CGs) are assumed to capture any known and unknown environmental challenges. The objective of this paper was to obtain a finer definition of the unknown challenges, by adjusting CG estimates for the known climatic effects of monthly maximum air temperature (MaxT), minimum air temperature (MinT) and monthly rainfall (Rain). As the unknown component could include infection challenges, these refined descriptors may help to better model varying responses of sire progeny to environmental infection challenges for the definition of disease resilience. Data were recorded from 1999 to 2013 at a piggery in south-east Queensland, Australia (n = 31,230). Firstly, CG estimates of average daily gain (ADG) and backfat (BF) were adjusted for MaxT, MinT and Rain, which were fitted as splines. In the models used to derive CG estimates for ADG, MaxT and MinT were significant variables. The models that contained these significant climatic variables had CG estimates with a lower variance compared to models without significant climatic variables. Variance component estimates were similar across all models, suggesting that these significant climatic variables accounted for some known environmental variation captured in CG estimates. No climatic variables were significant in the models used to derive the CG estimates for BF. These CG estimates were used to categorize environments. There was no observable sire by environment interaction (Sire×E) for ADG when using the environmental descriptors based on CG estimates on BF. For the environmental descriptors based on CG estimates of ADG, there was significant Sire×E only when MinT was included in the model (p = .01). Therefore, this new definition of the environment, preadjusted by MinT, increased the ability to detect Sire×E. While the unknown challenges captured in refined CG estimates need verification for infection challenges, this may provide a practical approach for the genetic improvement of disease resilience. © 2017 Blackwell Verlag GmbH.

  1. Development of QSAR models for predicting the binding affinity of endocrine disrupting chemicals to eight fish estrogen receptor.

    PubMed

    He, Junyi; Peng, Tao; Yang, Xianhai; Liu, Huihui

    2018-02-01

    Endocrine disruption has become a central point of concern, and various biological mechanisms are involved in the disruption of the endocrine system. Recently, we have explored the mechanism of disrupting hormonal transport protein, through the binding affinity of sex hormone-binding globulin in different fish species. This study, serving as a companion article, focused on the mechanism of activating/inhibiting hormone receptor, by investigating the binding interaction of chemicals with the estrogen receptor (ER) of different fish species. We collected the relative binding affinity (RBA) of chemicals with 17β-estradiol binding to the ER of eight fish species. With this parameter as the endpoint, quantitative structure-activity relationship (QSAR) models were established using DRAGON descriptors. Statistical results indicated that the developed models had satisfactory goodness of fit, robustness and predictive ability. The Euclidean distance and Williams plot verified that these models had wide application domains, which covered a large number of structurally diverse chemicals. Based on the screened descriptors, we proposed an appropriate mechanism interpretation for the binding potency. Additionally, even though the same chemical had different affinities for ER from different fish species, the affinity of ER exhibited a high correlation for fish species within the same Order (i.e., Salmoniformes, Cypriniformes, Perciformes), which is consistent with our previous study. Hence, when performing the endocrine disrupting effect assessment, the species diversity should be taken into account, but the fish species in the same Order can perhaps be grouped together. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Interplay of heritage and habitat in the distribution of bacterial signal transduction systems.

    PubMed

    Galperin, Michael Y; Higdon, Roger; Kolker, Eugene

    2010-04-01

    Comparative analysis of the complete genome sequences from a variety of poorly studied organisms aims at predicting ecological and behavioral properties of these organisms and helping in characterizing their habitats. This task requires finding appropriate descriptors that could be correlated with the core traits of each system and would allow meaningful comparisons. Using relatively simple bacterial models, first attempts have been made to introduce suitable metrics to describe the complexity of an organism's signaling machinery, which included introducing the "bacterial IQ" score. Here, we use an updated census of prokaryotic signal transduction systems to improve this parameter and evaluate its consistency within selected bacterial phyla. We also introduce a more elaborate descriptor, a set of profiles of relative abundance of members of each family of signal transduction proteins encoded in each genome. We show that these family profiles are well conserved within each genus and are often consistent within families of bacteria. Thus, they reflect evolutionary relationships between organisms as well as individual adaptations of each organism to its specific ecological niche.

  3. Feature extraction and descriptor calculation methods for automatic georeferencing of Philippines' first microsatellite imagery

    NASA Astrophysics Data System (ADS)

    Tupas, M. E. A.; Dasallas, J. A.; Jiao, B. J. D.; Magallon, B. J. P.; Sempio, J. N. H.; Ramos, M. K. F.; Aranas, R. K. D.; Tamondong, A. M.

    2017-10-01

    The FAST-SIFT corner detector and descriptor extractor combination was used to automatically georeference DIWATA-1 Spaceborne Multispectral Imager images. The Features from Accelerated Segment Test (FAST) algorithm detects corners or keypoints in an image, and these robustly detected keypoints have well-defined positions. Descriptors were computed using the Scale-Invariant Feature Transform (SIFT) extractor. FAST-SIFT method effectively SMI same-subscene images detected by the NIR sensor. The method was also tested in stitching NIR images with varying subscene swept by the camera. The slave images were matched to the master image. The keypoints served as the ground control points. Random sample consensus (RANSAC) was used to eliminate fall-out matches and ensure accuracy of the feature points from which the transformation parameters were derived. Keypoints are matched based on their descriptor vector. Nearest-neighbor matching is employed based on a metric distance between the descriptors. The metrics include Euclidean and city block, among others. Rough matching outputs not only the correct matches but also the faulty matches. A previous work in automatic georeferencing incorporates a geometric restriction. In this work, we applied a simplified version of the learning method. RANSAC was used to eliminate fall-out matches and ensure accuracy of the feature points. This method identifies if a point fits the transformation function and returns inlier matches. The transformation matrix was solved by Affine, Projective, and Polynomial models. The accuracy of the automatic georeferencing method was determined by calculating the RMSE of interest points, selected randomly, between the master image and transformed slave image.
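
    The matching and evaluation steps described above can be sketched without any imaging library. This is an illustrative sketch only: it assumes descriptors are plain lists of numbers, uses brute-force nearest-neighbor search, and leaves out keypoint detection and RANSAC filtering entirely. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def city_block(a, b):
    """City block (L1) distance between two descriptor vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_nearest(slave_desc, master_desc, metric=euclidean):
    """For each slave descriptor, return (slave index, index of the closest
    master descriptor). The result is a rough match list that may still
    contain faulty matches."""
    matches = []
    for i, d in enumerate(slave_desc):
        j = min(range(len(master_desc)), key=lambda k: metric(d, master_desc[k]))
        matches.append((i, j))
    return matches


def rmse(points_a, points_b):
    """Root-mean-square distance between corresponding 2D points."""
    sq = [(xa - xb) ** 2 + (ya - yb) ** 2
          for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return math.sqrt(sum(sq) / len(sq))
```

    In a full pipeline the rough matches would be passed to an outlier-rejection step such as RANSAC before a transformation is fitted, and `rmse` would then be evaluated on check points in the master image and the transformed slave image.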

  4. Implicit Shape Models for Object Detection in 3d Point Clouds

    NASA Astrophysics Data System (ADS)

    Velizhev, A.; Shapovalov, R.; Schindler, K.

    2012-07-01

    We present a method for automatic object localization and recognition in 3D point clouds representing outdoor urban scenes. The method is based on the implicit shape models (ISM) framework, which recognizes objects by voting for their center locations. It requires only a few training examples per class, which is an important property for practical use. We also introduce and evaluate an improved version of the spin image descriptor, more robust to point density variation and uncertainty in normal direction estimation. Our experiments reveal a significant impact of these modifications on the recognition performance. We compare our results against the state-of-the-art method and get significant improvement in both precision and recall on the Ohio dataset, consisting of combined aerial and terrestrial LiDAR scans of 150,000 m² of urban area in total.
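
    The idea of voting for object centers can be illustrated with a small sketch. It assumes a hypothetical codebook that maps each visual word to a list of (offset to center, weight) pairs learned in training; the grid-based accumulation and all names are illustrative choices, not details taken from the paper.

```python
from collections import Counter

def vote_for_centers(features, codebook, cell=1.0):
    """Accumulate center votes in a coarse 3D grid.

    features: list of ((x, y, z), word_id) for keypoints found in the scene.
    codebook: dict word_id -> list of ((ox, oy, oz), weight), hypothetical.
    cell:     grid cell size used to group nearby votes.
    """
    votes = Counter()
    for (x, y, z), word in features:
        for (ox, oy, oz), weight in codebook.get(word, []):
            key = (round((x + ox) / cell),
                   round((y + oy) / cell),
                   round((z + oz) / cell))
            votes[key] += weight
    return votes

def best_center(votes, cell=1.0):
    """Return the grid cell with the highest vote mass and its score.

    Expects at least one vote; an empty Counter would make max() fail.
    """
    (ix, iy, iz), score = max(votes.items(), key=lambda kv: kv[1])
    return (ix * cell, iy * cell, iz * cell), score
```

    A real system would typically search for local maxima in a continuous vote space rather than take a single grid maximum.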

  5. Global QSAR modeling of logP values of phenethylamines acting as adrenergic alpha-1 receptor agonists.

    PubMed

    Yadav, Mukesh; Joshi, Shobha; Nayarisseri, Anuraj; Jain, Anuja; Hussain, Aabid; Dubey, Tushar

    2013-06-01

    Global QSAR models predict the biological response of molecular structures that are generic within a particular class. A global QSAR dataset admits structural features derived from a larger chemical space; it is more intricate to model but more applicable in medicinal chemistry. The present work is global in both senses: the structural diversity of the QSAR dataset and the large number of input descriptors. Forty phenethylamine derivatives were selected from a large pool (904) of similar phenethylamines available in the PubChem database. LogP values of the selected candidates, determined under an identical set of conditions, were collected from the physical properties database (PHYSPROP). Attempts to model the logP value produced significant QSAR models. The MLR-aided linear one-variable and two-variable QSAR models, with respective R² (0.866, 0.937), adjusted R² (0.862, 0.932), F-statistic (181.936, 199.812) and standard error (0.365, 0.255), are statistically fit and were found predictive after internal and external validation. The descriptors chosen after improvisation and optimization reveal the mechanistic part of the work: the Verhaar model of fish base-line toxicity from MLOGP (BLTF96) and the 3D-MoRSE signal 15/unweighted descriptor (Mor15u), a molecular descriptor calculated by summing atom weights viewed by a different angular scattering function, are crucial in the regulation of logP values of phenethylamines.
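
    The fit statistics quoted in this abstract are linked by standard formulas for multiple linear regression, sketched below. The sample count passed in is an assumption for illustration only, since the abstract does not state the training-set size.

```python
def adjusted_r2(r2, n_samples, n_descriptors):
    """Adjusted R^2 for a linear model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_descriptors - 1)

def f_statistic(r2, n_samples, n_descriptors):
    """Overall F-statistic of the regression, computed from R^2."""
    return (r2 / (1.0 - r2)) * (n_samples - n_descriptors - 1) / n_descriptors

def standard_error(residuals, n_descriptors):
    """Standard error of the estimate from the model residuals."""
    dof = len(residuals) - n_descriptors - 1
    return (sum(e * e for e in residuals) / dof) ** 0.5

# Hypothetical usage: n_samples=30 is an assumed value, not taken from the paper.
example_adj = adjusted_r2(0.866, n_samples=30, n_descriptors=1)
example_f = f_statistic(0.866, n_samples=30, n_descriptors=1)
```

    Adding a descriptor always raises R², so adjusted R² and the F-statistic are the more useful numbers when comparing the one-variable and two-variable models.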

  6. Quantitative structure-property relationship analysis for the retention index of fragrance-like compounds on a polar stationary phase.

    PubMed

    Rojas, Cristian; Duchowicz, Pablo R; Tripaldi, Piercosimo; Pis Diez, Reinaldo

    2015-11-27

    A quantitative structure-property relationship (QSPR) was developed for modeling the retention index of 1184 flavor and fragrance compounds measured using a Carbowax 20M glass capillary gas chromatography column. The 4885 molecular descriptors were calculated using Dragon software, and then were simultaneously analyzed through multivariable linear regression analysis using the replacement method (RM) variable subset selection technique. We proceeded in three steps, the first one by considering all descriptor blocks, the second one by excluding conformational descriptor blocks, and the last one by analyzing only 3D-descriptor families. The models were validated through an external test set of compounds. Cross-validation methods such as leave-one-out and leave-many-out were applied, together with Y-randomization and applicability domain analysis. The developed model was used to estimate the I of a set of 22 molecules. The results clearly suggest that 3D-descriptors do not offer relevant information for modeling the retention index, while a topological index such as the Randić-like index from reciprocal squared distance matrix has a high relevance for this purpose. Copyright © 2015 Elsevier B.V. All rights reserved.
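
    Leave-one-out cross-validation, one of the validation methods named above, can be sketched for a one-descriptor linear model. This is a generic illustration with hypothetical function names; the study itself uses multivariable regression with descriptor subset selection.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def q2_leave_one_out(xs, ys):
    """Cross-validated Q^2 = 1 - PRESS / SS_total.

    Each compound is predicted by a model fitted to all the other compounds.
    """
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

    Leave-many-out follows the same pattern with several compounds held out per round, and Y-randomization repeats the procedure after shuffling `ys` to check that the original Q² is not a chance result.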

  7. Health sciences descriptors in the brazilian speech-language and hearing science.

    PubMed

    Campanatti-Ostiz, Heliane; Andrade, Claudia Regina Furquim de

    2010-01-01

    Terminology in Speech-Language and Hearing Science. To propose a specific thesaurus about the Speech-Language and Hearing Science, for the English, Portuguese and Spanish languages, based on the existing keywords available on the Health Sciences Descriptors (DeCS). Methodology was based on the pilot study developed by Campanatti-Ostiz and Andrade; that had as a purpose to verify the methodological viability for the creation of a Speech-Language and Hearing Science category in the DeCS. The scientific journals selected for analyses of the titles, abstracts and keywords of all scientific articles were those in the field of the Speech-Language and Hearing Science, indexed on the SciELO. 1. Recovery of the Descriptors in the English language (Medical Subject Headings--MeSH); 2. Recovery and hierarchic organization of the descriptors in the Portuguese language was done (DeCS). The obtained data was analyzed as follows: descriptive analyses and relative relevance analyses of the DeCS areas. Based on the first analyses, we decided to select all 761 descriptors, with all the hierarchic numbers, independently of their occurrence (occurrence number--ON), and based on the second analyses, we decided to propose to exclude the less relevant areas and the exclusive DeCS areas. The proposal was finished with a total of 1676 occurrences of DeCS descriptors, distributed in the following areas: Anatomy; Diseases; Analytical, Diagnostic and Therapeutic Techniques and Equipments; Psychiatry and Psychology; Phenomena and Processes; Health Care. The presented proposal of a thesaurus contains the specific terminology of the Brazilian Speech-Language and Hearing Sciences and reflects the descriptors of the published scientific production. Being the DeCS a trilingual vocabulary (Portuguese, English and Spanish), the present descriptors organization proposition can be used in these three languages, allowing greater cultural interchange between different nations.

  8. Gun bore flaw image matching based on improved SIFT descriptor

    NASA Astrophysics Data System (ADS)

    Zeng, Luan; Xiong, Wei; Zhai, You

    2013-01-01

    In order to increase the operation speed and matching ability of the SIFT algorithm, the SIFT descriptor and matching strategy are improved. First, a method of constructing a feature descriptor based on sector areas is proposed. By computing the gradient histograms of location bins that are divided into 6 sector areas, a descriptor with 48 dimensions is constituted. This reduces the dimension of the feature vector and decreases the complexity of constructing the descriptor. Second, a strategy is introduced that partitions the circular region into 6 identical sector areas starting from the dominant orientation. Consequently, the computational complexity is reduced because the rotation operation for the area is no longer needed. The experimental results indicate that, compared with the OpenCV SIFT algorithm, the average matching speed of the new method increases by about 55.86%. The matching accuracy can also be increased, even under some variation of viewpoint, illumination, rotation, scale and defocus. The new method achieved satisfactory results in gun bore flaw image matching. Keywords: Metrology, Flaw image matching, Gun bore, Feature descriptor
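
    A sector-based descriptor of this kind might be assembled as in the sketch below. The 6 sectors and 48 dimensions come from the abstract; the 8 orientation bins per sector, the input format and the function name are assumptions made for illustration.

```python
import math

N_SECTORS = 6   # from the abstract
N_BINS = 8      # assumption: 48 dimensions / 6 sectors

def sector_descriptor(samples, dominant):
    """Build a 48-dimensional descriptor from gradient samples.

    samples:  list of (dx, dy, magnitude, gradient_angle) around the keypoint,
              with dx, dy measured from the keypoint and angles in radians.
    dominant: dominant orientation of the keypoint in radians.

    Sectors are counted starting from the dominant orientation, so the patch
    itself never has to be rotated.
    """
    two_pi = 2.0 * math.pi
    desc = [0.0] * (N_SECTORS * N_BINS)
    for dx, dy, magnitude, angle in samples:
        position = (math.atan2(dy, dx) - dominant) % two_pi
        sector = min(int(position / (two_pi / N_SECTORS)), N_SECTORS - 1)
        relative = (angle - dominant) % two_pi
        bin_index = min(int(relative / (two_pi / N_BINS)), N_BINS - 1)
        desc[sector * N_BINS + bin_index] += magnitude
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

    Because both the sector index and the orientation bin are measured relative to `dominant`, rotating the image and the keypoint orientation together leaves the descriptor unchanged.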

  9. Application of the artificial neural network in quantitative structure-gradient elution retention relationship of phenylthiocarbamyl amino acids derivatives.

    PubMed

    Tham, S Y; Agatonovic-Kustrin, S

    2002-05-15

    A quantitative structure-retention relationship (QSRR) method was used to model reversed-phase high-performance liquid chromatography (RP-HPLC) separation of 18 selected amino acids. Retention data for phenylthiocarbamyl (PTC) amino acid derivatives were obtained using gradient elution on an ODS column with a mobile phase of varying acetonitrile and acetate buffer content, containing 0.5 ml/l of triethylamine (TEA). The molecular structure of each amino acid was encoded with 36 calculated molecular descriptors. The correlation between the molecular descriptors and the retention time of the compounds in the calibration set was established using the genetic neural network method. A genetic algorithm (GA) was used to select important molecular descriptors and a supervised artificial neural network (ANN) was used to correlate mobile phase composition and selected descriptors with the experimentally derived retention times. Retention time values were used as the network's output and calculated molecular descriptors and mobile phase composition as the inputs. The best model with five input descriptors was chosen, and the significance of the selected descriptors for amino acid separation was examined. Results confirmed the dominant role of the organic modifier in such chromatographic systems in addition to lipophilicity (log P) and molecular size and shape (topological indices) of investigated solutes.

  10. Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

    PubMed

    Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

    2016-08-01

    This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Characterization of Mixtures. Part 2: QSPR Models for Prediction of Excess Molar Volume and Liquid Density Using Neural Networks.

    PubMed

    Ajmani, Subhash; Rogers, Stephen C; Barley, Mark H; Burgess, Andrew N; Livingstone, David J

    2010-09-17

    In our earlier work, we demonstrated that it is possible to characterize binary mixtures using single-component descriptors by applying various mixing rules. We also showed that these methods were successful in building predictive QSPR models to study various mixture properties of interest. Herein, we develop a QSPR model of an excess thermodynamic property of binary mixtures, i.e. excess molar volume (V^E). In the present study, we use a set of mixture descriptors which we earlier designed to specifically account for intermolecular interactions between the components of a mixture and applied successfully to the prediction of infinite-dilution activity coefficients using neural networks (part 1 of this series). We obtain a significant QSPR model for the prediction of excess molar volume (V^E) using consensus neural networks and five mixture descriptors. We find that hydrogen bond and thermodynamic descriptors are the most important in determining excess molar volume (V^E), which is in line with the theory of intermolecular forces governing excess mixture properties. The results also suggest that the mixture descriptors utilized herein may be sufficient to model a wide variety of properties of binary and possibly even more complex mixtures. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine.

    PubMed

    Jain, Dharm Skandh; Gupte, Sanket Rajan; Aduri, Raviprasad

    2018-06-22

    RNA protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of the predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on the high-resolution structural data of RNA protein complexes. The minimum structural unit consisting of five residues is used as the descriptor. Comparative analysis of existing methods shows the consistently higher performance of our method irrespective of the length of RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA as well as TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in RPI networks of four different organisms. The robustness of this method will provide a way for predicting RPI networks of yet unknown interactions for both long noncoding RNA and microRNA.

  13. Developing an instrument to assess the endoscopic severity of ulcerative colitis: the Ulcerative Colitis Endoscopic Index of Severity (UCEIS).

    PubMed

    Travis, Simon P L; Schnell, Dan; Krzeski, Piotr; Abreu, Maria T; Altman, Douglas G; Colombel, Jean-Frédéric; Feagan, Brian G; Hanauer, Stephen B; Lémann, Marc; Lichtenstein, Gary R; Marteau, Phillippe R; Reinisch, Walter; Sands, Bruce E; Yacyshyn, Bruce R; Bernhardt, Christian A; Mary, Jean-Yves; Sandborn, William J

    2012-04-01

    Variability in endoscopic assessment necessitates rigorous investigation of descriptors for scoring severity of ulcerative colitis (UC). To evaluate variation in the overall endoscopic assessment of severity, the intra- and interindividual variation of descriptive terms and to create an Ulcerative Colitis Endoscopic Index of Severity which could be validated. A two-phase study used a library of 670 video sigmoidoscopies from patients with Mayo Clinic scores 0-11, supplemented by 10 videos from five people without UC and five hospitalised patients with acute severe UC. In phase 1, each of 10 investigators viewed 16/24 videos to assess agreement on the Baron score with a central reader and agreed definitions of 10 endoscopic descriptors. In phase 2, each of 30 different investigators rated 25/60 different videos for the descriptors and assessed overall severity on a 0-100 visual analogue scale. κ statistics tested inter- and intraobserver variability for each descriptor. A general linear mixed regression model based on logit link and β distribution of variance was used to predict overall endoscopic severity from descriptors. There was 76% agreement for 'severe', but 27% agreement for 'normal' appearances between phase 1 investigators and the central reader. In phase 2, weighted κ values ranged from 0.34 to 0.65 and 0.30 to 0.45 within and between observers for the 10 descriptors. The final model incorporated vascular pattern (normal/patchy/complete obliteration), bleeding (none/mucosal/luminal mild/luminal moderate or severe), erosions and ulcers (none/erosions/superficial/deep), each with precise definitions, which explained 90% of the variance (pR², Akaike Information Criterion) in the overall assessment of endoscopic severity, predictions varying from 4 to 93 on a 100-point scale (from normal to worst endoscopic severity). The Ulcerative Colitis Endoscopic Index of Severity accurately predicts overall assessment of endoscopic severity of UC. Validity and responsiveness need further testing before it can be applied as an outcome measure in clinical trials or clinical practice.
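
    The weighted κ values reported here measure agreement on ordinal descriptor levels. A minimal sketch of a weighted Cohen's κ for two raters follows; the linear disagreement weights are an assumption, since the abstract does not say which weighting scheme was used.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linearly weighted Cohen's kappa for two raters.

    ratings_a, ratings_b: equal-length lists of integer levels 0..n_categories-1.
    Disagreement weight between levels i and j is |i - j| / (n_categories - 1).
    """
    n = len(ratings_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected_disagreement == 0:
        return 1.0  # both raters used a single identical level throughout
    return 1.0 - observed_disagreement / expected_disagreement
```

    With weights, a one-level disagreement on a descriptor such as bleeding is penalised less than a disagreement between the two extremes of the scale.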

  14. Bio-AIMS Collection of Chemoinformatics Web Tools based on Molecular Graph Information and Artificial Intelligence Models.

    PubMed

    Munteanu, Cristian R; Gonzalez-Diaz, Humberto; Garcia, Rafael; Loza, Mabel; Pazos, Alejandro

    2015-01-01

    The molecular information encoding into molecular descriptors is the first step into in silico Chemoinformatics methods in Drug Design. The Machine Learning methods are a complex solution to find prediction models for specific biological properties of molecules. These models connect the molecular structure information such as atom connectivity (molecular graphs) or physical-chemical properties of an atom/group of atoms to the molecular activity (Quantitative Structure - Activity Relationship, QSAR). Due to the complexity of the proteins, the prediction of their activity is a complicated task and the interpretation of the models is more difficult. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug - protein target interactions and the other three calculate protein - protein interactions. The input information is based on the protein 3D structure for nine models, 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based Machine Learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments.

  15. GTM-Based QSAR Models and Their Applicability Domains.

    PubMed

    Gaspar, H A; Baskin, I I; Marcou, G; Horvath, D; Varnek, A

    2015-06-01

    In this paper we demonstrate that Generative Topographic Mapping (GTM), a machine learning method traditionally used for data visualisation, can be efficiently applied to QSAR modelling using probability distribution functions (PDF) computed in the latent 2-dimensional space. Several different scenarios of the activity assessment were considered: (i) the "activity landscape" approach based on direct use of PDF, (ii) QSAR models involving GTM-generated descriptors derived from PDF, and (iii) the k-Nearest Neighbours approach in 2D latent space. Benchmarking calculations were performed on five different datasets: stability constants of metal cations Ca²⁺, Gd³⁺ and Lu³⁺ complexes with organic ligands in water, aqueous solubility and activity of thrombin inhibitors. It has been shown that the performance of GTM-based regression models is similar to that obtained with some popular machine-learning methods (random forest, k-NN, M5P regression tree and PLS) and ISIDA fragment descriptors. By comparing GTM activity landscapes built both on predicted and experimental activities, we may visually assess the model's performance and identify the areas in the chemical space corresponding to reliable predictions. The applicability domain used in this work is based on data likelihood. Its application has significantly improved the model performances for 4 out of 5 datasets. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Describing contrast across scales

    NASA Astrophysics Data System (ADS)

    Syed, Sohaib Ali; Iqbal, Muhammad Zafar; Riaz, Muhammad Mohsin

    2017-06-01

    Due to its sensitive nature against illumination and noise distributions, contrast is not widely used for image description. On the contrary, the human perception of contrast along different spatial frequency bandwidths provides a powerful discriminator function that can be modeled in a robust manner against local illumination. Based upon this observation, a dense local contrast descriptor is proposed and its potential in different applications of computer vision is discussed. Extensive experiments reveal that this simple yet effective description performs well in comparison with state of the art image descriptors. We also show the importance of this description in multiresolution pansharpening framework.

  17. Harmony Search as a Powerful Tool for Feature Selection in QSPR Study of the Drugs Lipophilicity.

    PubMed

    Bahadori, Behnoosh; Atabati, Morteza

    2017-01-01

    Aims & Scope: Lipophilicity represents one of the most studied and most frequently used fundamental physicochemical properties. In the present work, the harmony search (HS) algorithm is suggested for feature selection in quantitative structure-property relationship (QSPR) modeling to predict the lipophilicity of neutral, acidic, basic and amphoteric drugs, determined by UHPLC. Harmony search is a music-based metaheuristic optimization algorithm. It was inspired by the observation that the aim of music is to search for a perfect state of harmony. Semi-empirical quantum-chemical calculations at the AM1 level were used to find the optimum 3D geometry of the studied molecules, and various descriptors (1497 descriptors) were calculated by the Dragon software. The descriptors selected by the harmony search algorithm (9 descriptors) were applied for model development using multiple linear regression (MLR). In comparison with other feature selection methods such as genetic algorithm and simulated annealing, the harmony search algorithm gives better results. The root mean square errors (RMSE) with and without leave-one-out cross-validation (LOOCV) were 0.417 and 0.302, respectively. The results were compared with those obtained from the genetic algorithm and simulated annealing methods, and the comparison showed that HS is a helpful tool for feature selection with fine performance. Copyright© Bentham Science Publishers.
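
    Harmony search for descriptor selection can be sketched as a search over bit masks, where bit j says whether descriptor j is used. The sketch below is a generic binary variant with assumed parameter values; `objective` is a hypothetical scoring function (higher is better) that in a QSPR setting would fit an MLR model on the selected descriptors and return, for example, the negative cross-validated RMSE.

```python
import random

def harmony_search(objective, n_features, memory_size=10, iterations=200,
                   hmcr=0.9, par=0.3, seed=0):
    """Binary harmony search that maximises objective(mask).

    hmcr: harmony memory considering rate (chance to reuse a stored bit).
    par:  pitch adjusting rate (chance to flip a reused bit).
    """
    rng = random.Random(seed)
    memory = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        score = objective(new)
        worst = min(range(memory_size), key=scores.__getitem__)
        if score > scores[worst]:             # replace the worst harmony
            memory[worst], scores[worst] = new, score
    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]
```

    Since only the worst stored harmony is ever replaced, and only by a better one, the best score in memory never decreases as iterations proceed.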

  18. Characterizing region of interest in image using MPEG-7 visual descriptors

    NASA Astrophysics Data System (ADS)

    Ryu, Min-Sung; Park, Soo-Jun; Won, Chee Sun

    2005-08-01

    In this paper, we propose a region-based image retrieval system using EHD (Edge Histogram Descriptor) and CLD (Color Layout Descriptor) of MPEG-7 descriptors. The combined descriptor can efficiently describe edge and color features in terms of sub-image regions. That is, the basic unit for the selection of the region-of-interest (ROI) in the image is the sub-image block of the EHD, which corresponds to 16 (i.e., 4x4) non-overlapping image blocks in the image space. This implies that, to have a one-to-one region correspondence between EHD and CLD, we need to take an 8x8 inverse DCT (IDCT) for the CLD. Experimental results show that the proposed retrieval scheme can be used for image retrieval with the ROI based image retrieval for MPEG-7 indexed images.
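
    Restricting the edge histogram comparison to a region of interest can be sketched as follows. The MPEG-7 EHD stores five edge-type bins for each of the 16 sub-images, giving 80 local bins; the raster ordering of blocks, the L1 distance and the function names are assumptions made for this illustration.

```python
GRID = 4             # the EHD divides the image into 4x4 sub-images
BINS_PER_BLOCK = 5   # five edge types per sub-image

def roi_blocks(row0, col0, row1, col1):
    """Indices of the sub-image blocks inside an inclusive block rectangle."""
    return [r * GRID + c
            for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]

def roi_ehd_distance(ehd_a, ehd_b, blocks):
    """L1 distance between two 80-bin EHDs, counted over the ROI blocks only."""
    total = 0
    for b in blocks:
        start = b * BINS_PER_BLOCK
        stop = start + BINS_PER_BLOCK
        total += sum(abs(x - y) for x, y in zip(ehd_a[start:stop], ehd_b[start:stop]))
    return total
```

    The same block indices could be used to pick the matching regions of a color descriptor once it has been brought to the same 4x4 layout.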

  19. Local Multi-Grouped Binary Descriptor With Ring-Based Pooling Configuration and Optimization.

    PubMed

    Gao, Yongqiang; Huang, Weilin; Qiao, Yu

    2015-12-01

    Local binary descriptors are attracting increasing attention due to their great advantages in computational speed, which allow real-time performance in numerous image/vision applications. Various methods have been proposed to learn data-dependent binary descriptors. However, most existing binary descriptors aim overly at computational simplicity at the expense of significant information loss, which causes ambiguity in the similarity measure using Hamming distance. In this paper, by considering that multiple features might share complementary information, we present a novel local binary descriptor, referred to as the ring-based multi-grouped descriptor (RMGD), to successfully bridge the performance gap between current binary and floating-point descriptors. Our contributions are twofold. First, we introduce a new pooling configuration based on spatial ring-region sampling, allowing binary tests on the full set of pairwise regions with different shapes, scales, and distances. This leads to a more meaningful description than the existing methods, which normally apply a limited set of pooling configurations. Then, an extended Adaboost is proposed for efficient bit selection by emphasizing high variance and low correlation, achieving a highly compact representation. Second, the RMGD is computed from multiple image properties where binary strings are extracted. We cast multi-grouped feature integration as a rankSVM or sparse support vector machine learning problem, so that different features can compensate strongly for each other, which is the key to discriminativeness and robustness. The performance of the RMGD was evaluated on a number of publicly available benchmarks, where the RMGD outperforms the state-of-the-art binary descriptors significantly.
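
    The speed advantage of binary descriptors comes from the Hamming distance, which reduces to an XOR followed by a bit count. A minimal sketch, assuming each descriptor is packed into a Python integer (names are illustrative):

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")

def nearest_by_hamming(query: int, database: list) -> int:
    """Index of the database descriptor closest to the query.

    Ties are resolved in favour of the earliest entry.
    """
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))
```

    Ties such as the one between equally distant entries illustrate the ambiguity mentioned in the abstract: short or poorly chosen bit strings leave many candidates at the same distance.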

  20. Reduced density gradient as a novel approach for estimating QSAR descriptors, and its application to 1, 4-dihydropyridine derivatives with potential antihypertensive effects.

    PubMed

    Jardínez, Christiaan; Vela, Alberto; Cruz-Borbolla, Julián; Alvarez-Mendez, Rodrigo J; Alvarado-Rodríguez, José G

    2016-12-01

    The relationship between the chemical structure and biological activity (log IC50) of 40 derivatives of 1,4-dihydropyridines (DHPs) was studied using density functional theory (DFT) and multiple linear regression analysis methods. With the aim of improving the quantitative structure-activity relationship (QSAR) model, the reduced density gradient s(r) of the optimized equilibrium geometries was used as a descriptor to include weak non-covalent interactions. The QSAR model highlights the correlation between log IC50 and the highest occupied molecular orbital energy (E_HOMO), molecular volume (V), partition coefficient (log P), non-covalent interactions NCI(H4-G) and the dual descriptor [Δf(r)]. The model yielded values of R² = 79.57 and Q² = 69.67, which were validated with four internal analytical validations (DK = 0.076, DQ = -0.006, R^P = 0.056 and R^N = 0.000) and the external validation Q²_boot = 64.26. The QSAR model found can be used to estimate biological activity with high reliability in new compounds based on a DHP series. Graphical abstract: The good correlation between log IC50 and the NCI(H4-G) estimated by the reduced density gradient approach for the DHP derivatives.

  1. Effect of substituents on prediction of TLC retention of tetra-dentate Schiff bases and their Copper(II) and Nickel(II) complexes.

    PubMed

    Stevanović, Nikola R; Perušković, Danica S; Gašić, Uroš M; Antunović, Vesna R; Lolić, Aleksandar Đ; Baošić, Rada M

    2017-03-01

    The objectives of this study were to gain insights into structure-retention relationships and to propose a model for estimating retention. Chromatographic investigation of a series of 36 Schiff bases and their copper(II) and nickel(II) complexes was performed under both normal- and reverse-phase conditions. Chemical structures of the compounds were characterized by molecular descriptors, which were calculated from the structure and related to the chromatographic retention parameters by multiple linear regression analysis. Effects of chelation on retention parameters of the investigated compounds, under normal- and reverse-phase chromatographic conditions, were analyzed by principal component analysis. Quantitative structure-retention relationship and quantitative structure-activity relationship models were developed on the basis of theoretical molecular descriptors, calculated exclusively from molecular structure, and parameters of retention and lipophilicity. Copyright © 2016 John Wiley & Sons, Ltd.

  2. QSAR models for anti-malarial activity of 4-aminoquinolines.

    PubMed

    Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T

    2014-03-01

    In the present study, predictive quantitative structure-activity relationship (QSAR) models for anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on the internet (http://www.insilico.eu/coral), has been used as a tool for QSAR analysis to establish a statistically robust QSAR model of anti-malarial activity of 4-aminoquinolines. Six random splits into the visible sub-system of training and the invisible sub-system of validation were examined. Statistical qualities for these splits vary, but in all these cases, the statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single-descriptor QSAR model for a data set of 112 aminoquinolones. All the splits had r² > 0.85 and r² > 0.78 for subtraining and validation sets, respectively. The three-parameter multilinear regression (MLR) QSAR model has Q² = 0.83, R² = 0.84 and F = 190.39. The anti-malarial activity has a strong correlation with the presence/absence of nitrogen and oxygen at a topological distance of six.

  3. Improving Large-Scale Image Retrieval Through Robust Aggregation of Local Descriptors.

    PubMed

    Husain, Syed Sameed; Bober, Miroslaw

    2017-09-01

    Visual search and image retrieval underpin numerous applications, however the task is still challenging predominantly due to the variability of object appearance and ever increasing size of the databases, often exceeding billions of images. Prior art methods rely on aggregation of local scale-invariant descriptors, such as SIFT, via mechanisms including Bag of Visual Words (BoW), Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vectors (FV). However, their performance is still short of what is required. This paper presents a novel method for deriving a compact and distinctive representation of image content called Robust Visual Descriptor with Whitening (RVD-W). It significantly advances the state of the art and delivers world-class performance. In our approach local descriptors are rank-assigned to multiple clusters. Residual vectors are then computed in each cluster, normalized using a direction-preserving normalization function and aggregated based on the neighborhood rank. Importantly, the residual vectors are de-correlated and whitened in each cluster before aggregation, leading to a balanced energy distribution in each dimension and significantly improved performance. We also propose a new post-PCA normalization approach which improves separability between the matching and non-matching global descriptors. This new normalization benefits not only our RVD-W descriptor but also improves existing approaches based on FV and VLAD aggregation. Furthermore, we show that the aggregation framework developed using hand-crafted SIFT features also performs exceptionally well with Convolutional Neural Network (CNN) based features. The RVD-W pipeline outperforms state-of-the-art global descriptors on both the Holidays and Oxford datasets. 
    On the large-scale datasets, Holidays1M and Oxford1M, the SIFT-based RVD-W representation obtains a mAP of 45.1 and 35.1 percent, while the CNN-based RVD-W achieves a mAP of 63.5 and 44.8 percent, all yielding superior performance to the state-of-the-art.
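
    For orientation, the plain VLAD aggregation that this abstract names as prior art can be sketched as below. This is the baseline only, not RVD-W: it uses hard assignment to the nearest centroid and a single global L2 normalisation, and it omits the rank-based assignment, per-cluster whitening and direction-preserving normalisation described above. The function name and list-based inputs are illustrative.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one global VLAD vector.

    descriptors: list of local descriptors (lists of floats).
    centroids:   list of cluster centres of the same dimension.
    Returns a vector of length len(centroids) * dimension, L2-normalised.
    """
    dim = len(centroids[0])
    accumulators = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        k = min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(d, centroids[c])))
        # Accumulate the residual between descriptor and centroid.
        for t in range(dim):
            accumulators[k][t] += d[t] - centroids[k][t]
    flat = [v for cluster in accumulators for v in cluster]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

    Two images can then be compared with a dot product of their global vectors, which is what makes this family of methods practical for very large databases.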

  4. Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets.

    PubMed

    García-Jacas, César R; Contreras-Torres, Ernesto; Marrero-Ponce, Yovani; Pupo-Meriño, Mario; Barigye, Stephen J; Cabrera-Leyva, Lisset

    2016-01-01

    Recently, novel 3D alignment-free molecular descriptors (also known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, in the present manuscript an assessment and statistical validation of the performance of these novel descriptors in QSAR studies is performed. To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. Three to nine variable QSAR models based on Multiple Linear Regression are built for each chemical dataset according to the original division into training/test sets. Comparisons with respect to leave-one-out cross-validation correlation coefficients[Formula: see text] reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as: Partial Least Square, Neural Networks, Support Vector Machine and others. On the other hand, superior external correlation coefficients[Formula: see text] are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models. 
For the [Formula: see text] values, non-parametric statistical tests were performed, which demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions in 11 of the 12 QSAR procedures used in the comparison. Lastly, a study of the performance of the indices under several conformer generation methods was performed. This demonstrated that the quality of the predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the results achieved do not differ with statistical significance. In conclusion, the QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds. Graphical abstract: Comparative graphical representation of the performance of the novel QuBiLS-MIDAS 3D-MDs with respect to other methodologies in QSAR modeling of eight chemical datasets.
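
    The leave-one-out cross-validation statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition Q2 = 1 - PRESS / SS_tot (with SS_tot taken about the mean of all responses) and an ordinary least-squares fit with an intercept; the abstract does not state which exact formula or software the authors used, and the function names below are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(AtA, Aty)


def loo_q2(X, y):
    """Leave-one-out Q2: refit without sample i, then predict sample i."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], X[i]))
        press += (y[i] - pred) ** 2
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_tot
```

    A Q2 close to 1 means the held-out predictions track the observed values; the normal-equation solver is chosen here only to keep the sketch dependency-free and is not numerically robust for strongly collinear descriptors.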

  5. A contour-based shape descriptor for biomedical image classification and retrieval

    NASA Astrophysics Data System (ADS)

    You, Daekeun; Antani, Sameer; Demner-Fushman, Dina; Thoma, George R.

    2013-12-01

    Contours, object blobs, and specific feature points are utilized to represent object shapes and extract shape descriptors that can then be used for object detection or image classification. In this research we develop a shape descriptor for biomedical image type (or, modality) classification. We adapt a feature extraction method used in optical character recognition (OCR) for character shape representation, and apply various image preprocessing methods to successfully adapt the method to our application. The proposed shape descriptor is applied to radiology images (e.g., MRI, CT, ultrasound, X-ray, etc.) to assess its usefulness for modality classification. In our experiment we compare our method with other visual descriptors such as CEDD, CLD, Tamura, and PHOG that extract color, texture, or shape information from images. The proposed method achieved the highest classification accuracy of 74.1% among all other individual descriptors in the test, and when combined with CSD (color structure descriptor) showed better performance (78.9%) than using the shape descriptor alone.

  6. Quantitative structure-activity relationships of the antimalarial agent artemisinin and some of its derivatives - a DFT approach.

    PubMed

    Rajkhowa, Sanchaita; Hussain, Iftikar; Hazarika, Kalyan K; Sarmah, Pubalee; Deka, Ramesh Chandra

    2013-09-01

    Artemisinins form the most important class of antimalarial agents currently available; artemisinin is a unique sesquiterpene peroxide occurring as a constituent of Artemisia annua. Artemisinin is effectively used in the treatment of drug-resistant Plasmodium falciparum and, because of its rapid clearance of cerebral malaria, many clinically useful semisynthetic drugs for severe and complicated malaria have been developed. However, one of the major disadvantages of using artemisinins is their poor solubility in either oil or water; to overcome this difficulty, many derivatives of artemisinin were prepared. A comparative study on the chemical reactivity of artemisinin and some of its derivatives is performed using density functional theory (DFT) calculations. DFT-based global and local reactivity descriptors, such as hardness, chemical potential, electrophilicity index, Fukui function, and local philicity, calculated at the optimized geometries, are used to investigate the usefulness of these descriptors for understanding the reactive nature and reactive sites of the molecules. Multiple regression analysis is applied to build a quantitative structure-activity relationship (QSAR) model based on the DFT-based descriptors against the chloroquine-resistant, mefloquine-sensitive Plasmodium falciparum W-2 clone.
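
    The global reactivity descriptors named in this abstract can be sketched as follows. The abstract does not say how they were evaluated; this sketch assumes the frequently used frontier-orbital (Koopmans-type) approximation IP = -E_HOMO and EA = -E_LUMO, with chemical potential mu = -(IP + EA)/2, hardness eta = (IP - EA)/2 and electrophilicity index omega = mu^2 / (2 eta). Some authors define hardness without the factor 1/2, so the convention should be checked against the paper. The function name and the example energies are hypothetical.

```python
def global_reactivity(e_homo, e_lumo):
    """Global reactivity descriptors from frontier orbital energies (same units in and out).

    Assumes Koopmans-type estimates: IP = -E_HOMO, EA = -E_LUMO.
    """
    ip = -e_homo                     # ionization potential estimate
    ea = -e_lumo                     # electron affinity estimate
    mu = -(ip + ea) / 2.0            # chemical potential
    eta = (ip - ea) / 2.0            # hardness (factor-1/2 convention)
    omega = mu ** 2 / (2.0 * eta)    # electrophilicity index
    return {"chemical_potential": mu, "hardness": eta, "electrophilicity": omega}


# Hypothetical orbital energies in eV, for illustration only.
descriptors = global_reactivity(e_homo=-6.0, e_lumo=-2.0)
```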

  7. QuBiLS-MAS, open source multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations.

    PubMed

    Valdés-Martiní, José R; Marrero-Ponce, Yovani; García-Jacas, César R; Martinez-Mayorga, Karina; Barigye, Stephen J; Vaz d'Almeida, Yasser Silveira; Pham-The, Hai; Pérez-Giménez, Facundo; Morell, Carlos A

    2017-06-07

    In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors. These MDs codify molecular information based on the bilinear, quadratic and linear algebraic forms and the graph-theoretical electronic-density and edge-adjacency matrices in order to consider atom- and bond-based relations, respectively. These MDs have been successfully applied in the screening of chemical compounds of different therapeutic applications ranging from antimalarials, antibacterials, tyrosinase inhibitors and so on. To compute these MDs, a computational program with the same name was initially developed. However, this in house software barely offered the functionalities required in contemporary molecular modeling tasks, in addition to the inherent limitations that made its usability impractical. Therefore, the present manuscript introduces the QuBiLS-MAS (acronym for Quadratic, Bilinear and N-Linear mapS based on graph-theoretic electronic-density Matrices and Atomic weightingS) software designed to compute topological (0-2.5D) molecular descriptors based on bilinear, quadratic and linear algebraic forms for atom- and bond-based relations. 
The QuBiLS-MAS module was designed as standalone software, in which extensions and generalizations of the former ToMoCoMD-CARDD 2D-algebraic indices are implemented, considering the following aspects: (a) two new matrix normalization approaches based on double-stochastic and mutual probability formalisms; (b) topological constraints (cut-offs) to take into account particular inter-atomic relations; (c) six additional atomic properties to be used as weighting schemes in the calculation of the molecular vectors; (d) four new local-fragments to consider molecular regions of interest; (e) number of lone-pair electrons in chemical structure defined by diagonal coefficients in matrix representations; and (f) several aggregation operators (invariants) applied over atom/bond-level descriptors in order to compute global indices. This software permits the parallel computation of the indices, contains a batch processing module and data curation functionalities. This program was developed in Java v1.7 using the Chemistry Development Kit library (version 1.4.19). The QuBiLS-MAS software consists of two components: a desktop interface (GUI) and an API library allowing for the easy integration of the latter in chemoinformatics applications. The relevance of the novel extensions and generalizations implemented in this software is demonstrated through three studies. Firstly, a comparative Shannon's entropy based variability study for the proposed QuBiLS-MAS and the DRAGON indices demonstrates superior performance for the former. A principal component analysis reveals that the QuBiLS-MAS approach captures chemical information orthogonal to that codified by the DRAGON descriptors. Lastly, a QSAR study for the binding affinity to the corticosteroid-binding globulin using Cramer's steroid dataset is carried out. 
From these analyses, it is revealed that the QuBiLS-MAS approach for atom-pair relations yields similar-to-superior performance with regard to other QSAR methodologies reported in the literature. Therefore, the QuBiLS-MAS approach constitutes a useful tool for the diversity analysis of chemical compound datasets and high-throughput screening of structure-activity data.
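
    The Shannon's entropy based variability study mentioned above can be illustrated with a small sketch: a descriptor whose values spread evenly over many bins carries more information than one that is nearly constant across the dataset. The equal-width binning and the function name are assumptions made here for illustration; the abstract does not describe the discretization scheme actually used.

```python
import math


def shannon_entropy(values, n_bins):
    """Shannon entropy (in bits) of one descriptor after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)
```

    With n_bins bins the entropy is bounded above by log2(n_bins), which makes descriptors computed on the same dataset directly comparable.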

  8. QSPR model for bioconcentration factors of nonpolar organic compounds using molecular electronegativity distance vector descriptors.

    PubMed

    Qin, Li-Tang; Liu, Shu-Shen; Liu, Hai-Ling

    2010-02-01

    A five-variable model (model M2) was developed for the bioconcentration factors (BCFs) of nonpolar organic compounds (NPOCs) by using molecular electronegativity distance vector (MEDV) to characterize the structures of NPOCs and variable selection and modeling based on prediction (VSMP) to select the optimum descriptors. The estimated correlation coefficient (r(2)) and the leave-one-out cross-validation correlation coefficient (q(2)) of model M2 were 0.9271 and 0.9171, respectively. The model was externally validated by splitting the whole data set into a representative training set of 85 chemicals and a validation set of 29 chemicals. The results show that the main structural factors influencing the BCFs of NPOCs are -cCc, cCcc, -Cl, and -Br (where "-" refers to a single bond and "c" refers to a conjugated bond). The quantitative structure-property relationship (QSPR) model can effectively predict the BCFs of NPOCs, and the predictions of the model can also extend the current BCF database of experimental values.

  9. Fragment-based prediction of skin sensitization using recursive partitioning

    NASA Astrophysics Data System (ADS)

    Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2011-09-01

    Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationship analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 (p < 0.1) were retained to make up an indicator descriptor fragment. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct the recursive partitioning tree (RP tree) for classification. The balanced accuracies of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that the fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and the RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. They can provide some guidance for designing drugs with a lower sensitization level.
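
    The balanced accuracy figures quoted above can be illustrated with a short sketch. It assumes the standard definition, the mean of sensitivity and specificity for a binary sensitizer (1) versus non-sensitizer (0) labelling; the abstract does not spell out its formula, and the function name is hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return (sensitivity + specificity) / 2.0
```

    Unlike plain accuracy, this measure is not inflated when one class dominates the dataset, which is a common reason for reporting it in toxicity classification.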

  10. Recognizing stationary and locomotion activities using combinational of spectral analysis with statistical descriptors features

    NASA Astrophysics Data System (ADS)

    Zainudin, M. N. Shah; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Prior knowledge in pervasive computing has recently garnered considerable attention because of high demand in various application domains. Human activity recognition (HAR) is one of the applications widely explored by experts because it provides valuable information to humans. Accelerometer-based approaches are commonly used in HAR research because the sensors are small and already built into various types of smartphones. However, high inter-class similarity tends to degrade recognition performance. Hence, this work presents a method for activity recognition using our proposed features, a combination of spectral analysis and statistical descriptors, which is able to tackle the problem of differentiating stationary and locomotion activities. Noise in the signal is filtered using the Fourier transform before two groups of features are extracted: spectral frequency analysis and statistical descriptors. The extracted features are then classified using random forest ensemble classifier models. The recognition results show good accuracy for stationary and locomotion activities on the USC HAD datasets.
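
    The idea of pairing spectral and statistical features for one accelerometer window can be sketched as below. The specific features (mean, standard deviation, range, dominant frequency), the window handling and all names are assumptions for illustration; the abstract does not list the exact feature set. A naive discrete Fourier transform is used only to keep the sketch free of third-party dependencies.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of the one-sided discrete Fourier transform (naive O(n^2))."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def window_features(x, fs):
    """Statistical and spectral features of one signal window sampled at fs Hz."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Remove the mean so the zero-frequency bin does not dominate the spectrum.
    mags = dft_magnitudes([v - mean for v in x])
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return {
        "mean": mean,
        "std": std,
        "range": max(x) - min(x),
        "dominant_freq_hz": k_peak * fs / n,
    }
```

    A stationary activity would be expected to show a small standard deviation and little spectral energy away from zero frequency, while walking or running produces a clear dominant frequency; a classifier such as a random forest can then be trained on these per-window feature vectors.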

  11. Quantitative structure-retention relationship studies with immobilized artificial membrane chromatography II: partial least squares regression.

    PubMed

    Li, Jie; Sun, Jin; He, Zhonggui

    2007-01-26

    We aimed to establish a quantitative structure-retention relationship (QSRR) with immobilized artificial membrane (IAM) chromatography using easily understood and obtained physicochemical molecular descriptors and to elucidate which descriptors critically affect the interaction process between solutes and immobilized phospholipid membranes. The retention indices (logk(IAM)) of 55 structurally diverse drugs were determined on an immobilized artificial membrane column (IAM.PC.DD2) directly or obtained by an extrapolation method for highly hydrophobic compounds. Ten simple physicochemical property descriptors (clogP, rings, rotatory bond, hydro-bond counting, etc.) of these drugs were collected and used to establish the QSRR and predict the retention data by partial least squares regression (PLSR). Five descriptors, clogP, rotatory bond (RotB), rings, molecular weight (MW) and total surface area (TSA), were retained by using the Variable Importance for Projection (VIP) values as the criterion to build the final PLSR model. An external test set was employed to verify the QSRR based on the training set with the five variables, and the QSRR by PLSR exhibited a satisfactory predictive ability with R(p)=0.902 and RMSE(p)=0.400. Comparison of coefficients of centered and scaled variables by PLSR demonstrated that, for the descriptors studied, clogP and TSA have the most significant positive effect but the rotatable bond has a significant negative effect on drug IAM chromatographic retention.

  12. A New Shape Description Method Using Angular Radial Transform

    NASA Astrophysics Data System (ADS)

    Lee, Jong-Min; Kim, Whoi-Yul

    Shape is one of the primary low-level image features in content-based image retrieval. In this paper we propose a new shape description method that consists of a rotationally invariant angular radial transform descriptor (IARTD). The IARTD is a feature vector that combines the magnitude and aligned phases of the angular radial transform (ART) coefficients. A phase correction scheme is employed to produce the aligned phase so that the IARTD is invariant to rotation. The distance between two IARTDs is defined by combining differences in the magnitudes and aligned phases. In an experiment using the MPEG-7 shape dataset, the proposed method outperforms existing methods; the average BEP of the proposed method is 57.69%, while the average BEPs of the invariant Zernike moments descriptor and the traditional ART are 41.64% and 36.51%, respectively.

  13. MO-C-17A-04: Forecasting Longitudinal Changes in Oropharyngeal Tumor Morphology Throughout the Course of Head and Neck Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yock, A; UT Graduate School of Biomedical Sciences, Houston, TX; Rao, A

    2014-06-15

    Purpose: To generate, evaluate, and compare models that predict longitudinal changes in tumor morphology throughout the course of radiation therapy. Methods: Two morphology feature vectors were used to describe the size, shape, and position of 35 oropharyngeal GTVs at each treatment fraction during intensity-modulated radiation therapy. The feature vectors comprised the coordinates of the GTV centroids and one of two shape descriptors. One shape descriptor was based on radial distances between the GTV centroid and 614 GTV surface landmarks. The other was based on a spherical harmonic decomposition of these distances. Feature vectors over the course of therapy were described using static, linear, and mean models. The error of these models in forecasting GTV morphology was evaluated with leave-one-out cross-validation, and their accuracy was compared using Wilcoxon signed-rank tests. The effect of adjusting model parameters at 1, 2, 3, or 5 time points (adjustment points) was also evaluated. Results: The addition of a single adjustment point to the static model decreased the median error in forecasting the position of GTV surface landmarks by 1.2 mm (p<0.001). Additional adjustment points further decreased forecast error by about 0.4 mm each. The linear model decreased forecast error compared to the static model for feature vectors based on both shape descriptors (0.2 mm), while the mean model did so only for those based on the inter-landmark distances (0.2 mm). The decrease in forecast error due to adding adjustment points was greater than that due to model selection. Both effects diminished with subsequent adjustment points. Conclusion: Models of tumor morphology that include information from prior patients and/or prior treatment fractions are able to predict the tumor surface at each treatment fraction during radiation therapy. 
The predicted tumor morphology can be compared with patient anatomy or dose distributions, opening the possibility of anticipatory re-planning. American Legion Auxiliary Fellowship; The University of Texas Graduate School of Biomedical Sciences at Houston.

  14. A quantitative structure-activity relationship to predict efficacy of granular activated carbon adsorption to control emerging contaminants.

    PubMed

    Kennicutt, A R; Morkowchuk, L; Krein, M; Breneman, C M; Kilduff, J E

    2016-08-01

    A quantitative structure-activity relationship was developed to predict the efficacy of carbon adsorption as a control technology for endocrine-disrupting compounds, pharmaceuticals, and components of personal care products, as a tool for water quality professionals to protect public health. Here, we expand previous work to investigate a broad spectrum of molecular descriptors including subdivided surface areas, adjacency and distance matrix descriptors, electrostatic partial charges, potential energy descriptors, conformation-dependent charge descriptors, and Transferable Atom Equivalent (TAE) descriptors that characterize the regional electronic properties of molecules. We compare the efficacy of linear (Partial Least Squares) and non-linear (Support Vector Machine) machine learning methods to describe a broad chemical space and produce a user-friendly model. We employ cross-validation, y-scrambling, and external validation for quality control. The recommended Support Vector Machine model trained on 95 compounds having 23 descriptors offered a good balance between good performance statistics, low error, and low probability of over-fitting while describing a wide range of chemical features. The cross-validated model using a log-uptake (qe) response calculated at an aqueous equilibrium concentration (Ce) of 1 μM described the training dataset with an r(2) of 0.932, had a cross-validated r(2) of 0.833, and an average residual of 0.14 log units.

  15. Convenient QSAR model for predicting the complexation of structurally diverse compounds with beta-cyclodextrins.

    PubMed

    Pérez-Garrido, Alfonso; Morales Helguera, Aliuska; Abellán Guillén, Adela; Cordeiro, M Natália D S; Garrido Escudero, Amalio

    2009-01-15

    This paper reports a QSAR study for predicting the complexation of a large and heterogeneous variety of substances (233 organic compounds) with beta-cyclodextrins (beta-CDs). Several different theoretical molecular descriptors, calculated solely from the molecular structure of the compounds under investigation, and an efficient variable selection procedure, the Genetic Algorithm, led to models with satisfactory global accuracy and predictivity. However, the best final QSAR model is based on topological descriptors while also offering a reasonable interpretation. This QSAR model was able to explain ca. 84% of the variance in the experimental activity, and displayed very good internal cross-validation statistics and predictivity on external data. It shows that the driving forces for CD complexation are mainly hydrophobic and steric (van der Waals) interactions. Thus, the results of our study provide a valuable tool for future screening and priority testing of beta-CD guest molecules.

  16. Predicted Hematologic and Plasma Volume Responses Following Rapid Ascent to Progressive Altitudes

    DTIC Science & Technology

    2014-06-01

    of these changes, and define baseline demographics and physiologic descriptors that are important in predicting these changes. The overall impact of... physiologic descriptors that are important in predicting these changes. Using general linear mixed models and a comprehensive relational database...accomplished using a comprehensive relational database containing individual ascent profiles, demographics, and physiologic subject descriptors as well as

  17. Effectiveness and consistency of a suite of descriptors for assessing the ecological status of seagrass meadows (Posidonia oceanica L. Delile)

    NASA Astrophysics Data System (ADS)

    Rotini, Alice; Belmonte, Alessandro; Barrote, Isabel; Micheli, Carla; Peirano, Andrea; Santos, Rui O.; Silva, João; Migliore, Luciana

    2013-09-01

    The increasing rate of human-induced environmental changes on coastal marine ecosystems has created a demand for effective descriptors, in particular for those suitable for monitoring the status of seagrass meadows. Growing evidence has supported the useful application of biochemical and genetic descriptors such as secondary metabolite synthesis, photosynthetic activity and genetic diversity. In the present study, we have investigated the effectiveness of different descriptors (traditional, biochemical and genetic) in monitoring seagrass meadow conservation status. The Posidonia oceanica meadow of Monterosso al Mare (Ligurian sea, NW Mediterranean) was subjected to the measurement of bed density, leaf biometry, total phenols, soluble protein and photosynthetic pigment content as well as to RAPD marker analysis. This suite of descriptors provided evidence of their effectiveness and convenient application as markers of the conservation status of P. oceanica and/or other seagrasses. Biochemical/genetic descriptors and those obtained by traditional methods depicted a well conserved meadow with seasonal variability and, particularly in summer, indicated a healthier condition in a portion of the bed (station C), which was in agreement with the physical and sedimentological features of the station. Our results support the usefulness of introducing biochemical and genetic approaches to seagrass monitoring programs since they are effective indicators of plant physiological stress and environmental disturbance.

  18. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    PubMed

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques used was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLRs have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R(2) of 0.874 and RMSE of 0.437 against R(2) of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  19. Prediction of Chemical Function: Model Development and ...

    EPA Pesticide Factsheets

    The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration in which it is present, and therefore impacts exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure was developed. This effort required collection, curation, and harmonization of publicly available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi

  20. QSAR modeling based on structure-information for properties of interest in human health.

    PubMed

    Hall, L H; Hall, L M

    2005-01-01

    The development of QSAR models based on topological structure description is presented for problems in human health. These models are based on the structure-information approach to quantitative biological modeling and prediction, in contrast to the mechanism-based approach. The structure-information approach is outlined, starting with basic structure information developed from the chemical graph (connection table). Information explicit in the connection table (element identity and skeletal connections) leads to significant (implicit) structure information that is useful for establishing sound models of a wide range of properties of interest in drug design. Valence state definition leads to relationships for valence state electronegativity and atom/group molar volume. Based on these important aspects of molecules, together with skeletal branching patterns, both the electrotopological state (E-state) and molecular connectivity (chi indices) structure descriptors are developed and described. A summary of four QSAR models indicates the wide range of applicability of these structure descriptors and the predictive quality of QSAR models based on them: aqueous solubility (5535 chemically diverse compounds, 938 in external validation), percent oral absorption (%OA, 417 therapeutic drugs, 195 drugs in external validation testing), AMES mutagenicity (2963 compounds including 290 therapeutic drugs, 400 in external validation), fish toxicity (92 substituted phenols, anilines and substituted aromatics). These models are established independent of explicit three-dimensional (3-D) structure information and are directly interpretable in terms of the implicit structure information useful to the drug design process.

  1. Recognizing characters of ancient manuscripts

    NASA Astrophysics Data System (ADS)

    Diem, Markus; Sablatnig, Robert

    2010-02-01

    Considering printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. In this paper, ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally, local information allows the recognition of partially visible or washed-out characters. The proposed algorithm consists of two steps: character classification and character localization. Initially, Scale Invariant Feature Transform (SIFT) features are extracted, which are subsequently classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Thereby, characters are localized and finally recognized based on a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g. stains, tears) and faded-out characters.

  2. High throughput heuristics for prioritizing human exposure to environmental chemicals.

    PubMed

    Wambaugh, John F; Wang, Anran; Dionisio, Kathie L; Frame, Alicia; Egeghy, Peter; Judson, Richard; Setzer, R Woodrow

    2014-11-04

    The risk posed to human health by any of the thousands of untested anthropogenic chemicals in our environment is a function of both the hazard presented by the chemical and the extent of exposure. However, many chemicals lack estimates of exposure intake, limiting the understanding of health risks. We aim to develop a rapid heuristic method to determine potential human exposure to chemicals for application to the thousands of chemicals with little or no exposure data. We used Bayesian methodology to infer ranges of exposure consistent with biomarkers identified in urine samples from the U.S. population by the National Health and Nutrition Examination Survey (NHANES). We performed linear regression on inferred exposure for demographic subsets of NHANES demarked by age, gender, and weight using chemical descriptors and use information from multiple databases and structure-based calculators. Five descriptors are capable of explaining roughly 50% of the variability in geometric means across 106 NHANES chemicals for all the demographic groups, including children aged 6-11. We use these descriptors to estimate human exposure to 7968 chemicals, the majority of which have no other quantitative exposure prediction. For thousands of chemicals with no other information, this approach allows forecasting of average exposure intake of environmental chemicals.

  3. Molecular structure and gas chromatographic retention behavior of the components of Ylang-Ylang oil.

    PubMed

    Olivero, J; Gracia, T; Payares, P; Vivas, R; Díaz, D; Daza, E; Geerlings, P

    1997-05-01

    Using quantitative structure-retention relationships (QSRR) methodologies the Kovats gas chromatographic retention indices for both apolar (DB-1) and polar (DB-Wax) columns for 48 compounds from Ylang-Ylang essential oil were empirically predicted from calculated and experimental data on molecular structure. Topological, geometric, and electronic descriptors were obtained for model generation. Relationships between descriptors and the retention data reported were established by linear multiple regression, giving equations that can be used to predict the Kovats indices for compounds present in essential oils, both in DB-1 and DB-Wax columns. Factor analysis was performed to interpret the meaning of the descriptors included in the models. The prediction model for the DB-1 column includes descriptors such as Randic's first-order connectivity index (1X), the molecular surface (MSA), the sum of the atomic charge on all the hydrogens (QH), Randic's third-order connectivity index (3X) and the molecular electronegativity (chi). The prediction model for the DB-Wax column includes the first three descriptors mentioned for the DB-1 column (1X, MSA and QH) and the most negative charge (MNC), the global softness (S), and the difference between Randic's and Kier and Hall's third-order connectivity indexes (3X-3XV).
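
    Randic's first-order connectivity index (1X), one of the descriptors in the DB-1 model above, can be computed directly from the hydrogen-suppressed molecular graph as the sum over bonds of 1/sqrt(d_i * d_j), where d_i is the number of heavy-atom neighbours of atom i. The sketch below is an illustration only; the bond-list input format and the function name are assumptions, not the software used in the study.

```python
import math
from collections import Counter


def randic_index(bonds):
    """First-order Randic connectivity index from a list of heavy-atom bonds.

    bonds: iterable of (atom_i, atom_j) pairs of the hydrogen-suppressed graph.
    """
    bonds = list(bonds)
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(1.0 / math.sqrt(degree[a] * degree[b]) for a, b in bonds)


# Carbon skeletons only (hydrogens suppressed).
n_butane = [(0, 1), (1, 2), (2, 3)]    # chain C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]   # central carbon with three methyl groups
```

    For these two isomers the index is about 1.914 for n-butane and about 1.732 for isobutane, which illustrates how the descriptor decreases with branching.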

  4. Using probabilistic model as feature descriptor on a smartphone device for autonomous navigation of unmanned ground vehicles

    NASA Astrophysics Data System (ADS)

    Desai, Alok; Lee, Dah-Jye

    2013-12-01

    There has been significant research on the development of feature descriptors in the past few years. Most of this work does not emphasize real-time applications. This paper presents the development of an affine invariant feature descriptor for low resource applications such as UAV and UGV that are equipped with an embedded system with a small microprocessor, a field programmable gate array (FPGA), or a smart phone device. UAV and UGV have proven suitable for many promising applications such as unknown environment exploration, search and rescue operations. These applications require on-board image processing for obstacle detection, avoidance and navigation. All these real-time vision applications require a camera to grab images and match features using a feature descriptor. A good feature descriptor will uniquely describe a feature point, thus allowing it to be correctly identified and matched with its corresponding feature point in another image. A few feature description algorithms are available for resource-limited systems. They either require too much of the device's resources or too much simplification of the algorithm, which results in reduced performance. This research is aimed at meeting the needs of these systems without sacrificing accuracy. This paper introduces a new feature descriptor called PRObabilistic model (PRO) for UGV navigation applications. It is a compact and efficient binary descriptor that is hardware-friendly and easy to implement.
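
    Binary descriptors are attractive on small hardware because matching reduces to XOR and bit counting. The sketch below shows generic nearest-neighbour matching by Hamming distance; it is not the PRO descriptor itself, and the function names, the 8-bit toy descriptors and the `max_distance` threshold are all assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def match_descriptors(query, reference, max_distance=2):
    """For each query descriptor, return the index of the nearest reference
    descriptor, or None when even the best match exceeds max_distance."""
    matches = []
    for q in query:
        best_idx, best_dist = None, None
        for i, r in enumerate(reference):
            d = hamming(q, r)
            if best_dist is None or d < best_dist:
                best_idx, best_dist = i, d
        if best_dist is None or best_dist > max_distance:
            matches.append(None)  # no sufficiently close reference descriptor
        else:
            matches.append(best_idx)
    return matches


# Toy 8-bit descriptors (made-up values).
reference = [0b10110010, 0b01001101, 0b11110000]
query = [0b10110011, 0b00000111]
print(match_descriptors(query, reference))  # [0, None]
```

    The first query differs from reference 0 in a single bit and is accepted; the second is at least three bits away from every reference and is rejected under the assumed threshold.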

  5. Modeling and Prediction of Solvent Effect on Human Skin Permeability using Support Vector Regression and Random Forest.

    PubMed

    Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru

    2015-11-01

    The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
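
    The three statistics quoted for the SVR model can be computed from observed and predicted values as sketched below. This is a minimal illustration using the common textbook definitions; the abstract does not state which variant of the determination coefficient was used, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (determination coefficient, RMSE, MAE) for paired values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("determination coefficient is undefined for constant y_true")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Made-up observed and predicted values.
observed = [-2.1, -1.4, -3.0, -0.8]
predicted = [-2.0, -1.6, -2.7, -1.0]
print(regression_metrics(observed, predicted))
```

    RMSE penalizes large individual errors more heavily than MAE, so reporting both, as the abstract does, gives some sense of how evenly the error is spread across samples.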

  6. Novel fragment-based QSAR modeling and combinatorial design of pyrazole-derived CRK3 inhibitors as potent antileishmanials.

    PubMed

    Goyal, Sukriti; Dhanjal, Jaspreet K; Tyagi, Chetna; Goyal, Manisha; Grover, Abhinav

    2014-07-01

    The CRK3 cyclin-dependent kinase of Leishmania plays an important role in regulating the cell-cycle progression at the G2/M phase checkpoint transition, proliferation, and viability inside the host macrophage. In this study, a novel fragment-based QSAR model has been developed using 22 pyrazole-derived compounds exhibiting inhibitory activity against Leishmanial CRK3. Unlike other QSAR methods, this fragment-based method gives flexibility to study the relationship between molecular fragments of interest and their contribution to the variation in the biological response by evaluating cross-term fragment descriptors. Based on the fragment-based QSAR model, a combinatorial library was generated, and the top two compounds were reported after predicting their activity. The QSAR model showed satisfactory statistical parameters for the data set (r(2) = 0.8752, q(2) = 0.6690, F-ratio = 30.37, and pred_r(2) = 0.8632) with four descriptors describing the nature of substituent groups and the environment of the substitution site. Evaluation of the model implied that electron-rich substitution at the R1 position improves the inhibitory activity, while a decline in inhibitory activity was observed in the presence of nitrogen at the R2 position. The analysis carried out in this study provides a substantial basis for consideration of the designed pyrazole-based leads as potent antileishmanial drugs. © 2014 John Wiley & Sons A/S.

  7. Towards reporting standards for neuropsychological study results: A proposal to minimize communication errors with standardized qualitative descriptors for normalized test scores.

    PubMed

    Schoenberg, Mike R; Rum, Ruba S

    2017-11-01

    Rapid, clear and efficient communication of neuropsychological results is essential to benefit patient care. Errors in communication are a leading cause of medical errors; nevertheless, there remains a lack of consistency in how neuropsychological scores are communicated. A major limitation in the communication of neuropsychological results is the inconsistent use of qualitative descriptors for standardized test scores and the use of vague terminology. A PubMed search from 1 Jan 2007 to 1 Aug 2016 was conducted to identify guidelines or consensus statements for the description and reporting of qualitative terms to communicate neuropsychological test scores. The review found the use of confusing and overlapping terms to describe various ranges of percentile standardized test scores. In response, we propose a simplified set of qualitative descriptors for normalized test scores (Q-Simple) as a means to reduce errors in communicating test results. The Q-Simple qualitative terms are: 'very superior', 'superior', 'high average', 'average', 'low average', 'borderline' and 'abnormal/impaired'. A case example illustrates the proposed Q-Simple qualitative classification system to communicate neuropsychological results for neurosurgical planning. The Q-Simple qualitative descriptor system is intended as a means to improve and standardize communication of standardized neuropsychological test scores. Research is needed to further evaluate neuropsychological communication errors. Conveying the clinical implications of neuropsychological results in a manner that minimizes risk for communication errors is a quintessential component of evidence-based practice. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. [The use of Cantonese pain descriptors among healthy young adults in Hong Kong].

    PubMed

    Chung, W Y; Wong, C H; Yang, J C; Tan, P P

    1998-12-01

    The interpretation and expression of pain are closely related to an individual's social and cultural background. The language and words used to convey pain (pain descriptors) are particularly significant in the assessment and evaluation of pain severity and its management. Therefore, the study of pain descriptors is crucial in clinical practice. The study was of exploratory-descriptive design. A convenience sample was recruited. Data were collected by a structured self-administered questionnaire. Data obtained included demographic information and pain descriptors used by the subjects in various pain conditions. Data were analyzed by descriptive statistics. Pain descriptors were categorized according to nature, process, intensity, aggravating factors, accompanying symptoms and behavioral manifestation. The total number of pain descriptors (in Cantonese) based on real pain experience was 3017, with a mean of 3 (n = 986). The most commonly used descriptors concerned the nature of pain (41%). Descriptors of pain intensity constituted 20%. There was no significant difference in the number of pain descriptors between males and females. However, there was a significant difference in the types of pain descriptors used (Mfemale = 526, Mmale = 453, Z = -2.9729, p = 0.0029). There were also significant differences in the use of pain descriptors among the various age groups (χ2 = 15.0157, df = 4, P = 0.0047) and educational levels (χ2 = 11.2443, df = 4, P = 0.0240). The types of descriptors used increased with increasing age and education level. This exploratory-descriptive study explores the use of pain descriptors among Chinese young adults in Hong Kong. The results show that females use more pain descriptors than males. The pain descriptors that females used were mostly of the nature type. The similarities and differences between these findings and those of Ho (1991) are compared.

  9. Artificial intelligence systems based on texture descriptors for vaccine development.

    PubMed

    Nanni, Loris; Brahnam, Sheryl; Lumini, Alessandra

    2011-02-01

    The aim of this work is to analyze and compare several feature extraction methods for peptide classification that are based on the calculation of texture descriptors starting from a matrix representation of the peptide. This texture-based representation of the peptide is then used to train a support vector machine classifier. In our experiments, the best results are obtained using local binary pattern variants and the discrete cosine transform with selected coefficients. These results are better than those previously reported that employed texture descriptors for peptide representation. In addition, we perform experiments that combine standard approaches based on amino acid sequence. The experimental section reports several tests performed on a vaccine dataset for the prediction of peptides that bind human leukocyte antigens and on a human immunodeficiency virus (HIV-1). Experimental results confirm the usefulness of our novel descriptors. The MATLAB implementation of our approaches is available at http://bias.csr.unibo.it/nanni/TexturePeptide.zip.
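
    To make the idea of a texture descriptor on a matrix concrete, here is a minimal sketch of the basic 3x3 local binary pattern (LBP) histogram. It assumes a plain numeric matrix as input; how the authors build the peptide matrix and which LBP variants they use are not specified in the abstract, so the function name and the toy matrix are illustrative only.

```python
def lbp_histogram(matrix):
    """256-bin histogram of basic 3x3 local binary pattern codes.

    Each interior cell is compared with its 8 neighbours; a neighbour that is
    greater than or equal to the centre sets one bit of an 8-bit code.
    """
    # Neighbour offsets, clockwise starting from the top-left cell.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(matrix), len(matrix[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if matrix[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Toy 4x4 matrix (made-up values); it has four interior cells.
toy = [
    [1, 2, 3, 4],
    [2, 5, 1, 3],
    [3, 1, 6, 2],
    [4, 3, 2, 1],
]
hist = lbp_histogram(toy)
print(sum(hist))  # 4, one code per interior cell
```

    The resulting histogram is a fixed-length vector regardless of matrix size, which is what makes it convenient as input to a classifier such as a support vector machine.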

  10. Covariance descriptor fusion for target detection

    NASA Astrophysics Data System (ADS)

    Cukur, Huseyin; Binol, Hamidullah; Bal, Abdullah; Yavuz, Fatih

    2016-05-01

    Target detection is one of the most important topics for military or civilian applications. In order to address such detection tasks, hyperspectral imaging sensors provide useful image data containing both spatial and spectral information. Target detection in hyperspectral images involves various challenging scenarios. To overcome these challenges, the covariance descriptor presents many advantages. The detection capability of the conventional covariance descriptor technique can be improved by fusion methods. In this paper, hyperspectral bands are clustered according to inter-band correlation. Target detection is then realized by fusion of covariance descriptor results based on the band clusters. The proposed combination technique is denoted Covariance Descriptor Fusion (CDF). The efficiency of the CDF is evaluated by applying it to hyperspectral imagery to detect man-made objects. The obtained results show that the CDF presents better performance than the conventional covariance descriptor.
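
    A covariance descriptor summarizes an image region by the covariance matrix of its per-pixel feature vectors (here, spectral band values). The sketch below computes only that matrix, under the assumption that each pixel is given as a list of band values; the band clustering and fusion steps of CDF are not reproduced, and the function name and sample data are hypothetical.

```python
def covariance_descriptor(pixels):
    """Sample covariance matrix of per-pixel feature vectors.

    pixels: list of equal-length lists, one feature vector per pixel.
    Returns a d x d matrix as a list of lists.
    """
    n = len(pixels)
    if n < 2:
        raise ValueError("at least two pixels are needed")
    d = len(pixels[0])
    mean = [sum(p[j] for p in pixels) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in pixels:
        centered = [p[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    # Unbiased estimate: divide by n - 1.
    return [[v / (n - 1) for v in row] for row in cov]


# Four pixels with three made-up band values each.
region = [
    [0.10, 0.40, 0.35],
    [0.12, 0.42, 0.30],
    [0.09, 0.38, 0.37],
    [0.11, 0.41, 0.33],
]
for row in covariance_descriptor(region):
    print(row)
```

    The descriptor size depends only on the number of bands, not on the region size, so regions of different shapes can be compared directly.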

  11. Correlating Reactivity and Selectivity to Cyclopentadienyl Ligand Properties in Rh(III)-Catalyzed C-H Activation Reactions: An Experimental and Computational Study.

    PubMed

    Piou, Tiffany; Romanov-Michailidis, Fedor; Romanova-Michaelides, Maria; Jackson, Kelvin E; Semakul, Natthawat; Taggart, Trevor D; Newell, Brian S; Rithner, Christopher D; Paton, Robert S; Rovis, Tomislav

    2017-01-25

    CpXRh(III)-catalyzed C-H functionalization reactions are a proven method for the efficient assembly of small molecules. However, rationalization of the effects of cyclopentadienyl (CpX) ligand structure on reaction rate and selectivity has been viewed as a black box, and a truly systematic study is lacking. Consequently, predicting the outcomes of these reactions is challenging because subtle variations in ligand structure can cause notable changes in reaction behavior. A predictive tool is, nonetheless, of considerable value to the community as it would greatly accelerate reaction development. Designing a data set in which the steric and electronic properties of the CpXRh(III) catalysts were systematically varied allowed us to apply multivariate linear regression algorithms to establish correlations between these catalyst-based descriptors and the regio-, diastereoselectivity, and rate of model reactions. This, in turn, led to the development of quantitative predictive models that describe catalyst performance. Our newly described cone angles and Sterimol parameters for CpX ligands served as highly correlative steric descriptors in the regression models. Through rational design of training and validation sets, key diastereoselectivity outliers were identified. Computations reveal the origins of the outstanding stereoinduction displayed by these outliers. The results are consistent with partial η5-η3 ligand slippage that occurs in the transition state of the selectivity-determining step. In addition to the instructive value of our study, we believe that the insights gained are transposable to other group 9 transition metals and pave the way toward rational design of C-H functionalization catalysts.

  12. Salient aspects of PBP2A-inhibition; A QSAR Study.

    PubMed

    Ogunleye, Adewale J; Eniafe, Gabriel O; Inyang, Olumide K; Adewumi, Benjamin; Omotuyi, Olaposi I

    2018-05-15

    Background: Inhibition of penicillin binding protein 2A (PBP2A) represents a sound drug design strategy in combatting methicillin-resistant Staphylococcus aureus (MRSA). Considering the urgent need for effective antimicrobials in combatting MRSA infections, we have developed a statistically robust ensemble of molecular descriptors (1, 2, & 3-D) from compounds targeting PBP2A in vivo. 37 (training set: 26, test set: 11) PBP2A-inhibitors were submitted for descriptor generation after which an unsupervised, non-exhaustive genetic algorithm (GA) was deployed for fishing out the best descriptor subset. Assignment of descriptors to a regression model was accomplished with the Partial Least Squares (PLS) algorithm. At the end, an ensemble of 30 descriptors accurately predicted the ligand bioactivity, IC50 (R = 0.9996, R2 = 0.9992, R2a = 0.9949, SEE = 0.2297, Q2LOO = 0.9741). Inferentially, we noticed that the overall efficacy of this model greatly depends on atomic polarizability and negative charge (electron) density. Besides the formula derived, the high dimensional model also offers critical insights into salient cheminformatics parameters to note during hit-to-lead PBP2A-antagonist optimization. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  13. Objective scoring of transformed foci in BALB/c 3T3 cell transformation assay by statistical image descriptors.

    PubMed

    Urani, C; Corvi, R; Callegaro, G; Stefanini, F M

    2013-09-01

    In vitro cell transformation assays (CTAs) have been shown to model important stages of in vivo carcinogenesis and have the potential to predict carcinogenicity in humans. Advantages of CTAs are their ability to reveal both genotoxic and non-genotoxic carcinogens while reducing both experimental costs and the number of animals used. The endpoint of the CTA is foci formation, which requires classification under light microscopy based on morphology. Thus, current limitations on the wide adoption of the assay partially stem from a fair degree of subjectivity in foci scoring. An objective evaluation may be obtained after separating foci from the background monolayer in the digital image, and quantifying values of statistical descriptors which are selected to capture eye-scored morphological features. The aim of this study was to develop statistical descriptors to be applied to transformed foci of BALB/c 3T3, which cover foci size, multilayering and invasive cell growth into the background monolayer. Proposed descriptors were applied to a database of 407 foci images to explore the numerical features, and to illustrate open problems and potential solutions. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Structure-based predictions of 13C-NMR chemical shifts for a series of 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indoles derivatives using GA-based MLR method

    NASA Astrophysics Data System (ADS)

    Ghavami, Raouf; Sadeghi, Faridoon; Rasouli, Zolikha; Djannati, Farhad

    2012-12-01

    Experimental values for the 13C NMR chemical shifts (ppm, TMS = 0) at 300 K ranging from 96.28 ppm (C4' of indole derivative 17) to 159.93 ppm (C4' of indole derivative 23) relative to deuterated chloroform (CDCl3, 77.0 ppm) or dimethylsulfoxide (DMSO, 39.50 ppm) as internal reference in CDCl3 or DMSO-d6 solutions have been collected from the literature for thirty 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indole derivatives containing different substituted groups. Effective quantitative structure-property relationship (QSPR) models were built using a hybrid method combining a genetic algorithm (GA) based on stepwise selection multiple linear regression (SWS-MLR) as feature-selection tools and correlation models between each carbon atom of the indole derivatives and calculated descriptors. Each compound was depicted by molecular structural descriptors that encode constitutional, topological, geometrical, electrostatic, and quantum chemical features. The accuracy of all developed models was confirmed using different types of internal and external procedures and various statistical tests. Furthermore, the domain of applicability for each model, which indicates the area of reliable predictions, was defined.

  15. Isoelectric point is an inadequate descriptor of MS2, Phi X 174 and PRD1 phages adhesion on abiotic surfaces.

    PubMed

    Dika, Christelle; Duval, Jérôme F L; Francius, Gregory; Perrin, Aline; Gantzer, Christophe

    2015-05-15

    MS2, Phi X 174 and PRD1 bacteriophages are commonly used as surrogates to evaluate pathogenic virus behavior in natural aquatic media. The interfacial properties of these model soft bioparticles are herein discussed in connection with their propensities to adhere onto abiotic surfaces that differ in terms of surface charges and hydrophobicities. The phages considered in this work exhibit distinct multilayered surface structures and their electrostatic charges are evaluated from the dependence of their electrophoretic mobilities on electrolyte concentration at neutral pH on the basis of electrokinetic theory for soft (bio)particles. The charges of the viruses probed by electrokinetics vary according to the sequence Phi X 174⩽PRD1≪MS2, where '<' stands for 'less charged than'. The hydrophobic/hydrophilic balances of the phages are further derived from their adhesions onto model hydrophobic and hydrophilic self-assembled mono-layers. The corresponding results lead to the following hydrophobicity sequence Phi X 174≪MS2

  16. Modeling the drugs' passive transfer in the body based on their chromatographic behavior.

    PubMed

    Kouskoura, Maria G; Kachrimanis, Kyriakos G; Markopoulou, Catherine K

    2014-11-01

    One of the most challenging aims in modern analytical chemistry and pharmaceutical analysis is to create models for drugs' behavior based on simulation experiments. Since drugs' effects are closely related to their molecular properties, numerous characteristics of drugs are used in order to acquire a model of passive absorption and transfer in the human body. Importantly, such a direction in innovative bioanalytical methodologies is also urgently needed in the area of personalized medicine to implement nanotechnological and genomics advancements. Simulation experiments were carried out by examining and interpreting the chromatographic behavior of 113 analytes/drugs (400 observations) in RP-HPLC. The dataset employed for this purpose included 73 descriptors which refer to the physicochemical properties of the mobile phase mixture in different proportions, the physicochemical properties of the analytes and the structural characteristics of their molecules. A series of different software packages was used to calculate all the descriptors apart from those referring to the structure of analytes. The correlation of the descriptors with the retention time of the analytes eluted from a C4 column with an aqueous mobile phase was employed as a dataset to introduce the behavior models in the human body. Their evaluation with Partial Least Squares (PLS) software proved that the chromatographic behavior of a drug on a lipophilic stationary and a polar mobile phase is directly related to its drug-ability. At the same time, the behavior of an unknown drug in the human body can be predicted with reliability via Artificial Neural Networks (ANNs) software. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Diagnostic performance and reproducibility of T2w based and diffusion weighted imaging (DWI) based PI-RADSv2 lexicon descriptors for prostate MRI.

    PubMed

    Benndorf, Matthias; Hahn, Felix; Krönig, Malte; Jilg, Cordula Annette; Krauss, Tobias; Langer, Mathias; Dovi-Akué, Philippe

    2017-08-01

    To examine the diagnostic performance of PI-RADSv2 T2w and diffusion weighted imaging (DWI) based lexicon descriptors, inter-observer agreement for descriptor assignment and diagnostic accuracy of the PI-RADSv2 assessment categories for multiparametric prostate MRI. 176 lesions in 79 consecutive patients are analyzed, lesions are histopathologically verified by MRI-ultrasound fusion biopsy. All lesions are rated according to the PI-RADSv2 lexicon, descriptors for T2w and DWI sequences and resulting assessment categories are assigned by two independent blinded radiologists. We perform receiver-operating-characteristic analysis using the assessment categories. To analyze inter-observer agreement, we calculate weighted kappa values for assessment category assignment and unweighted kappa values for descriptor assignment. PI-RADSv2 assessment categories yield an area under the curve of 0.76/0.74 (radiologist 1/radiologist 2), P > 0.05. Weighted kappa for agreement is 0.601 in the peripheral zone and 0.580 in the transition zone. We detect a difference in the cancer rate for PI-RADSv2 category 3 between peripheral zone (32%) and transition zone (12%), P < 0.05. We obtain moderate agreement at most for descriptor assignment with kappa values ranging from 0.082 (T2w shape in the transition zone) to 0.407 (T2w signal intensity in the peripheral zone) and 0.493 (ADC pattern in the peripheral zone). Our analysis corroborates typical descriptors for benign/malignant lesions, but also reveals insights into potential pitfalls - T2w wedge shaped lesions in the peripheral zone have a considerable cancer rate, despite being labelled category 2 in the lexicon. Agreement for descriptor assignment in the PI-RADSv2 lexicon is at most moderate in our study. Typical descriptors for benign and malignant lesions are validated, whereas the discriminatory power of some descriptors is challenged. The difference in the cancer rate for PI-RADSv2 category 3 between peripheral zone and transition zone should be considered when management recommendations are linked to assessment categories in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
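
    The unweighted kappa values reported for descriptor assignment measure agreement between the two radiologists beyond what chance would produce. A minimal sketch of Cohen's unweighted kappa follows; the weighted variant used for the assessment categories is not shown, and the function name and the example labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa for two raters labelling the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical shape descriptors assigned to four lesions by two readers.
reader_1 = ["wedge", "wedge", "round", "round"]
reader_2 = ["wedge", "round", "round", "round"]
print(cohen_kappa(reader_1, reader_2))  # 0.5
```

    In this toy case the readers agree on three of four lesions (75%), but half of that agreement is expected by chance, which is why kappa is only 0.5.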

  18. Real-time, resource-constrained object classification on a micro-air vehicle

    NASA Astrophysics Data System (ADS)

    Buck, Louis; Ray, Laura

    2013-12-01

    A real-time embedded object classification algorithm is developed through the novel combination of binary feature descriptors, a bag-of-visual-words object model and the cortico-striatal loop (CSL) learning algorithm. The BRIEF, ORB and FREAK binary descriptors are tested and compared to SIFT descriptors with regard to their respective classification accuracies, execution times, and memory requirements when used with CSL on a 12.6 g ARM Cortex embedded processor running at 800 MHz. Additionally, the effect of χ2 feature mapping and opponent-color representations used with these descriptors is examined. These tests are performed on four data sets of varying sizes and difficulty, and the BRIEF descriptor is found to yield the best combination of speed and classification accuracy. Its use with CSL achieves accuracies between 67% and 95% of those achieved with SIFT descriptors and allows for the embedded classification of a 128x192 pixel image in 0.15 seconds, 60 times faster than classification with SIFT. χ2 mapping is found to provide substantial improvements in classification accuracy for all of the descriptors at little cost, while opponent-color descriptors offer accuracy improvements only on colorful datasets.

  19. Discovering collectively informative descriptors from high-throughput experiments

    PubMed Central

    2009-01-01

    Background: Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research. Results: This paper describes a novel algorithm called BLANKET for symmetric analysis of two experiments that assess informativeness of descriptors. The experiments are required to be related only in that their descriptor sets intersect substantially and their definitions of case and control are consistent. From resulting lists of n descriptors ranked by informativeness, BLANKET determines shortlists of descriptors from each experiment, generally of different lengths p and q. For any pair of shortlists, four numbers are evident: the number of descriptors appearing in both shortlists, in exactly one shortlist, or in neither shortlist. From the associated contingency table, BLANKET computes Right Fisher Exact Test (RFET) values used as scores over a plane of possible pairs of shortlist lengths [1,2]. BLANKET then chooses a pair or pairs with RFET score less than a threshold; the threshold depends upon n and shortlist length limits and represents a quality of intersection achieved by less than 5% of random lists. Conclusions: Researchers seek within a universe of descriptors some minimal subset that collectively and efficiently predicts experimental outcomes. Ideally, any smaller subset should be insufficient for reliable prediction and any larger subset should have little additional accuracy. As a method, BLANKET is easy to conceptualize and presents only moderate computational complexity. Many existing databases could be mined using BLANKET to suggest optimal sets of predictive descriptors. PMID:20021653
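
    The score assigned to one pair of shortlists can be illustrated with a right-tailed Fisher exact test on the overlap count. The sketch below assumes the usual hypergeometric formulation (n descriptors in total, shortlists of lengths p and q, and a given number shared by both); it does not reproduce BLANKET's search over shortlist lengths or its threshold selection, and the function name and example numbers are hypothetical.

```python
from math import comb


def right_fisher_exact(n, p, q, overlap):
    """Right-tailed Fisher exact p-value for the overlap of two shortlists.

    Probability that a random shortlist of length q, drawn from n descriptors,
    shares at least `overlap` members with a fixed shortlist of length p.
    """
    if not (0 <= p <= n and 0 <= q <= n):
        raise ValueError("shortlist lengths must lie between 0 and n")
    max_overlap = min(p, q)
    min_overlap = max(0, p + q - n)
    if not (min_overlap <= overlap <= max_overlap):
        raise ValueError("overlap is impossible for these shortlist lengths")
    total = comb(n, q)
    # Sum hypergeometric probabilities for overlap, overlap + 1, ..., max.
    tail = sum(comb(p, k) * comb(n - p, q - k)
               for k in range(overlap, max_overlap + 1))
    return tail / total


# 1000 descriptors, shortlists of 20 and 30, of which 5 are shared (made-up).
print(right_fisher_exact(n=1000, p=20, q=30, overlap=5))
```

    A small value means the two experiments agree on their top descriptors far more often than random rankings would, which is the property the abstract describes as the quality of intersection.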

  20. How young water fractions can delineate travel time distributions in contrasting catchments

    NASA Astrophysics Data System (ADS)

    Lutz, Stefanie; Zink, Matthias; Merz, Ralf

    2017-04-01

    Travel time distributions (TTDs) are crucial descriptors of flow and transport processes in catchments. Tracking fluxes of environmental tracers such as stable water isotopes offers a practicable method to determine TTDs. The mean transit time (MTT) is the most commonly reported statistic of TTDs; however, MTT assessments are prone to large aggregation biases resulting from spatial heterogeneity and non-stationarity in real-world catchments. Recently, the young water fraction (Fyw) has been introduced as a more robust statistic that can be derived from seasonal tracer cycles. In this study, we aimed at improving the assessment of TTDs by using Fyw as additional information in lumped isotope models. First, we calculated Fyw from monthly δ18O samples for 24 contrasting sub-catchments in a meso-scale catchment (3300 km2). Fyw ranged from 0.01 to 0.27 (mean = 0.11) and was not significantly correlated with catchment characteristics (e.g., mean slope, catchment area, and baseflow index) apart from the dominant soil type. Second, assuming gamma-shaped TTDs, we determined time-invariant TTDs for each sub-catchment by optimization of lumped isotope models using the convolution integral method. Whereas multiple optimization runs for the same sub-catchment showed a wide range of TTD parameters, the use of Fyw as additional information allowed constraining this range and thus improving the assessment of MTTs. Hence, the best model fit to observed isotope data might not be the desired solution, as the resulting TTD might define a young water fraction inconsistent with the tracer-cycle based Fyw. Given that the latter is a robust descriptor of fast-flow contribution, isotope models should instead aim at accurately describing both Fyw and the isotope time series in order to improve our understanding of flow and transport in catchments.
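
    The consistency check described here, whether a fitted gamma TTD implies a plausible young water fraction, can be sketched numerically. The code below assumes that the young water fraction of a TTD is its cumulative probability up to a threshold age `tau_yw`; the threshold value, the parameter values and the function names are assumptions for illustration (the young-water literature uses a threshold of roughly two to three months), and the abstract's tracer-cycle based estimate of Fyw is a separate calculation that is not shown.

```python
import math


def gamma_ttd(tau, alpha, beta):
    """Gamma transit time density with shape alpha and scale beta."""
    return (tau ** (alpha - 1.0) * math.exp(-tau / beta)
            / (beta ** alpha * math.gamma(alpha)))


def young_water_fraction(alpha, beta, tau_yw, steps=20000):
    """Fraction of the TTD younger than tau_yw, by midpoint-rule integration.

    Note: for alpha < 1 the density is singular at zero and this simple rule
    converges slowly; a proper incomplete-gamma routine would be preferable.
    """
    width = tau_yw / steps
    area = sum(gamma_ttd((i + 0.5) * width, alpha, beta) for i in range(steps))
    return area * width


# Assumed parameters: shape 1.0, scale 1.5 years, threshold 0.2 years.
alpha, beta, tau_yw = 1.0, 1.5, 0.2
print("mean transit time (years):", alpha * beta)
print("young water fraction:", young_water_fraction(alpha, beta, tau_yw))
```

    Two parameter sets with the same mean transit time (alpha times beta) can give quite different young water fractions, which is why an independent Fyw estimate can narrow the range of acceptable TTD parameters.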

  1. Vapor–Liquid Equilibrium and Polarization Behavior of the GCP Water Model: Gaussian Charge-on-Spring versus Dipole Self-Consistent Field Approaches to Induced Polarization

    DOE PAGES

    Chialvo, Ariel A.; Moucka, Filip; Vlcek, Lukas; ...

    2015-03-24

    Here we implemented the Gaussian charge-on-spring (GCOS) version of the original self-consistent field implementation of the Gaussian Charge Polarizable water model and tested its accuracy in representing the polarization behavior of the original model involving smeared charges and induced dipole moments. Moreover, for that purpose we adapted the recently developed multiple-particle-move (MPM) within the Gibbs and isochoric-isothermal ensembles Monte Carlo methods for the efficient simulation of polarizable fluids. We also assessed the accuracy of the GCOS representation by a direct comparison of the resulting vapor-liquid phase envelope, microstructure, and relevant microscopic descriptors of water polarization along the orthobaric curve against the corresponding quantities from the actual GCP water model.

  2. Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    von Lilienfeld, O. Anatole; Ramakrishnan, Raghunathan; Rupp, Matthias

    We introduce a fingerprint representation of molecules based on a Fourier series of atomic radial distribution functions. This fingerprint is unique (except for chirality), continuous, and differentiable with respect to atomic coordinates and nuclear charges. It is invariant with respect to translation, rotation, and nuclear permutation, and requires no preconceived knowledge about chemical bonding, topology, or electronic orbitals. As such, it meets many important criteria for a good molecular representation, suggesting its usefulness for machine learning models of molecular properties trained across chemical compound space. To assess the performance of this new descriptor, we have trained machine learning models of molecular enthalpies of atomization for training sets with up to 10 k organic molecules, drawn at random from a published set of 134 k organic molecules with an average atomization enthalpy of over 1770 kcal/mol. We validate the descriptor on all remaining molecules of the 134 k set. For a training set of 10 k molecules, the fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol. This is slightly worse than the performance attained using the Coulomb matrix, another popular alternative, reaching 6.2 kcal/mol for the same training and test sets. (c) 2015 Wiley Periodicals, Inc.

  3. Predicting Retention Times of Naturally Occurring Phenolic Compounds in Reversed-Phase Liquid Chromatography: A Quantitative Structure-Retention Relationship (QSRR) Approach

    PubMed Central

    Akbar, Jamshed; Iqbal, Shahid; Batool, Fozia; Karim, Abdul; Chan, Kim Wei

    2012-01-01

    Quantitative structure-retention relationships (QSRRs) have successfully been developed for naturally occurring phenolic compounds in a reversed-phase liquid chromatographic (RPLC) system. A total of 1519 descriptors were calculated from the optimized structures of the molecules using the MOPAC2009 and DRAGON software packages. The data set of 39 molecules was divided into training and external validation sets. For feature selection and mapping we used step-wise multiple linear regression (SMLR), unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) and artificial neural networks (ANN). Stable and robust models with significant predictive abilities in terms of validation statistics were obtained, with chance correlation ruled out. ANN models were found to be better than the remaining two approaches. HNar, IDM, Mp, GATS2v, DISP and 3D-MoRSE (signals 22, 28 and 32) descriptors based on van der Waals volume, electronegativity, mass and polarizability, at atomic level, were found to have significant effects on the retention times. The possible implications of these descriptors in RPLC have been discussed. All the models proved able to predict the retention times of phenolic compounds and showed remarkable validation, robustness, stability and predictive performance. PMID:23203132

  4. An efficient descriptor model for designing materials for solar cells

    NASA Astrophysics Data System (ADS)

    Alharbi, Fahhad H.; Rashkeev, Sergey N.; El-Mellouhi, Fedwa; Lüthi, Hans P.; Tabet, Nouar; Kais, Sabre

    2015-11-01

    An efficient descriptor model for fast screening of potential materials for solar cell applications is presented. It works for both excitonic and non-excitonic solar cell materials, and in addition to the energy gap it includes the absorption spectrum (α(E)) of the material. The charge transport properties of the explored materials are modelled using the characteristic diffusion length (Ld) determined for the respective family of compounds. The presented model surpasses the widely used Scharber model developed for bulk heterojunction solar cells. Using published experimental data, we show that the presented model is more accurate in predicting the achievable efficiencies. To model both excitonic and non-excitonic systems, two different sets of parameters are used to account for the different modes of operation. The analysis of the presented descriptor model clearly shows the benefit of including α(E) and Ld in view of improved screening results.

  5. Applications of genetic algorithms on the structure-activity relationship analysis of some cinnamamides.

    PubMed

    Hou, T J; Wang, J M; Liao, N; Xu, X J

    1999-01-01

    Quantitative structure-activity relationships (QSARs) for 35 cinnamamides were studied. By using a genetic algorithm (GA), a group of multiple regression models with high fitness scores was generated. From the statistical analyses of the descriptors used in the evolution procedure, the principal features affecting the anticonvulsant activity were found. The significant descriptors include the partition coefficient, the molar refraction, the Hammett sigma constant of the substituents on the benzene ring, and the formation energy of the molecules. The steric complementarity and the hydrophobic interaction between the inhibitors and the receptor were found to be very important to the biological activity, while the contribution of the electronic effect was less pronounced. Moreover, by construction of the spline models for these four principal descriptors, the effective range for each descriptor was identified.

  6. A blur-invariant local feature for motion blurred image matching

    NASA Astrophysics Data System (ADS)

    Tong, Qiang; Aoki, Terumasa

    2017-07-01

    Image matching between a blurred (caused by camera motion, out of focus, etc.) image and a non-blurred image is a critical task for many image/video applications. However, most of the existing local feature schemes fail at this task. This paper presents a blur-invariant descriptor and a novel local feature scheme including the descriptor and the interest point detector based on moment symmetry - the authors' previous work. The descriptor is based on a new concept - center peak moment-like element (CPME), which is robust to blur and boundary effect. Then by constructing CPMEs, the descriptor is also distinctive and suitable for image matching. Experimental results show our scheme outperforms state-of-the-art methods for blurred image matching.

  7. Selection of morphoagronomic descriptors for the characterization of accessions of cassava of the Eastern Brazilian Amazon.

    PubMed

    Silva, R S; Moura, E F; Farias-Neto, J T; Ledo, C A S; Sampaio, J E

    2017-04-13

    The aim of this study was to select morphoagronomic descriptors to characterize cassava accessions representative of Eastern Brazilian Amazonia. A total of 262 accessions were characterized using 21 qualitative descriptors. The multiple-correspondence analysis (MCA) technique was applied using the criteria: contribution of the descriptor in the last factorial axis of analysis in successive cycles (SMCA); reverse order of the descriptor's contribution in the last factorial axis of analysis with all descriptors ('O'´p') of Jolliffe's method; mean of the contribution orders of the descriptor in the first three factorial axes in the analysis with all descriptors ('Os') together with ('O'´p'); and order of contribution of weighted mean in the first three factorial axes in the analysis of all descriptors ('Oz'). The dissimilarity coefficient was measured by the method of multicategorical variables. The correlation among the matrix generated with all descriptors and matrices based on each criterion varied (r = 0.21, r = 0.97, r = 0.98, r = 0.13 for SMCA, 'Os', 'Oz' and 'O'´p', respectively). The least informative descriptors were discarded independently and according to both 'Os' and 'Oz' criteria. Thirteen descriptors were able to discriminate the accessions and to represent the morphological variability of accessions sampled in Brazilian Eastern Amazonia: color of apical leaves, petiole color, color of stem exterior, external color of storage root, color of stem cortex, color of root pulp, texture of root epidermis, color of leaf vein, color of stem epidermis, color of end branches of adult plant, branching habit, root shape, and constriction of root.

  8. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes.

    PubMed

    Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J

    2015-06-07

    In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝ(n) space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝ(n) space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. In addition, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins on the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC(2)) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and the high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. Copyright © 2015 Elsevier Ltd. All rights reserved.
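
    The Minkowski metrics mentioned in this abstract generalize the familiar Euclidean distance through an order parameter p. The following is a minimal sketch of that general formula only, not the authors' implementation; the function name and the residue coordinates are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical coordinates (in angstroms) of two residues; not taken from the paper.
res_i = (0.0, 0.0, 0.0)
res_j = (3.0, 4.0, 0.0)

print(minkowski_distance(res_i, res_j, p=1))  # p = 1 is the Manhattan distance
print(minkowski_distance(res_i, res_j, p=2))  # p = 2 is the Euclidean distance
```

    Varying p changes how strongly the largest coordinate difference dominates the distance, which is presumably why a family of metrics, rather than a single one, is useful for a generalization scheme.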

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. 
Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.
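
    The Tanimoto similarity index used above to gauge structural diversity can be illustrated on binary fingerprints. This is a minimal sketch that assumes each chemical is represented by the set of its "on" fingerprint bits; the abstract does not say which fingerprint was used, so the bit sets below are hypothetical.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


# Hypothetical on-bit positions for two chemicals (illustration only).
chem_a = {1, 4, 7, 9}
chem_b = {1, 4, 8}

print(tanimoto(chem_a, chem_b))  # 2 shared bits out of 5 distinct bits
```

    Values near 1 indicate near-identical fingerprints, so a data set whose pairwise values are mostly low would be considered structurally diverse.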

  10. Self-organizing maps of molecular descriptors for sesquiterpene lactones and their application to the chemotaxonomy of the Asteraceae family.

    PubMed

    Scotti, Marcus T; Emerenciano, Vicente; Ferreira, Marcelo J P; Scotti, Luciana; Stefani, Ricardo; da Silva, Marcelo S; Mendonça Junior, Francisco Jaime B

    2012-04-20

    The Asteraceae, one of the largest families among angiosperms, is chemically characterised by the production of sesquiterpene lactones (SLs). A total of 1,111 SLs, which were extracted from 658 species, 161 genera, 63 subtribes and 15 tribes of Asteraceae, were represented and registered in two dimensions in the SISTEMATX, an in-house software system, and were associated with their botanical sources. The respective 11 blocks of descriptors (Constitutional, Functional groups, BCUT, Atom-centred, 2D autocorrelations, Topological, Geometrical, RDF, 3D-MoRSE, GETAWAY and WHIM) were used as input data to separate the botanical occurrences through self-organising maps. Maps that were generated with each descriptor divided the Asteraceae tribes, with total index values between 66.7% and 83.6%. The analysis of the results shows evident similarities among the Heliantheae, Helenieae and Eupatorieae tribes as well as between the Anthemideae and Inuleae tribes. Those observations are in agreement with systematic classifications that were proposed by Bremer, which use mainly morphological and molecular data; chemical markers therefore partially corroborate these classifications. The results demonstrate that the atom-centred and RDF descriptors can be used as a tool for taxonomic classification in low hierarchical levels, such as tribes. Descriptors obtained through fragments or by the two-dimensional representation of the SL structures were sufficient to obtain significant results, and better results were not achieved by using descriptors derived from three-dimensional representations of SLs. Such models based on physico-chemical properties can project new design SLs, similar structures from literature or even unreported structures in two-dimensional chemical space. Therefore, the generated SOMs can predict the most probable tribe where a biologically active molecule can be found according to Bremer's classification.

  11. Discovery of Novel HIV-1 Integrase Inhibitors Using QSAR-Based Virtual Screening of the NCI Open Database.

    PubMed

    Ko, Gene M; Garg, Rajni; Bailey, Barbara A; Kumar, Sunil

    2016-01-01

    Quantitative structure-activity relationship (QSAR) models can be used as a predictive tool for virtual screening of chemical libraries to identify novel drug candidates. The aims of this paper were to report the results of a study performed for descriptor selection, QSAR model development, and virtual screening for identifying novel HIV-1 integrase inhibitor drug candidates. First, three evolutionary algorithms were compared for descriptor selection: differential evolution-binary particle swarm optimization (DE-BPSO), binary particle swarm optimization, and genetic algorithms. Next, three QSAR models were developed from an ensemble of multiple linear regression, partial least squares, and extremely randomized trees models. A comparison of the performances of three evolutionary algorithms showed that DE-BPSO has a significant improvement over the other two algorithms. QSAR models developed in this study were used in consensus as a predictive tool for virtual screening of the NCI Open Database containing 265,242 compounds to identify potential novel HIV-1 integrase inhibitors. Six compounds were predicted to be highly active (pIC50 > 6) by each of the three models. The use of a hybrid evolutionary algorithm (DE-BPSO) for descriptor selection and QSAR model development in drug design is a novel approach. Consensus modeling may provide better predictivity by taking into account a broader range of chemical properties within the data set conducive for inhibition that may be missed by an individual model. The six compounds identified provide novel drug candidate leads in the design of next generation HIV-1 integrase inhibitors targeting drug resistant mutant viruses.

  12. Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC.

    PubMed

    Contreras-Torres, Ernesto

    2018-06-02

    In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named Total Sum of Squares (TSS). This quantity represents the sum of the squared differences of amino acid properties from the arithmetic mean property. As an extension, the amino acid-types and amino acid-groups formalisms are used for describing zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a Nearest Neighbor model for predicting the four major protein structural classes was built. This model has a success rate of 98.53% on the jackknife cross-validation test, a performance superior to other reported methods despite the simplicity of the predictor. Additionally, this predictor has an average success rate of 98.35% in different cross-validation tests performed. A value of 0.98 for the Kappa statistic clearly discriminates this model from a random predictor. The results obtained by the Nearest Neighbor model demonstrated the ability of the proposed descriptors not only to reflect relevant biochemical information related to the structural classes of proteins but also to allow appropriate interpretability. It can thus be expected that the current method may play a supplementary role to other existing approaches for protein structural class prediction and other protein attributes. Copyright © 2018 Elsevier Ltd. All rights reserved.
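
    The Total Sum of Squares described above can be sketched for a single amino acid property. This is only an illustration of the statistical quantity as the abstract states it (sum of squared deviations from the arithmetic mean); the function name, the toy sequence and the choice of Kyte-Doolittle hydropathy values for four residues are assumptions, and the local (amino acid-type and group) variants of the paper are not reproduced.

```python
# Kyte-Doolittle hydropathy values for four residues; using this particular
# property scale is an assumption made for illustration.
HYDROPATHY = {"A": 1.8, "G": -0.4, "L": 3.8, "K": -3.9}


def total_sum_of_squares(sequence: str, scale: dict) -> float:
    """Sum of squared deviations of per-residue property values from their mean."""
    values = [scale[residue] for residue in sequence]
    if not values:
        raise ValueError("sequence must contain at least one residue")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical four-residue fragment, not a real protein.
print(total_sum_of_squares("AGLK", HYDROPATHY))
```

    A sequence made of residues with identical property values gives a TSS of zero, so the quantity grows with the spread of the property along the chain.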

  13. Local chemical potential, local hardness, and dual descriptors in temperature dependent chemical reactivity theory.

    PubMed

    Franco-Pérez, Marco; Ayers, Paul W; Gázquez, José L; Vela, Alberto

    2017-05-31

    In this work we establish a new temperature dependent procedure within the grand canonical ensemble, to avoid the Dirac delta function exhibited by some of the second order chemical reactivity descriptors based on density functional theory, at a temperature of 0 K. Through the definition of a local chemical potential designed to integrate to the global temperature dependent electronic chemical potential, the local chemical hardness is expressed in terms of the derivative of this local chemical potential with respect to the average number of electrons. For the three-ground-states ensemble model, this local hardness contains a term that is equal to the one intuitively proposed by Meneses, Tiznado, Contreras and Fuentealba, which integrates to the global hardness given by the difference in the first ionization potential, I, and the electron affinity, A, at any temperature. However, in the present approach one finds an additional temperature-dependent term that introduces changes at the local level and integrates to zero. Additionally, a τ-hard dual descriptor and a τ-soft dual descriptor given in terms of the product of the global hardness and the global softness multiplied by the dual descriptor, respectively, are derived. Since all these reactivity indices are given by expressions composed of terms that correspond to products of the global properties multiplied by the electrophilic or nucleophilic Fukui functions, they may be useful for studying and comparing equivalent sites in different chemical environments.

  14. Chemometric modeling of 5-Phenylthiophenecarboxylic acid derivatives as anti-rheumatic agents.

    PubMed

    Adhikari, Nilanjan; Jana, Dhritiman; Halder, Amit K; Mondal, Chanchal; Maiti, Milan K; Jha, Tarun

    2012-09-01

    Arthritis involves joint inflammation, synovial proliferation and damage of cartilage. Interleukin-1 is involved in the acute and chronic inflammatory mechanisms of arthritis. Non-steroidal anti-inflammatory drugs can produce symptomatic relief but cannot act through mechanisms of arthritis. Disease-modifying antirheumatic drugs reduce the symptoms of arthritis, such as pain and disability score, number of swollen joints, articular index and serum concentration of acute phase proteins. Recently, some literature references have appeared on molecular modeling of antirheumatic agents. We have tried chemometric modeling through 2D-QSAR studies on a dataset of fifty-one compounds out of which forty-four 5-Phenylthiophenecarboxylic acid derivatives have IL-1 inhibitory activity and forty-six 5-Phenylthiophenecarboxylic acid derivatives have %AIA suppressive activity. The work was done to find out the structural requirements of these anti-rheumatic agents. 2D QSAR models were generated by 2D and 3D descriptors by using multiple linear regression and partial least square method where IL-1 antagonism was considered as the biological activity parameter. Statistically significant models were developed on the training set developed by k-means cluster analysis. Sterimol parameters, electronic interaction at atom number 9, 2D autocorrelation descriptors, information content descriptor, average connectivity index chi-3, radial distribution function, Balaban 3D index and 3D-MoRSE descriptors were found to play crucial roles to modulate IL-1 inhibitory activity. 
2D autocorrelation descriptors like Broto-Moreau autocorrelation of topological structure-lag 3 weighted by atomic van der Waals volumes, Geary autocorrelation-lag 7 associated with weighted atomic Sanderson electronegativities and 3D-MoRSE descriptors like 3D-MoRSE-signal 22 related to atomic van der Waals volumes, 3D-MoRSE-signal 28 related to atomic van der Waals volumes and 3D-MoRSE-signal 9 which was unweighted, were found to play important roles to model %AIA suppressive activity.

  15. A proposal for a computer-based framework of support for public health in the management of biological incidents: the Czech Republic experience.

    PubMed

    Bures, Vladimír; Otcenásková, Tereza; Cech, Pavel; Antos, Karel

    2012-11-01

    Biological incidents jeopardising public health require decision-making with one dominant feature: complexity. Therefore, public health decision-makers need appropriate support. Based on the analogy with business intelligence (BI) principles, the contextual analysis of the environment and available data resources, and conceptual modelling within systems and knowledge engineering, this paper proposes a general framework for computer-based decision support in the case of a biological incident. At the outset, the analysis of potential inputs to the framework is conducted and several resources such as demographic information, strategic documents, environmental characteristics, agent descriptors and surveillance systems are considered. Consequently, three prototypes were developed, tested and evaluated by a group of experts. Their selection was based on the overall framework scheme. Subsequently, an ontology prototype linked with an inference engine, multi-agent-based model focusing on the simulation of an environment, and expert-system prototypes were created. All prototypes proved to be utilisable support tools for decision-making in the field of public health. Nevertheless, the research revealed further issues and challenges that might be investigated by both public health focused researchers and practitioners.

  16. Notes on quantitative structure-properties relationships (QSPR) (1): A discussion on a QSPR dimensionality paradox (QSPR DP) and its quantum resolution.

    PubMed

    Carbó-Dorca, Ramon; Gallegos, Ana; Sánchez, Angel J

    2009-05-01

    Classical quantitative structure-properties relationship (QSPR) statistical techniques unavoidably present an inherent paradoxical computational context. They rely on the definition of a Gram matrix in descriptor spaces, which is used afterwards to reduce the original dimension via several possible kinds of algebraic manipulations. From there, effective models for the computation of unknown properties of known molecular structures are obtained. However, the reduced descriptor dimension causes linear dependence within the set of discrete vector molecular representations, leading to positive semi-definite Gram matrices in molecular spaces. To resolve this QSPR dimensionality paradox (QSPR DP), it is proposed here to adopt as a starting point the quantum QSPR (QQSPR) computational framework perspective, where density functions act as infinite dimensional descriptors. The fundamental QQSPR equation, deduced from employing quantum expectation value numerical evaluation, can be approximately solved in order to obtain models exempt from the QSPR DP. The substitution of the quantum similarity matrix by an empirical Gram matrix in molecular spaces, built up with the original non-manipulated discrete molecular descriptor vectors, makes it possible to obtain classical QSPR models with the same characteristics as in QQSPR, that is: possessing a certain degree of causality and explicitly independent of the descriptor dimension. 2008 Wiley Periodicals, Inc.

  17. The proposal of architecture for chemical splitting to optimize QSAR models for aquatic toxicity.

    PubMed

    Colombo, Andrea; Benfenati, Emilio; Karelson, Mati; Maran, Uko

    2008-06-01

    One of the challenges in the field of quantitative structure-activity relationship (QSAR) analysis is the correct classification of a chemical compound to an appropriate model for the prediction of activity. Thus, in previous studies, compounds have been divided into distinct groups according to their mode of action or chemical class. In the current study, theoretical molecular descriptors were used to divide 568 organic substances into subsets with toxicity measured for the 96-h lethal median concentration for the Fathead minnow (Pimephales promelas). Simple constitutional descriptors, such as the number of aliphatic and aromatic rings, and a quantum chemical descriptor, the maximum bond order of a carbon atom, divide compounds into nine subsets. For each subset of compounds the automatic forward selection of descriptors was applied to construct QSAR models. Significant correlations were achieved for each subset of chemicals and all models were validated with the leave-one-out internal validation procedure (R(2)(cv) approximately 0.80). The results encourage considering this alternative way for the prediction of toxicity using QSAR subset models without direct reference to the mechanism of toxic action or the traditional chemical classification.
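
    The leave-one-out internal validation mentioned above can be sketched for the simplest possible case. The example below assumes a one-descriptor linear model and the common definition of the cross-validated R² as 1 - PRESS/SS_tot, computed against the mean of all observed values; the paper's models use several descriptors, and the data here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_r2(xs, ys):
    """Leave-one-out cross-validated R^2, defined here as 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and measured toxicities (illustration only).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
toxicity = [2.1, 3.9, 6.2, 7.8, 10.1]
print(loo_r2(descriptor, toxicity))
```

    Because each compound is predicted by a model that never saw it, this statistic is lower than the ordinary fitted R² whenever the model is not perfect, which is what makes it useful as an internal check against overfitting.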

  18. PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions.

    PubMed

    Dong, Jie; Yao, Zhi-Jiang; Zhang, Lin; Luo, Feijun; Lin, Qinlu; Lu, Ai-Ping; Chen, Alex F; Cao, Dong-Sheng

    2018-03-20

    With the increasing development of biotechnology and informatics technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these data needs to be extracted and transformed to useful knowledge by various data mining methods. Considering the rate at which data are accumulated in chemistry and biology fields, new tools that process and interpret large and complex interaction data are increasingly important. So far, there are no suitable toolkits that can effectively link the chemical and biological space in view of molecular representation. To further explore these complex data, an integrated toolkit for various molecular representations is urgently needed which could be easily integrated with data mining algorithms to start a full data analysis pipeline. Herein, the Python library PyBioMed is presented, which comprises functionalities for online download of various molecular objects by providing different IDs, the pretreatment of molecular structures, and the computation of various molecular descriptors for chemicals, proteins, DNAs and their interactions. PyBioMed is a feature-rich and highly customized Python library used for the characterization of various complex chemical and biological molecules and interaction samples. The current version of PyBioMed could calculate 775 chemical descriptors and 19 kinds of chemical fingerprints, 9920 protein descriptors based on protein sequences, more than 6000 DNA descriptors from nucleotide sequences, and interaction descriptors from pairwise samples using three different combining strategies. Several examples and five real-life applications were provided to clearly guide the users how to use PyBioMed as an integral part of data analysis projects. By using PyBioMed, users are able to run a full pipeline from getting molecular data, pretreating molecules, molecular representation to constructing machine learning models conveniently. 
PyBioMed provides various user-friendly and highly customized APIs to calculate various features of biological molecules and complex interaction samples conveniently, which aims at building integrated analysis pipelines from data acquisition, data checking, and descriptor calculation to modeling. PyBioMed is freely available at http://projects.scbdd.com/pybiomed.html .

  19. Descriptors for ions and ion-pairs for use in linear free energy relationships.

    PubMed

    Abraham, Michael H; Acree, William E

    2016-01-22

    The determination of Abraham descriptors for single ions is reviewed, and equations are given for the partition of single ions from water to a number of solvents. These ions include permanent anions and cations and ionic species such as carboxylic acid anions, phenoxide anions and protonated base cations. Descriptors for a large number of ions and ionic species are listed, and equations for the prediction of Abraham descriptors for ionic species are given. The application of descriptors for ions and ionic species to physicochemical processes is given; these are to water-solvent partitions, HPLC retention data, immobilised artificial membranes, the Finkelstein reaction and diffusion in water. Applications to biological processes include brain permeation, microsomal degradation of drugs, skin permeation and human intestinal absorption. The review concludes with a section on the determination of descriptors for ion-pairs. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Design of an optimal preview controller for linear discrete-time descriptor systems with state delay

    NASA Astrophysics Data System (ADS)

    Cao, Mengjuan; Liao, Fucheng

    2015-04-01

    In this paper, the linear discrete-time descriptor system with state delay is studied, and a design method for an optimal preview controller is proposed. First, by using the discrete lifting technique, the original system is transformed into a general descriptor system without state delay in form. Then, taking advantage of the first-order forward difference operator, we construct a descriptor augmented error system, including the state vectors of the lifted system, error vectors, and desired target signals. Rigorous mathematical proofs are given for the regularity, stabilisability, causal controllability, and causal observability of the descriptor augmented error system. Based on these, the optimal preview controller with preview feedforward compensation for the original system is obtained by using the standard optimal regulator theory of the descriptor system. The effectiveness of the proposed method is shown by numerical simulation.

  1. Skin injury model classification based on shape vector analysis

    PubMed Central

    2012-01-01

    Background: Skin injuries can be crucial in judicial decision making. Forensic experts base their classification on subjective opinions. This study investigates whether known classes of simulated skin injuries are correctly classified statistically based on 3D surface models and derived numerical shape descriptors. Methods: Skin injury surface characteristics are simulated with plasticine. Six injury classes – abrasions, incised wounds, gunshot entry wounds, smooth and textured strangulation marks as well as patterned injuries – with 18 instances each are used for a k-fold cross validation with six partitions. Deformed plasticine models are captured with a 3D surface scanner. Mean curvature is estimated for each polygon surface vertex. Subsequently, distance distributions and derived aspect ratios, convex hulls, concentric spheres, hyperbolic points and Fourier transforms are used to generate 1284-dimensional shape vectors. Subsequent descriptor reduction maximizing SNR (signal-to-noise ratio) results in an average of 41 descriptors (varying across k-folds). With non-normal multivariate distribution of heteroskedastic data, requirements for LDA (linear discriminant analysis) are not met. Thus, shrinkage parameters of RDA (regularized discriminant analysis) are optimized, yielding a best performance with λ = 0.99 and γ = 0.001. Results: Receiver Operating Characteristic of a descriptive RDA yields an ideal Area Under the Curve of 1.0 for all six categories. Predictive RDA results in an average CRR (correct recognition rate) of 97.22% under a 6 partition k-fold. Adding uniform noise within the range of one standard deviation degrades the average CRR to 71.3%. Conclusions: Digitized 3D surface shape data can be used to automatically classify idealized shape models of simulated skin injuries. 
Deriving some well established descriptors such as histograms, saddle shape of hyperbolic points or convex hulls with subsequent reduction of dimensionality while maximizing SNR seem to work well for the data at hand, as predictive RDA results in CRR of 97,22%. Objective basis for discrimination of non-overlapping hypotheses or categories are a major issue in medicolegal skin injury analysis and that is where this method appears to be strong. Technical surface quality is important in that adding noise clearly degrades CRR. Trial registration: This study does not cover the results of a controlled health care intervention as only plasticine was used. Thus, there was no trial registration. PMID:23497357
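
The cross-validation layout in the abstract above (six classes, 18 instances each, six partitions) can be sketched as follows. The abstract does not say whether the folds were stratified by class; stratification is an assumption here, and `stratified_folds` is a hypothetical helper name, not part of the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, balancing classes per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Six injury classes with 18 instances each, as stated in the abstract.
labels = [c for c in range(6) for _ in range(18)]
folds = stratified_folds(labels, k=6)
```

With these numbers each fold holds 18 samples, three from every class, so every held-out partition contains all six categories.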

  2. The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases.

    PubMed

    Udatha, D B R K Gupta; Kouskoumvekaki, Irene; Olsson, Lisbeth; Panagiotou, Gianni

    2011-01-01

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plantae origin were collected and they were clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. 
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs. Copyright © 2010 Elsevier Inc. All rights reserved.

  3. Improving quantitative structure-activity relationship models using Artificial Neural Networks trained with dropout.

    PubMed

    Mendenhall, Jeffrey; Meiler, Jens

    2016-02-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery pose unique challenges for ML techniques, such as heavily biased dataset composition, and relatively large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both enrichment false positive rate and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22-46 % over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods.
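
For readers unfamiliar with the technique named in this abstract, here is a minimal sketch of dropout applied to one layer's activations. It uses the common "inverted dropout" formulation (rescaling the surviving units during training); the abstract does not describe the authors' implementation, so the function name, signature, and rescaling choice are assumptions.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer is the identity.
        return list(activations)
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected value of each activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
hidden = [1.0] * 10000
dropped = dropout(hidden, rate=0.25, rng=rng)
mean_after = sum(dropped) / len(dropped)  # stays close to 1.0 in expectation
```

The dropout rate is the hyperparameter the abstract reports as depending on the signal-to-noise ratio of the descriptor set.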

  4. Improving Quantitative Structure-Activity Relationship Models using Artificial Neural Networks Trained with Dropout

    PubMed Central

    Mendenhall, Jeffrey; Meiler, Jens

    2016-01-01

    Dropout is an Artificial Neural Network (ANN) training technique that has been shown to improve ANN performance across canonical machine learning (ML) datasets. Quantitative Structure Activity Relationship (QSAR) datasets used to relate chemical structure to biological activity in Ligand-Based Computer-Aided Drug Discovery (LB-CADD) pose unique challenges for ML techniques, such as heavily biased dataset composition, and relatively large number of descriptors relative to the number of actives. To test the hypothesis that dropout also improves QSAR ANNs, we conduct a benchmark on nine large QSAR datasets. Use of dropout improved both Enrichment false positive rate (FPR) and log-scaled area under the receiver-operating characteristic curve (logAUC) by 22–46% over conventional ANN implementations. Optimal dropout rates are found to be a function of the signal-to-noise ratio of the descriptor set, and relatively independent of the dataset. Dropout ANNs with 2D and 3D autocorrelation descriptors outperform conventional ANNs as well as optimized fingerprint similarity search methods. PMID:26830599

  5. Local gradient Gabor pattern (LGGP) with applications in face recognition, cross-spectral matching, and soft biometrics

    NASA Astrophysics Data System (ADS)

    Chen, Cunjian; Ross, Arun

    2013-05-01

    Researchers in face recognition have been using Gabor filters for image representation due to their robustness to complex variations in expression and illumination. Numerous methods have been proposed to model the output of filter responses by employing either local or global descriptors. In this work, we propose a novel but simple approach for encoding Gradient information on Gabor-transformed images to represent the face, which can be used for identity, gender and ethnicity assessment. Extensive experiments on the standard face benchmark FERET (Visible versus Visible), as well as the heterogeneous face dataset HFB (Near-infrared versus Visible), suggest that the matching performance due to the proposed descriptor is comparable against state-of-the-art descriptor-based approaches in face recognition applications. Furthermore, the same feature set is used in the framework of a Collaborative Representation Classification (CRC) scheme for deducing soft biometric traits such as gender and ethnicity from face images in the AR, Morph and CAS-PEAL databases.

  6. RANZCR Body Systems Framework of diagnostic imaging examination descriptors.

    PubMed

    Pitman, Alexander G; Penlington, Lisa; Doromal, Darren; Slater, Gregory; Vukolova, Natalia

    2014-08-01

    A unified and logical system of descriptors for diagnostic imaging examinations and procedures is a desirable resource for radiology in Australia and New Zealand and is needed to support core activities of RANZCR. Existing descriptor systems available in Australia and New Zealand (including the Medicare DIST and the ACC Schedule) have significant limitations and are inappropriate for broader clinical application. An anatomically based grid was constructed, with anatomical structures arranged in rows and diagnostic imaging modalities arranged in columns (including nuclear medicine and positron emission tomography). The grid was segregated into five body systems. The cells at the intersection of an anatomical structure row and an imaging modality column were populated with short, formulaic descriptors of the applicable diagnostic imaging examinations. Clinically illogical or physically impossible combinations were 'greyed out'. Where the same examination applied to different anatomical structures, the descriptor was kept identical for the purposes of streamlining. The resulting Body Systems Framework of diagnostic imaging examination descriptors lists all the reasonably common diagnostic imaging examinations currently performed in Australia and New Zealand using a unified grid structure allowing navigation by both referrers and radiologists. The Framework has been placed on the RANZCR website and is available for access free of charge by registered users. The Body Systems Framework of diagnostic imaging examination descriptors is a system of descriptors based on relationships between anatomical structures and imaging modalities. The Framework is now available as a resource and reference point for the radiology profession and to support core College activities. © 2014 The Royal Australian and New Zealand College of Radiologists.

  7. Impact of low alcohol verbal descriptors on perceived strength: An experimental study.

    PubMed

    Vasiljevic, Milica; Couturier, Dominique-Laurent; Marteau, Theresa M

    2018-02-01

    Low alcohol labels are a set of labels that carry descriptors such as 'low' or 'lighter' to denote alcohol content in beverages. There is growing interest from policymakers and producers in lower strength alcohol products. However, there is a lack of evidence on how the general population perceives verbal descriptors of strength. The present research examines consumers' perceptions of strength (% ABV) and appeal of alcohol products using low or high alcohol verbal descriptors. A within-subjects experimental study in which participants rated the strength and appeal of 18 terms denoting low (nine terms), high (eight terms) and regular (one term) strengths for either (1) wine or (2) beer according to drinking preference. Thousand six hundred adults (796 wine and 804 beer drinkers) sampled from a nationally representative UK panel. Low, Lower, Light, Lighter, and Reduced formed a cluster and were rated as denoting lower strength products than Regular, but higher strength than the cluster with intensifiers consisting of Extra Low, Super Low, Extra Light, and Super Light. Similar clustering in perceived strength was observed amongst the high verbal descriptors. Regular was the most appealing strength descriptor, with the low and high verbal descriptors using intensifiers rated least appealing. The perceived strength and appeal of alcohol products diminished the more the verbal descriptors implied a deviation from Regular. The implications of these findings are discussed in terms of policy implications for lower strength alcohol labelling and associated public health outcomes. Statement of contribution What is already known about this subject? Current UK and EU legislation limits the number of low strength verbal descriptors and the associated alcohol by volume (ABV) to 1.2% ABV and lower. There is growing interest from policymakers and producers to extend the range of lower strength alcohol products above the current cap of 1.2% ABV set out in national legislation. 
There is a lack of evidence on how the general population perceives verbal descriptors of alcohol product strength (both low and high). What does this study add? Verbal descriptors of lower strength wine and beer form two clusters and effectively communicate reduced alcohol content. Low, Lower, Light, Lighter, and Reduced were considered lower in strength than Regular (average % ABV). Descriptors using intensifiers (Extra Low, Super Low, Extra Light, and Super Light) were considered lowest in strength. Similar clustering in perceived strength was observed amongst the high verbal descriptors. The appeal of alcohol products reduced the more the verbal descriptors implied a deviation from Regular. © 2017 The Authors. British Journal of Health Psychology published by John Wiley & Sons Ltd on behalf of British Psychological Society.

  8. Character context: a shape descriptor for Arabic handwriting recognition

    NASA Astrophysics Data System (ADS)

    Mudhsh, Mohammed; Almodfer, Rolla; Duan, Pengfei; Xiong, Shengwu

    2017-11-01

    In the handwriting recognition field, designing good descriptors is essential for obtaining rich information from the data. However, finding a good descriptor for handwriting recognition is still an open issue due to the unlimited variation in human handwriting. We introduce a "character context descriptor" that efficiently deals with the structural characteristics of Arabic handwritten characters. First, the character image is smoothed and normalized, then the character context descriptor of 32 feature bins is built based on the proposed "distance function." Finally, a multilayer perceptron with regularization is used as a classifier. On experimentation with a handwritten Arabic characters database, the proposed method achieved a state-of-the-art performance with recognition rate equal to 98.93% and 99.06% for the 66 and 24 classes, respectively.

  9. A novel binary shape context for 3D local surface description

    NASA Astrophysics Data System (ADS)

    Dong, Zhen; Yang, Bisheng; Liu, Yuan; Liang, Fuxun; Li, Bijun; Zang, Yufu

    2017-08-01

    3D local surface description is now at the core of many computer vision technologies, such as 3D object recognition, intelligent driving, and 3D model reconstruction. However, most of the existing 3D feature descriptors still suffer from low descriptiveness, weak robustness, and inefficiency in both time and memory. To overcome these challenges, this paper presents a robust and descriptive 3D Binary Shape Context (BSC) descriptor with high efficiency in both time and memory. First, a novel BSC descriptor is generated for 3D local surface description, and the performance of the BSC descriptor under different settings of its parameters is analyzed. Next, the descriptiveness, robustness, and efficiency in both time and memory of the BSC descriptor are evaluated and compared to those of several state-of-the-art 3D feature descriptors. Finally, the performance of the BSC descriptor for 3D object recognition is also evaluated on a number of popular benchmark datasets, and an urban-scene dataset is collected by a terrestrial laser scanner system. Comprehensive experiments demonstrate that the proposed BSC descriptor obtained high descriptiveness, strong robustness, and high efficiency in both time and memory and achieved high recognition rates of 94.8%, 94.1% and 82.1% on the considered UWA, Queen, and WHU datasets, respectively.

  10. Development of TLSER model and QSAR model for predicting partition coefficients of hydrophobic organic chemicals between low density polyethylene film and water.

    PubMed

    Liu, Huihui; Wei, Mengbi; Yang, Xianhai; Yin, Cen; He, Xiao

    2017-01-01

    Partition coefficients are vital parameters for accurately measuring chemical concentrations with passive sampling devices. Given the wide use of low density polyethylene (LDPE) film in passive sampling, we developed a theoretical linear solvation energy relationship (TLSER) model and a quantitative structure-activity relationship (QSAR) model for the prediction of the partition coefficient of chemicals between LDPE and water (K_pew). For chemicals with the octanol-water partition coefficient (log K_ow) < 8, a TLSER model with V_x (McGowan volume) and q_A^- (the most negative charge on O, N, S, X atoms) as descriptors was developed, but the model had relatively low determination coefficient (R²) and cross-validated coefficient (Q²). In order to further explore the theoretical mechanisms involved in the partition process, a QSAR model with four descriptors (MLOGP (Moriguchi octanol-water partition coeff.), P_VSA_s_3 (P_VSA-like on I-state, bin 3), Hy (hydrophilic factor) and NssO (number of atoms of type ssO)) was established, and statistical analysis indicated that the model had satisfactory goodness-of-fit, robustness and predictive ability. For chemicals with log K_ow > 8, a TLSER model with V_x and a QSAR model with MLOGP as descriptor were developed. This is the first paper to explore the models for highly hydrophobic chemicals. The applicability domain of the models, characterized by the Euclidean distance-based method and Williams plot, covered a large number of structurally diverse chemicals, which included nearly all the common hydrophobic organic compounds. Additionally, through mechanism interpretation, we explored the structural features governing the partition behavior of chemicals between LDPE and water. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. The application of feature selection to the development of Gaussian process models for percutaneous absorption.

    PubMed

    Lam, Lun Tak; Sun, Yi; Davey, Neil; Adams, Rod; Prapopoulou, Maria; Brown, Marc B; Moss, Gary P

    2010-06-01

    The aim was to employ Gaussian processes to assess mathematically the nature of a skin permeability dataset and to employ these methods, particularly feature selection, to determine the key physicochemical descriptors which exert the most significant influence on percutaneous absorption, and to compare such models with established existing models. Gaussian processes, including automatic relevance detection (GPRARD) methods, were employed to develop models of percutaneous absorption that identified key physicochemical descriptors of percutaneous absorption. Using MatLab software, the statistical performance of these models was compared with single linear networks (SLN) and quantitative structure-permeability relationships (QSPRs). Feature selection methods were used to examine in more detail the physicochemical parameters used in this study. A range of statistical measures to determine model quality were used. The inherently nonlinear nature of the skin data set was confirmed. The Gaussian process regression (GPR) methods yielded predictive models that offered statistically significant improvements over SLN and QSPR models with regard to predictivity (where the rank order was: GPR > SLN > QSPR). Feature selection analysis determined that the best GPR models were those that contained log P, melting point and the number of hydrogen bond donor groups as significant descriptors. Further statistical analysis also found that great synergy existed between certain parameters. It suggested that a number of the descriptors employed were effectively interchangeable, thus questioning the use of models where discrete variables are output, usually in the form of an equation. The use of a nonlinear GPR method produced models with significantly improved predictivity, compared with SLN or QSPR models. Feature selection methods were able to provide important mechanistic information. 
However, it was also shown that significant synergy existed between certain parameters, and as such it was possible to interchange certain descriptors (i.e. molecular weight and melting point) without incurring a loss of model quality. Such synergy suggested that a model constructed from discrete terms in an equation may not be the most appropriate way of representing mechanistic understandings of skin absorption.

  12. Electron-density descriptors as predictors in quantitative structure--activity/property relationships and drug design.

    PubMed

    Matta, Chérif F; Arabi, Alya A

    2011-06-01

    The use of electron density-based molecular descriptors in drug research, particularly in quantitative structure--activity relationships/quantitative structure--property relationships studies, is reviewed. The exposition starts by a discussion of molecular similarity and transferability in terms of the underlying electron density, which leads to a qualitative introduction to the quantum theory of atoms in molecules (QTAIM). The starting point of QTAIM is the topological analysis of the molecular electron-density distributions to extract atomic and bond properties that characterize every atom and bond in the molecule. These atomic and bond properties have considerable potential as bases for the construction of robust quantitative structure--activity/property relationships models as shown by selected examples in this review. QTAIM is applicable to the electron density calculated from quantum-chemical calculations and/or that obtained from ultra-high resolution x-ray diffraction experiments followed by nonspherical refinement. Atomic and bond properties are introduced followed by examples of application of each of these two families of descriptors. The review ends with a study whereby the molecular electrostatic potential, uniquely determined by the density, is used in conjunction with atomic properties to elucidate the reasons for the biological similarity of bioisosteres.

  13. Nanodosimetry-Based Plan Optimization for Particle Therapy

    PubMed Central

    Schulte, Reinhard W.

    2015-01-01

    Treatment planning for particle therapy is currently an active field of research due to uncertainty in how to modify physical dose in order to create a uniform biological dose response in the target. A novel treatment plan optimization strategy based on measurable nanodosimetric quantities rather than biophysical models is proposed in this work. Simplified proton and carbon treatment plans were simulated in a water phantom to investigate the optimization feasibility. Track structures of the mixed radiation field produced at different depths in the target volume were simulated with Geant4-DNA and nanodosimetric descriptors were calculated. The fluences of the treatment field pencil beams were optimized in order to create a mixed field with equal nanodosimetric descriptors at each of the multiple positions in spread-out particle Bragg peaks. For both proton and carbon ion plans, a uniform spatial distribution of nanodosimetric descriptors could be obtained by optimizing opposing-field but not single-field plans. The results obtained indicate that uniform nanodosimetrically weighted plans, which may also be radiobiologically uniform, can be obtained with this approach. Future investigations need to demonstrate that this approach is also feasible for more complicated beam arrangements and that it leads to biologically uniform response in tumor cells and tissues. PMID:26167202

  14. Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure

    EPA Science Inventory

    Background: The U.S. EPA ToxCastTM program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors ...

  15. Revealing chemophoric sites in organophosphorus insecticides through the MIA-QSPR modeling of soil sorption data.

    PubMed

    Daré, Joyce K; Silva, Cristina F; Freitas, Matheus P

    2017-10-01

    Soil sorption of insecticides employed in agriculture is an important parameter to probe the environmental fate of organic chemicals. Therefore, methods for the prediction of soil sorption of new agrochemical candidates, as well as for the rationalization of the molecular characteristics responsible for a given sorption profile, are extremely beneficial for the environment. A quantitative structure-property relationship method based on chemical structure images as molecular descriptors provided a reliable model for the soil sorption prediction of 24 widely used organophosphorus insecticides. By means of contour maps obtained from the partial least squares regression coefficients and the variable importance in projection scores, key molecular moieties were targeted for possible structural modification, in order to obtain novel and more environmentally friendly insecticide candidates. The image-based descriptors applied encode molecular arrangement, atoms connectivity, groups size, and polarity; consequently, the findings in this work cannot be achieved by a simple relationship with hydrophobicity, usually described by the octanol-water partition coefficient. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Role of physicochemical properties in the activation of peroxisome proliferator-activated receptor δ.

    PubMed

    Maltarollo, Vinícius G; Homem-de-Mello, Paula; Honorio, Káthia M

    2011-10-01

    Current research on treatments for metabolic diseases involves a class of biological receptors called peroxisome proliferator-activated receptors (PPARs), which control the metabolism of carbohydrates and lipids. A subclass of these receptors, PPARδ, regulates several metabolic processes, and the substances that activate them are being studied as new drug candidates for the treatment of diabetes mellitus and metabolic syndrome. In this study, several PPARδ agonists with experimental biological activity were selected for a structural and chemical study. Electronic, stereochemical, lipophilic and topological descriptors were calculated for the selected compounds using various theoretical methods, such as density functional theory (DFT). Fisher's weight and principal components analysis (PCA) methods were employed to select the most relevant variables for this study. The partial least squares (PLS) method was used to construct the multivariate statistical model, and the best model obtained had 4 PCs, q² = 0.80 and r² = 0.90, indicating a good internal consistency. The prediction residues calculated for the compounds in the test set had low values, indicating the good predictive capability of our PLS model. The model obtained in this study is reliable and can be used to predict the biological activity of new untested compounds. Docking studies have also confirmed the importance of the molecular descriptors selected for this system.

  17. PyGlobal: A toolkit for automated compilation of DFT-based descriptors.

    PubMed

    Nath, Shilpa R; Kurup, Sudheer S; Joshi, Kaustubh A

    2016-06-15

    Density Functional Theory (DFT)-based Global reactivity descriptor calculations have emerged as powerful tools for studying the reactivity, selectivity, and stability of chemical and biological systems. A Python-based module, PyGlobal has been developed for systematically parsing a typical Gaussian outfile and extracting the relevant energies of the HOMO and LUMO. Corresponding global reactivity descriptors are further calculated and the data is saved into a spreadsheet compatible with applications like Microsoft Excel and LibreOffice. The efficiency of the module has been accounted by measuring the time interval for randomly selected Gaussian outfiles for 1000 molecules. © 2016 Wiley Periodicals, Inc.
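
The step from HOMO/LUMO energies to global reactivity descriptors can be illustrated with the textbook conceptual-DFT formulas under the Koopmans approximation. This is a sketch, not PyGlobal's code: the function name is hypothetical, and the factor 1/2 in the hardness is one of two conventions in use; the abstract does not say which one PyGlobal follows.

```python
def global_reactivity(e_homo, e_lumo):
    """Global reactivity descriptors from frontier orbital energies (same unit, e.g. eV)."""
    if e_lumo <= e_homo:
        raise ValueError("expected e_lumo > e_homo")
    mu = (e_homo + e_lumo) / 2.0   # chemical potential
    eta = (e_lumo - e_homo) / 2.0  # chemical hardness (convention with factor 1/2)
    return {
        "ionization_potential": -e_homo,   # Koopmans approximation
        "electron_affinity": -e_lumo,      # Koopmans approximation
        "chemical_potential": mu,
        "electronegativity": -mu,
        "hardness": eta,
        "softness": 1.0 / (2.0 * eta),
        "electrophilicity": mu * mu / (2.0 * eta),
    }


# Illustrative energies only; these are not taken from any real calculation.
descriptors = global_reactivity(e_homo=-6.0, e_lumo=-2.0)
```

Each returned dictionary maps naturally onto one spreadsheet row per molecule, which is the output format the abstract describes.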

  18. Fourier descriptor analysis and unification of voice range profile contours: method and applications.

    PubMed

    Pabon, Peter; Ternström, Sten; Lamarche, Anick

    2011-06-01

    To describe a method for unified description, statistical modeling, and comparison of voice range profile (VRP) contours, even from diverse sources. A morphologic modeling technique, which is based on Fourier descriptors (FDs), is applied to the VRP contour. The technique, which essentially involves resampling of the curve of the contour, is assessed and also is compared to density-based VRP averaging methods that use the overlap count. VRP contours can be usefully described and compared using FDs. The method also permits the visualization of the local covariation along the contour average. For example, the FD-based analysis shows that the population variance for ensembles of VRP contours is usually smallest at the upper left part of the VRP. To illustrate the method's advantages and possible further application, graphs are given that compare the averaged contours from different authors and recording devices--for normal, trained, and untrained male and female voices as well as for child voices. The proposed technique allows any VRP shape to be brought to the same uniform base. On this uniform base, VRP contours or contour elements coming from a variety of sources may be placed within the same graph for comparison and for statistical analysis.
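
The core idea above, resampling a closed contour and describing it by Fourier descriptors, can be sketched generically as follows. The abstract does not give the authors' resampling scheme or normalization, so the choices here are assumptions: uniform arc-length resampling, a counter-clockwise contour, dropping the zeroth coefficient, and dividing magnitudes by the first harmonic. Both function names are hypothetical.

```python
import cmath
import math


def resample_closed(points, n):
    """Resample a closed polygonal contour to n points equally spaced by arc length."""
    closed = list(points) + [points[0]]
    seg = [math.dist(closed[i], closed[i + 1]) for i in range(len(points))]
    total = sum(seg)
    out, i, acc = [], 0, 0.0
    for k in range(n):
        target = total * k / n
        # Advance to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def fourier_descriptors(points, n_samples=64, n_coeffs=8):
    """Magnitudes of harmonics 1..n_coeffs, normalized by the first harmonic."""
    z = [complex(x, y) for x, y in resample_closed(points, n_samples)]
    coeffs = []
    for m in range(1, n_coeffs + 1):  # skipping m = 0 removes the contour's position
        c = sum(zk * cmath.exp(-2j * math.pi * m * k / n_samples)
                for k, zk in enumerate(z)) / n_samples
        coeffs.append(c)
    scale = abs(coeffs[0])
    if scale < 1e-12:
        raise ValueError("first harmonic vanishes; is the contour counter-clockwise?")
    return [abs(c) / scale for c in coeffs]  # dividing by |c1| removes overall size


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [(3.0 * x + 5.0, 3.0 * y - 2.0) for x, y in square]
fd_square = fourier_descriptors(square)
fd_moved = fourier_descriptors(moved)  # same shape, so the same descriptors
```

Because every contour ends up as a fixed-length vector on a common base, contours from different sources can be averaged or compared coefficient by coefficient.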

  19. Predictive Modeling of Human Perception Subjectivity: Feasibility Study of Mammographic Lesion Similarity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Songhua; Tourassi, Georgia

    2012-01-01

    The majority of clinical content-based image retrieval (CBIR) studies disregard human perception subjectivity, aiming to duplicate the consensus expert assessment of the visual similarity on example cases. The purpose of our study is twofold: (i) discern better the extent of human perception subjectivity when assessing the visual similarity of two images with similar semantic content, and (ii) explore the feasibility of personalized predictive modeling of visual similarity. We conducted a human observer study in which five observers of various expertise were shown ninety-nine triplets of mammographic masses with similar BI-RADS descriptors and were asked to select the two masses with the highest visual relevance. Pairwise agreement ranged between poor and fair among the five observers, as assessed by the kappa statistic. The observers' self-consistency rate was remarkably low, based on repeated questions where either the orientation or the presentation order of a mass was changed. Various machine learning algorithms were explored to determine whether they can predict each observer's personalized selection using textural features. Many algorithms performed with accuracy that exceeded each observer's self-consistency rate, as determined using a cross-validation scheme. This accuracy was statistically significantly higher than would be expected by chance alone (two-tailed p-value ranged between 0.001 and 0.01 for all five personalized models). The study confirmed that human perception subjectivity should be taken into account when developing CBIR-based medical applications.

  20. Automated detection of microaneurysms using robust blob descriptors

    NASA Astrophysics Data System (ADS)

    Adal, K.; Ali, S.; Sidibé, D.; Karnowski, T.; Chaum, E.; Mériaudeau, F.

    2013-03-01

    Microaneurysms (MAs) are among the first signs of diabetic retinopathy (DR) that can be seen as round dark-red structures in digital color fundus photographs of retina. In recent years, automated computer-aided detection and diagnosis (CAD) of MAs has attracted many researchers due to its low-cost and versatile nature. In this paper, the MA detection problem is modeled as finding interest points from a given image and several interest point descriptors are introduced and integrated with machine learning techniques to detect MAs. The proposed approach starts by applying a novel fundus image contrast enhancement technique using Singular Value Decomposition (SVD) of fundus images. Then, a Hessian-based candidate selection algorithm is applied to extract image regions which are more likely to be MAs. For each candidate region, robust low-level blob descriptors such as Speeded Up Robust Features (SURF) and Intensity Normalized Radon Transform are extracted to characterize candidate MA regions. The combined features are then classified using SVM which has been trained using ten manually annotated training images. The performance of the overall system is evaluated on the Retinopathy Online Challenge (ROC) competition database. Preliminary results show the competitiveness of the proposed candidate selection techniques against state-of-the-art methods as well as the promising future for the proposed descriptors to be used in the localization of MAs from fundus images.

  1. Predicting Cell Association of Surface-Modified Nanoparticles Using Protein Corona Structure - Activity Relationships (PCSAR).

    PubMed

    Kamath, Padmaja; Fernandez, Alberto; Giralt, Francesc; Rallo, Robert

    2015-01-01

    Nanoparticles are likely to interact in real-case application scenarios with mixtures of proteins and biomolecules that will adsorb onto their surface forming the so-called protein corona. Information related to the composition of the protein corona and net cell association was collected from literature for a library of surface-modified gold and silver nanoparticles. For each protein in the corona, sequence information was extracted and used to calculate physicochemical properties and statistical descriptors. Data cleaning and preprocessing techniques including statistical analysis and feature selection methods were applied to remove highly correlated, redundant and non-significant features. A weighting technique was applied to construct specific signatures that represent the corona composition for each nanoparticle. Using this basic set of protein descriptors, a new Protein Corona Structure-Activity Relationship (PCSAR) that relates net cell association with the physicochemical descriptors of the proteins that form the corona was developed and validated. The features that resulted from the feature selection were in line with already published literature, and the computational model constructed on these features had a good accuracy (R(2)LOO=0.76 and R(2)LMO(25%)=0.72) and stability, with the advantage that the fingerprints based on physicochemical descriptors were independent of the specific proteins that form the corona.

  2. Discriminatively learning for representing local image features with quadruplet model

    NASA Astrophysics Data System (ADS)

    Zhang, Da-long; Zhao, Lei; Xu, Duan-qing; Lu, Dong-ming

    2017-11-01

    Traditional hand-crafted features for representing local image patches are evolving into current data-driven and learning-based image features, but learning a robust and discriminative descriptor which is capable of controlling various patch-level computer vision tasks is still an open problem. In this work, we propose a novel deep convolutional neural network (CNN) to learn local feature descriptors. We utilize quadruplets with positive and negative training samples, together with a constraint to restrict the intra-class variance, to learn good discriminative CNN representations. Compared with previous works, our model reduces the overlap in feature space between corresponding and non-corresponding patch pairs, and mitigates the margin-varying problem caused by the commonly used triplet loss. We demonstrate that our method achieves a better embedding result than some recent works, like PN-Net and TN-TG, on a benchmark dataset.

  3. Prediction of atmospheric degradation data for POPs by gene expression programming.

    PubMed

    Luan, F; Si, H Z; Liu, H T; Wen, Y Y; Zhang, X Y

    2008-01-01

    Quantitative structure-activity relationship models for the prediction of the mean and the maximum atmospheric degradation half-life values of persistent organic pollutants were developed based on the linear heuristic method (HM) and non-linear gene expression programming (GEP). Molecular descriptors, calculated from the structures alone, were used to represent the characteristics of the compounds. HM was used both to pre-select the whole descriptor sets and to build the linear model. GEP yielded satisfactory prediction results: the square of the correlation coefficient r(2) was 0.80 and 0.81 for the mean and maximum half-life values of the test set, and the root mean square errors were 0.448 and 0.426, respectively. The results of this work indicate that the GEP is a very promising tool for non-linear approximations.
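    The two test-set statistics quoted in this abstract, the squared correlation coefficient and the root mean square error, can be illustrated with a minimal sketch. The function name and the toy numbers are hypothetical, and the abstract does not say how the authors computed their values; this only assumes the usual Pearson-based r² and an RMSE averaged over n points.

```python
import math


def r_squared_and_rmse(observed, predicted):
    """Return (squared Pearson correlation, root mean square error).

    Hypothetical helper for illustration; not taken from the cited study.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of squares and cross-products around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for constant data")
    r2 = (cov * cov) / (var_o * var_p)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, rmse


# Toy data: predictions are perfectly correlated but offset by one unit.
r2, rmse = r_squared_and_rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

    The toy case shows why both numbers are usually reported together: a constant offset leaves r² at its maximum while the RMSE still registers the error.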

  4. Bulk-surface relationship of an electronic structure for high-throughput screening of metal oxide catalysts

    NASA Astrophysics Data System (ADS)

    Kweun, Joshua Minwoo; Li, Chenzhe; Zheng, Yongping; Cho, Maenghyo; Kim, Yoon Young; Cho, Kyeongjae

    2016-05-01

    Designing metal-oxides consisting of earth-abundant elements has been a crucial issue to replace precious metal catalysts. To achieve efficient screening of metal-oxide catalysts via bulk descriptors rather than surface descriptors, we investigated the relationship between the electronic structure of bulk and that of the surface for lanthanum-based perovskite oxides, LaMO3 (M = Ti, V, Cr, Mn, Fe, Co, Ni, Cu). Through density functional theory calculations, we examined the d-band occupancy of the bulk and surface transition-metal atoms (nBulk and nSurf) and the adsorption energy of an oxygen atom (Eads) on (001), (110), and (111) surfaces. For the (001) surface, we observed strong correlation between the nBulk and nSurf with an R-squared value over 94%, and the result was interpreted in terms of ligand field splitting and antibonding/bonding level splitting. Moreover, the Eads on the surfaces was highly correlated with the nBulk with an R-squared value of more than 94%, and different surface relaxations could be explained by the bulk electronic structure (e.g., LaMnO3 vs. LaTiO3). These results suggest that a bulk-derived descriptor such as nBulk can be used to screen metal-oxide catalysts.

  5. Issues and solutions for storage, retrieval, and searching of MPEG-7 documents

    NASA Astrophysics Data System (ADS)

    Chang, Yuan-Chi; Lo, Ming-Ling; Smith, John R.

    2000-10-01

    The ongoing MPEG-7 standardization activity aims at creating a standard for describing multimedia content in order to facilitate the interpretation of the associated information content. Attempting to address a broad range of applications, MPEG-7 has defined a flexible framework consisting of Descriptors, Description Schemes, and Description Definition Language. Descriptors and Description Schemes describe features, structure and semantics of multimedia objects. They are written in the Description Definition Language (DDL). In the most recent revision, DDL applies XML (Extensible Markup Language) Schema with MPEG-7 extensions. DDL has constructs that support inclusion, inheritance, reference, enumeration, choice, sequence, and abstract type of Description Schemes and Descriptors. In order to enable multimedia systems to use MPEG-7, a number of important problems in storing, retrieving and searching MPEG-7 documents need to be solved. This paper reports initial findings on issues and solutions in storing and accessing MPEG-7 documents. In particular, we discuss the benefits of using a virtual document management framework based on XML Access Server (XAS) in order to bridge the MPEG-7 multimedia applications and database systems. The need arises partly because MPEG-7 descriptions need customized storage schema, indexing and search engines. We also discuss issues arising in managing dependence and cross-description scheme search.

  6. Structural similarity based kriging for quantitative structure activity and property relationship modeling.

    PubMed

    Teixeira, Ana L; Falcao, Andre O

    2014-07-28

    Structurally similar molecules tend to have similar properties, i.e. closer molecules in the molecular space are more likely to yield similar property values while distant molecules are more likely to yield different values. Based on this principle, we propose the use of a new method that takes into account the high dimensionality of the molecular space, predicting chemical, physical, or biological properties based on the most similar compounds with measured properties. This methodology uses ordinary kriging coupled with three different molecular similarity approaches (based on molecular descriptors, fingerprints, and atom matching) which creates an interpolation map over the molecular space that is capable of predicting properties/activities for diverse chemical data sets. The proposed method was tested in two data sets of diverse chemical compounds collected from the literature and preprocessed. One of the data sets contained dihydrofolate reductase inhibition activity data, and the second molecules for which aqueous solubility was known. The overall predictive results using kriging for both data sets comply with the results obtained in the literature using typical QSPR/QSAR approaches. However, the procedure did not involve any type of descriptor selection or even minimal information about each problem, suggesting that this approach is directly applicable to a large spectrum of problems in QSAR/QSPR. Furthermore, the predictive results improve significantly with the similarity threshold between the training and testing compounds, allowing the definition of a confidence threshold of similarity and error estimation for each case inferred. The use of kriging for interpolation over the molecular metric space is independent of the training data set size, and no reparametrizations are necessary when more compounds are added or removed from the set, and increasing the size of the database will consequentially improve the quality of the estimations. 
    Finally, it is shown that this model can be used for checking the consistency of measured data and for guiding an extension of the training set by determining the regions of the molecular space for which new experimental measurements could be used to maximize the model's predictive performance.

  7. Stargate GTM: Bridging Descriptor and Activity Spaces.

    PubMed

    Gaspar, Héléna A; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre

    2015-11-23

    Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
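    The combination step mentioned in this abstract, a weighted geometric mean of two probability distributions over the same latent nodes, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the single weight `w`, and the final renormalisation are hypothetical choices, not details confirmed by the abstract, and the sketch is not the S-GTM training algorithm itself.

```python
def weighted_geometric_mean(p, q, w):
    """Combine two discrete distributions defined over the same nodes.

    Each entry is proportional to p[k]**w * q[k]**(1 - w). The result is
    renormalised to sum to 1 (an assumption made for this sketch).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    unnormalised = [(pk ** w) * (qk ** (1.0 - w)) for pk, qk in zip(p, q)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("the two distributions share no support")
    return [u / total for u in unnormalised]


# Equal weighting of a descriptor-space and an activity-space distribution
# (toy values over two latent nodes).
combined = weighted_geometric_mean([0.2, 0.8], [0.8, 0.2], 0.5)
```

    With `w = 1.0` the result reduces to the first distribution and with `w = 0.0` to the second, so the weight acts as a dial between the two spaces.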

  8. Using Geometry-Based Metrics as Part of Fitness-for-Purpose Evaluations of 3D City Models

    NASA Astrophysics Data System (ADS)

    Wong, K.; Ellul, C.

    2016-10-01

    Three-dimensional geospatial information is being increasingly used in a range of tasks beyond visualisation. 3D datasets, however, are often being produced without exact specifications and at mixed levels of geometric complexity. This leads to variations within the models' geometric and semantic complexity as well as the degree of deviation from the corresponding real world objects. Existing descriptors and measures of 3D data such as CityGML's level of detail are perhaps only partially sufficient in communicating data quality and fitness-for-purpose. This study investigates whether alternative, automated, geometry-based metrics describing the variation of complexity within 3D datasets could provide additional relevant information as part of a process of fitness-for-purpose evaluation. The metrics include: mean vertex/edge/face counts per building; vertex/face ratio; minimum 2D footprint area; and minimum feature length. Each metric was tested on six 3D city models from international locations. The results show that geometry-based metrics can provide additional information on 3D city models as part of fitness-for-purpose evaluations. The metrics, while they cannot be used in isolation, may provide a complement to enhance existing data descriptors if backed up with local knowledge, where possible.
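    Two of the listed metrics, the mean vertex/edge/face counts per building and the vertex/face ratio, are simple enough to sketch. The input layout (one dict of counts per building) and the choice to compute the ratio over the whole dataset rather than per building are assumptions made for illustration; the abstract does not specify either.

```python
from statistics import mean


def complexity_metrics(buildings):
    """Summarise geometric complexity for a list of buildings.

    Each building is a dict with integer 'vertices', 'edges' and 'faces'
    counts (a hypothetical layout chosen for this sketch).
    """
    if not buildings:
        raise ValueError("need at least one building")
    total_vertices = sum(b["vertices"] for b in buildings)
    total_faces = sum(b["faces"] for b in buildings)
    if total_faces == 0:
        raise ValueError("dataset has no faces")
    return {
        "mean_vertices": mean(b["vertices"] for b in buildings),
        "mean_edges": mean(b["edges"] for b in buildings),
        "mean_faces": mean(b["faces"] for b in buildings),
        # Dataset-level ratio; a per-building ratio is an equally valid reading.
        "vertex_face_ratio": total_vertices / total_faces,
    }


# Toy dataset: a cube and a slightly more detailed block.
metrics = complexity_metrics([
    {"vertices": 8, "edges": 12, "faces": 6},
    {"vertices": 16, "edges": 24, "faces": 10},
])
```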

  9. Predicting DPP-IV inhibitors with machine learning approaches

    NASA Astrophysics Data System (ADS)

    Cai, Jie; Li, Chanjuan; Liu, Zhihong; Du, Jiewen; Ye, Jiming; Gu, Qiong; Xu, Jun

    2017-04-01

    Dipeptidyl peptidase IV (DPP-IV) is a promising Type 2 diabetes mellitus (T2DM) drug target. DPP-IV inhibitors prolong the action of glucagon-like peptide-1 (GLP-1) and gastric inhibitory peptide (GIP), and improve glucose homeostasis without weight gain, edema, and hypoglycemia. However, the marketed DPP-IV inhibitors have adverse effects such as nasopharyngitis, headache, nausea, hypersensitivity, skin reactions and pancreatitis. Therefore, novel DPP-IV inhibitors with minimal adverse effects are still needed. The scaffolds of existing DPP-IV inhibitors are structurally diversified. This makes it difficult to build virtual screening models based upon the known DPP-IV inhibitor libraries using conventional QSAR approaches. In this paper, we report a new strategy to predict DPP-IV inhibitors with machine learning approaches involving naïve Bayesian (NB) and recursive partitioning (RP) methods. We built 247 machine learning models based on 1307 known DPP-IV inhibitors with optimized molecular properties and topological fingerprints as descriptors. The overall predictive accuracies of the optimized models were greater than 80%. An external test set, composed of 65 recently reported compounds, was employed to validate the optimized models. The results demonstrated that both NB and RP models have a good predictive ability based on different combinations of descriptors. Twenty "good" and twenty "bad" structural fragments for DPP-IV inhibitors can also be derived from these models to inspire the design of new DPP-IV inhibitor scaffolds.

  10. Quantitative property-structural relation modeling on polymeric dielectric materials

    NASA Astrophysics Data System (ADS)

    Wu, Ke

    Nowadays, polymeric materials have attracted more and more attention in dielectric applications. But searching for a material with desired properties is still largely based on trial and error. To facilitate the development of new polymeric materials, heuristic models built using the Quantitative Structure Property Relationships (QSPR) techniques can provide reliable "working solutions". In this thesis, the application of QSPR on polymeric materials is studied from two angles: descriptors and algorithms. A novel set of descriptors, called infinite chain descriptors (ICD), are developed to encode the chemical features of pure polymers. ICD is designed to eliminate the uncertainty of polymer conformations and inconsistency of molecular representation of polymers. Models for the dielectric constant, band gap, dielectric loss tangent and glass transition temperatures of organic polymers are built with high prediction accuracy. Two new algorithms, the physics-enlightened learning method (PELM) and multi-mechanism detection, are designed to deal with two typical challenges in material QSPR. PELM is a meta-algorithm that utilizes the classic physical theory as guidance to construct the candidate learning function. It shows better out-of-domain prediction accuracy compared to the classic machine learning algorithm (support vector machine). Multi-mechanism detection is built based on a cluster-weighted mixing model similar to a Gaussian mixture model. The idea is to separate the data into subsets where each subset can be modeled by a much simpler model. The case study on glass transition temperature shows that this method can provide better overall prediction accuracy even though less data is available for each subset model. In addition, the techniques developed in this work are also applied to polymer nanocomposites (PNC). PNC are new materials with outstanding dielectric properties. 
    As a key factor in determining the dispersion state of nanoparticles in the polymer matrix, the surface tension components of polymers are modeled using ICD. Compared to the 3D surface descriptors used in a previous study, the model with ICD has a much improved prediction accuracy and stability, particularly for the polar component. In predicting the enhancement effect of grafting functional groups on the breakdown strength of PNC, a simple local charge transfer model is proposed where the electron affinity (EA) and ionization energy (IE) determine the main charge trap depth in the system. This physical model is supported by first principle computation. QSPR models for EA and IE are also built, decreasing the computation time of EA and IE for a single molecule from several hours to less than one second. Furthermore, the designs of two web-based tools are introduced. The tools represent two commonly used applications for QSPR studies: data inquiry and prediction. Making models and data publicly available and easy to use is particularly crucial for QSPR research. The web tools described in this work should provide good guidance and a starting point for the further development of information tools enabling more efficient cooperation between computational and experimental communities.

  11. Identification of molecular descriptors for design of novel Isoalloxazine derivatives as potential Acetylcholinesterase inhibitors against Alzheimer's disease.

    PubMed

    Gurung, Arun Bahadur; Aguan, Kripamoy; Mitra, Sivaprasad; Bhattacharjee, Atanu

    2017-06-01

    In Alzheimer's disease (AD), the level of Acetylcholine (ACh) neurotransmitter is reduced. Since Acetylcholinesterase (AChE) cleaves ACh, inhibitors of AChE are very much sought after for AD treatment. The side effects of current inhibitors necessitate development of newer AChE inhibitors. Isoalloxazine derivatives have proved to be promising AChE inhibitors. However, their structure-activity relationship studies have not been reported to date. In the present work, various quantitative structure-activity relationship (QSAR) building methods such as multiple linear regression (MLR), partial least squares, and principal component regression were employed to derive 3D-QSAR models using steric and electrostatic field descriptors. A statistically significant model was obtained using MLR coupled with a stepwise selection method, having r² = .9405, cross-validated r² (q²) = .6683, and high predictability (pred_r² = .6206 and standard error pred_r²se = .2491). The steric and electrostatic contribution plot revealed three electrostatic fields (E_496, E_386 and E_577) and one steric field (S_60) contributing towards biological activity. A ligand-based 3D-pharmacophore model was generated consisting of eight pharmacophore features. Isoalloxazine derivatives were docked against human AChE, which revealed critical residues implicated in hydrogen bonds as well as hydrophobic interactions. The binding modes of docked complexes (AChE_IA1 and AChE_IA14) were validated by molecular dynamics simulation, which showed their stable trajectories in terms of root mean square deviation, and molecular mechanics/Poisson-Boltzmann surface area binding free energy analysis revealed key residues contributing significantly to overall binding energy. The present study may be useful in the design of more potent Isoalloxazine derivatives as AChE inhibitors.

  12. Learning physical descriptors for materials science by compressed sensing

    NASA Astrophysics Data System (ADS)

    Ghiringhelli, Luca M.; Vybiral, Jan; Ahmetcik, Emre; Ouyang, Runhai; Levchenko, Sergey V.; Draxl, Claudia; Scheffler, Matthias

    2017-02-01

    The availability of big data in materials science offers new routes for analyzing materials properties and functions and achieving scientific understanding. Finding structure in these data that is not directly visible by standard tools and exploitation of the scientific information requires new and dedicated methodology based on approaches from statistical learning, compressed sensing, and other recent methods from applied mathematics, computer science, statistics, signal processing, and information science. In this paper, we explain and demonstrate a compressed-sensing based methodology for feature selection, specifically for discovering physical descriptors, i.e., physical parameters that describe the material and its properties of interest, and associated equations that explicitly and quantitatively describe those relevant properties. As showcase application and proof of concept, we describe how to build a physical model for the quantitative prediction of the crystal structure of binary compound semiconductors.

  13. Experiments and improvements of ear recognition based on local texture descriptors

    NASA Astrophysics Data System (ADS)

    Benzaoui, Amir; Adjabi, Insaf; Boukrouche, Abdelhani

    2017-04-01

    The morphology of the human ear presents rich and stable information embedded on the curved 3-D surface and has as a result attracted considerable attention from forensic scientists and engineers as a biometric recognition modality. However, recognizing a person's identity from the morphology of the human ear in unconstrained environments, with insufficient and incomplete training data, strong person-specificity, and high within-range variance, can be very challenging. Following our previous work on ear recognition based on local texture descriptors, we propose to use anatomical and embryological information about the human ear in order to find the autonomous components and the locations where large interindividual variations can be detected. Embryology is particularly relevant to our approach as it provides information on the possible changes that can be observed in the external structure of the ear. We experimented with three publicly available databases, namely: IIT Delhi-1, IIT Delhi-2, and USTB-1, consisting of several ear benchmarks acquired under varying conditions and imaging qualities. The experiments show excellent results, beyond the state of the art.

  14. Prediction of anticancer property of bowsellic acid derivatives by quantitative structure activity relationship analysis and molecular docking study.

    PubMed

    Satpathy, Raghunath; Guru, R K; Behera, R; Nayak, B

    2015-01-01

    Boswellic acid consists of a series of pentacyclic triterpene molecules that are produced by the plant Boswellia serrata. This work focuses on the potential applications of boswellic acid for the treatment of cancer. The aim was to predict the anticancer properties of boswellic acid derivatives by various computational approaches. A total of 65 derivatives of boswellic acid from the PubChem database were considered. After energy minimization of the ligands, various types of molecular descriptors were computed, and the corresponding two-dimensional quantitative structure-activity relationship (QSAR) models were obtained by taking the Andrews coefficient as the dependent variable. The methods compared in the QSAR study were multiple linear regression, partial least squares, support vector machines, and artificial neural networks. In the study, geometrical descriptors showed the highest correlation coefficient, which indicates the binding factor of the compound. To evaluate the anticancer property, a molecular docking study of six selected ligands, chosen on the basis of Andrews affinity, was performed with nuclear factor-kappa protein kinase (Protein Data Bank ID 4G3D), which is an established therapeutic target for cancers. Based on the QSAR study and the docking results, it was predicted that boswellic acid can be treated as a potential anticancer compound.

  15. Compilation and physicochemical classification analysis of a diverse hERG inhibition database

    NASA Astrophysics Data System (ADS)

    Didziapetris, Remigijus; Lanevskij, Kiril

    2016-12-01

    A large and chemically diverse hERG inhibition data set comprised of 6690 compounds was constructed on the basis of ChEMBL bioactivity database and original publications dealing with experimental determination of hERG activities using patch-clamp and competitive displacement assays. The collected data were converted to binary format at 10 µM activity threshold and subjected to gradient boosting machine classification analysis using a minimal set of physicochemical and topological descriptors. The tested parameters involved lipophilicity (log P), ionization (pKa), polar surface area, aromaticity, molecular size and flexibility. The employed approach allowed classifying the compounds with an overall 75-80% accuracy, even though it only accounted for non-specific interactions between hERG and ligand molecules. The observed descriptor-response profiles were consistent with common knowledge about hERG ligand binding site, but also revealed several important quantitative trends, as well as slight inter-assay variability in hERG inhibition data. The results suggest that even weakly basic groups (pKa < 6) might substantially contribute to hERG inhibition potential, whereas the role of lipophilicity depends on the compound's ionization state, and the influence of log P decreases in the order of bases > zwitterions > neutrals > acids. Given its robust performance and clear physicochemical interpretation, the proposed model may provide valuable information to direct drug discovery efforts towards compounds with reduced risk of hERG-related cardiotoxicity.

  16. Selection of molecular descriptors with artificial intelligence for the understanding of HIV-1 protease peptidomimetic inhibitors-activity.

    PubMed

    Sirois, S; Tsoukas, C M; Chou, Kuo-Chen; Wei, Dongqing; Boucher, C; Hatzakis, G E

    2005-03-01

    Quantitative Structure Activity Relationship (QSAR) techniques are used routinely by computational chemists in drug discovery and development to analyze datasets of compounds. Quantitative numerical methods like Partial Least Squares (PLS) and Artificial Neural Networks (ANN) have been used on QSAR to establish correlations between molecular properties and bioactivity. However, ANN may be advantageous over PLS because it considers the interrelations of the modeled variables. This study focused on the HIV-1 Protease (HIV-1 Pr) inhibitors belonging to the peptidomimetic class of compounds. The main objective was to select molecular descriptors with the best predictive value for antiviral potency (Ki). PLS and ANN were used to predict Ki activity of HIV-1 Pr inhibitors and the results were compared. To address the issue of dimensionality reduction, Genetic Algorithms (GA) were used for variable selection and their performance was compared against that of ANN. Finally, the structure of the optimum ANN achieving the highest Pearson's-R coefficient was determined. On the basis of Pearson's-R, PLS and ANN were compared to determine which exhibits maximum performance. Training and validation of models was performed on 15 random split sets of the master dataset consisting of 231 compounds. For each compound, 192 molecular descriptors were considered. The molecular structure and constant of inhibition (Ki) were selected from the NIAID database. Study findings suggested that non-covalent interactions such as hydrophobicity, shape and hydrogen bonding describe well the antiviral activity of the HIV-1 Pr compounds. The significance of lipophilicity and relationship to HIV-1 associated hyperlipidemia and lipodystrophy syndrome warrant further investigation.

  17. Using Theoretical Descriptions in Structure Activity Relations. 3. Electronic Descriptors

    DTIC Science & Technology

    1988-08-01

    Activity Relationships (QSAR) have been used successfully in the past to develop predictive equations for several biological and physical properties...Linear Free Energy Relationships (LFER) and is based on work by Hammett in which he derived electronic descriptors for the dissociation of substituted...structure of a compound and its activity in a system. Several different structural descriptors have been used in QSAR equations. These range from

  18. Predictive Models of Acute Mountain Sickness after Rapid Ascent to Various Altitudes

    DTIC Science & Technology

    2013-01-01

    unclassified relational mountain medicine database containing individual ascent profiles, demographic and physiologic subject descriptors, and...course of AMS, and define the baseline demographics and physiologic descriptors that increase the risk of AMS. In addition, these models provide...substantiated this finding in unacclimatized women (24). Other physiologic differences between men and women (i.e., differences in endothelial

  19. Building Scientific Confidence in the Development and ...

    EPA Pesticide Factsheets

    Read-across remains a popular data gap filling technique within category and analogue approaches for regulatory purposes. Acceptance of read-across is an ongoing challenge with several efforts underway for identifying and addressing uncertainties. Here we demonstrate an algorithmic approach to facilitate read-across using ToxCast in vitro bioactivity data in conjunction with chemical descriptor information to predict in vivo outcomes in guideline testing studies from ToxRefDB. Over 3400 different chemical structure descriptors were generated for a set of 976 chemicals and supplemented with the outcomes from 821 in vitro assays. The read-across prediction for a given chemical was based on the similarity weighted endpoint outcomes of its nearest neighbors calculated using in vitro bioactivity and chemical structure descriptors, called GenRA. GenRA is based on a computational approach for: (i) defining local validity domains using chemical and bioactivity descriptors, (ii) systematically deriving endpoint read-across predictions within these domains using similarity weighted activity of nearest neighbours, (iii) objectively evaluating predicted performance using tested chemicals, and (iv) assigning read-across predictions to untested chemicals along with estimates of uncertainty. We found that in vitro bioactivity descriptors were often more predictive of in vivo toxicity outcomes than chemical structure descriptors. We believe GenRA is an important first st
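    The core idea described in this abstract, a similarity-weighted average of the outcomes of a chemical's nearest neighbours, can be sketched in a few lines. This is a generic illustration, not the GenRA implementation: the Jaccard similarity on sets of descriptor bits, the default `k`, and all names are assumptions introduced here.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of descriptor bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(target, neighbours, k=3):
    """Similarity-weighted outcome of the k most similar tested chemicals.

    neighbours is a list of (descriptor_bits, outcome) pairs with outcome
    0 or 1. Returns a score in [0, 1], or None when no neighbour has any
    similarity to the target (no basis for a prediction).
    """
    scored = sorted(
        ((jaccard(target, bits), outcome) for bits, outcome in neighbours),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in scored)
    if total_weight == 0.0:
        return None
    return sum(sim * outcome for sim, outcome in scored) / total_weight


# Toy example: an identical active neighbour and a partially similar inactive one.
score = read_across(
    {1, 2, 3},
    [({1, 2, 3}, 1), ({1, 2}, 0), ({7, 8}, 1)],
    k=2,
)
```

    In the toy call the two retained neighbours have similarities 1 and 2/3, so the weighted score is 1 / (1 + 2/3) = 0.6.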

  20. Issues in assessing multi-institutional performance of BI-RADS-based CAD systems

    NASA Astrophysics Data System (ADS)

    Markey, Mia K.; Lo, Joseph Y.

    2005-04-01

    The purpose of this study was to investigate factors that impact the generalization of breast cancer computer-aided diagnosis (CAD) systems that utilize the Breast Imaging Reporting and Data System (BI-RADS). Data sets from four institutions were analyzed: Duke University Medical Center, University of Pennsylvania Medical Center, Massachusetts General Hospital, and Wake Forest University. The latter two data sets are subsets of the Digital Database for Screening Mammography. Each data set consisted of descriptions of mammographic lesions according to the BI-RADS lexicon, patient age, and pathology status (benign/malignant). Models were developed to predict pathology status from the BI-RADS descriptors and the patient age. Comparisons between the models built on data from the different institutions were made in terms of empirical (non-parametric) receiver operating characteristic (ROC) curves. Results suggest that BI-RADS-based CAD systems focused on specific classes of lesions may be more generally applicable than models that cover several lesion types. However, better generalization was seen in terms of the area under the ROC curve than in the partial area index (>90% sensitivity). Previous studies have illustrated the challenges in translating a BI-RADS-based CAD system from one institution to another. This study provides new insights into possible approaches to improve the generalization of BI-RADS-based CAD systems.
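    The empirical (non-parametric) ROC area used for the comparisons in this abstract can be sketched through its standard pairwise interpretation: the fraction of (positive, negative) case pairs in which the positive case receives the higher score, with ties counted as one half. The function name and toy scores are hypothetical, and the partial area index mentioned in the abstract is not covered by this sketch.

```python
def empirical_auc(scores, labels):
    """Non-parametric area under the ROC curve.

    labels use 1 for positive (e.g. malignant) and 0 for negative
    (e.g. benign) cases; higher scores should indicate the positive class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```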

  1. Three-Dimensional Object Recognition and Registration for Robotic Grasping Systems Using a Modified Viewpoint Feature Histogram

    PubMed Central

    Chen, Chin-Sheng; Chen, Po-Chun; Hsu, Chih-Ming

    2016-01-01

    This paper presents a novel 3D feature descriptor for object recognition and six-degrees-of-freedom pose identification in mobile manipulation and grasping applications. Firstly, a Microsoft Kinect sensor is used to capture 3D point cloud data. A viewpoint feature histogram (VFH) descriptor for the 3D point cloud data then encodes the geometry and viewpoint, so an object can be simultaneously recognized and registered in a stable pose and the information is stored in a database. The VFH is robust to a large degree of surface noise and missing depth information so it is reliable for stereo data. However, the pose estimation for an object fails when the object is placed symmetrically to the viewpoint. To overcome this problem, this study proposes a modified viewpoint feature histogram (MVFH) descriptor that consists of two parts: a surface shape component that comprises an extended fast point feature histogram and an extended viewpoint direction component. The MVFH descriptor characterizes an object’s pose and enhances the system’s ability to identify objects with mirrored poses. Finally, once the object has been recognized, its pose roughly estimated by the MVFH descriptor, and it has been registered in a database, the pose is refined using the iterative closest point algorithm. The estimation results demonstrate that the MVFH feature descriptor allows more accurate pose estimation. The experiments also show that the proposed method can be applied in vision-guided robotic grasping systems. PMID:27886080

  2. Oral LD50 toxicity modeling and prediction of per- and polyfluorinated chemicals on rat and mouse.

    PubMed

    Bhhatarai, Barun; Gramatica, Paola

    2011-05-01

    Quantitative structure-activity relationship (QSAR) analyses were performed using the LD(50) oral toxicity data of per- and polyfluorinated chemicals (PFCs) on rodents: rat and mouse. PFCs are studied under the EU project CADASTER, which uses the available experimental data for prediction and prioritization of toxic chemicals for risk assessment by using in silico tools. The methodology presented here applies chemometrical analysis on the existing experimental data and predicts the toxicity of new compounds. QSAR analyses were performed on the available 58 mouse and 50 rat LD(50) oral data using multiple linear regression (MLR) based on theoretical molecular descriptors selected by genetic algorithm (GA). Training and prediction sets were prepared a priori from available experimental datasets in terms of structure and response. These sets were used to derive statistically robust and predictive (both internally and externally) models. The structural applicability domain (AD) of the models was verified on 376 per- and polyfluorinated chemicals including those in the REACH preregistration list. The rat and mouse endpoints were predicted by each model for the studied compounds, and finally 30 compounds, all perfluorinated, were prioritized as most important for experimental toxicity analysis under the project. In addition, a cumulative study was carried out on compounds within the AD of all four models, including two earlier published models on LC(50) rodent analysis, and the cumulative toxicity trend was observed using principal component analysis (PCA). The similarities and the differences observed in terms of descriptors and chemical/mechanistic meaning encoded by descriptors to prioritize the most toxic compounds are highlighted.

  3. Dyspnea descriptors developed in Brazil: application in obese patients and in patients with cardiorespiratory diseases.

    PubMed

    Teixeira, Christiane Aires; Rodrigues Júnior, Antonio Luiz; Straccia, Luciana Cristina; Vianna, Elcio Dos Santos Oliveira; Silva, Geruza Alves da; Martinez, José Antônio Baddini

    2011-01-01

    To develop a set of descriptive terms applied to the sensation of dyspnea (dyspnea descriptors) for use in Brazil and to investigate the usefulness of these descriptors in four distinct clinical conditions that can be accompanied by dyspnea. We collected 111 dyspnea descriptors from 67 patients and 10 health professionals. These descriptors were analyzed and reduced to 15 based on their frequency of use, similarity of meaning, and potential pathophysiological value. Those 15 descriptors were applied in 50 asthma patients, 50 COPD patients, 30 patients with heart failure, and 50 patients with class II or III obesity. The three best descriptors, as selected by the patients, were studied by cluster analysis. Potential associations between the identified clusters and the four clinical conditions were also investigated. The use of this set of descriptors led to a solution with seven clusters, designated sufoco (suffocating), aperto (tight), rápido (rapid), fadiga (fatigue), abafado (stuffy), trabalho/inspiração (work/inhalation), and falta de ar (shortness of breath). Overlapping of descriptors was quite common among the patients, regardless of their clinical condition. Asthma was significantly associated with the sufoco and trabalho/inspiração clusters, whereas COPD and heart failure were associated with the sufoco, trabalho/inspiração, and falta de ar clusters. Obesity was associated only with the falta de ar cluster. In Brazil, patients who are accustomed to perceiving dyspnea employ various descriptors in order to describe the symptom, and these descriptors can be grouped into similar clusters. In our study sample, such clusters showed no usefulness in differentiating among the four clinical conditions evaluated.

  4. Novel Spectral Representations and Sparsity-Driven Algorithms for Shape Modeling and Analysis

    NASA Astrophysics Data System (ADS)

    Zhong, Ming

    In this dissertation, we focus on extending classical spectral shape analysis by incorporating spectral graph wavelets and sparsity-seeking algorithms. Defined with the graph Laplacian eigenbasis, the spectral graph wavelets are localized both in the vertex domain and graph spectral domain, and thus are very effective in describing local geometry. With a rich dictionary of elementary vectors and by enforcing certain sparsity constraints, a real life signal can often be well approximated by a very sparse coefficient representation. The many successful applications of sparse signal representation in computer vision and image processing inspire us to explore the idea of employing sparse modeling techniques with a dictionary of spectral bases to solve various shape modeling problems. Conventional spectral mesh compression uses the eigenfunctions of the mesh Laplacian as shape bases, which are highly inefficient in representing local geometry. To ameliorate this, we advocate an innovative approach to 3D mesh compression using spectral graph wavelets as a dictionary to encode mesh geometry. The spectral graph wavelets are locally defined at individual vertices and can better capture local shape information than the Laplacian eigenbasis. The multi-scale spectral graph wavelets (SGWs) form a redundant dictionary as shape basis, so we formulate the compression of 3D shape as a sparse approximation problem that can be readily handled by greedy pursuit algorithms. Surface inpainting refers to the completion or recovery of missing shape geometry based on the shape information that is currently available. We devise a new surface inpainting algorithm founded upon the theory and techniques of sparse signal recovery. Instead of estimating the missing geometry directly, our novel method finds a low-dimensional representation that describes the entire original shape. More specifically, we find that, for many shapes, the vertex coordinate function can be well approximated by a very sparse coefficient representation with respect to the dictionary comprising its Laplacian eigenbasis, and it is then possible to recover this sparse representation from partial measurements of the original shape. Taking advantage of the sparsity cue, we advocate a novel variational approach for surface inpainting, integrating data fidelity constraints on the shape domain with coefficient sparsity constraints on the transformed domain. Because of the powerful properties of Laplacian eigenbasis, the inpainting results of our method tend to be globally coherent with the remaining shape. Informative and discriminative feature descriptors are vital in qualitative and quantitative shape analysis for a large variety of graphics applications. We advocate novel strategies to define generalized, user-specified features on shapes. Our new region descriptors are primarily built upon the coefficients of spectral graph wavelets that are both multi-scale and multi-level in nature, consisting of both local and global information. Based on our novel spectral feature descriptor, we developed a user-specified feature detection framework and a tensor-based shape matching algorithm. Through various experiments, we demonstrate the competitive performance of our proposed methods and the great potential of spectral basis and sparsity-driven methods for shape modeling.

  5. Neural network-based feature point descriptors for registration of optical and SAR images

    NASA Astrophysics Data System (ADS)

    Abulkhanov, Dmitry; Konovalenko, Ivan; Nikolaev, Dmitry; Savchik, Alexey; Shvets, Evgeny; Sidorchuk, Dmitry

    2018-04-01

    Registration of images of different nature is an important technique used in image fusion, change detection, efficient information representation and other problems of computer vision. Solving this task using feature-based approaches is usually more complex than registration of several optical images because traditional feature descriptors (SIFT, SURF, etc.) perform poorly when the images are of different nature. In this paper we consider the problem of registration of SAR and optical images. We train a neural network to build feature point descriptors and use the RANSAC algorithm to align the found matches. Experimental results are presented that confirm the method's effectiveness.
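
    The RANSAC step mentioned above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function name, the pure-translation motion model, the tolerance, and the iteration count are all hypothetical choices, and a real SAR-to-optical registration would typically fit a richer transform.

```python
import random


def ransac_translation(matches, n_iters=200, tol=2.0, seed=0):
    """Estimate a 2D translation from putative point matches with RANSAC.

    matches: list of ((x, y), (x2, y2)) pairs, some of which may be outliers.
    Returns (dx, dy, inliers).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        # Minimal sample: a single match fully determines a translation.
        (x, y), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x, y2 - y
        # Consensus set: matches whose displacement agrees within tol pixels.
        inliers = [(a, b) for a, b in matches
                   if abs(b[0] - a[0] - dx) <= tol
                   and abs(b[1] - a[1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the largest consensus set using the mean displacement.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return dx, dy, best_inliers
```

    The point of the consensus step is that a few grossly wrong descriptor matches, which are common across modalities, do not bias the final estimate because they never join the largest inlier set.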

  6. A novel prediction approach for antimalarial activities of Trimethoprim, Pyrimethamine, and Cycloguanil analogues using extremely randomized trees.

    PubMed

    Nattee, Cholwich; Khamsemanan, Nirattaya; Lawtrakul, Luckhana; Toochinda, Pisanu; Hannongbua, Supa

    2017-01-01

    Malaria is still one of the most serious diseases in tropical regions. This is due in part to the high resistance against available drugs for the inhibition of parasites, Plasmodium, the cause of the disease. New potent compounds with high clinical utility are urgently needed. In this work, we created a novel model using a regression tree to study structure-activity relationships and predict the inhibition constant, K(i), of three different antimalarial analogues (Trimethoprim, Pyrimethamine, and Cycloguanil) based on their molecular descriptors. To the best of our knowledge, this work is the first attempt to study the structure-activity relationships of all three analogues combined. The most relevant descriptors and appropriate parameters of the regression tree are harvested using extremely randomized trees. These descriptors are water accessible surface area, Log of the aqueous solubility, total hydrophobic van der Waals surface area, and molecular refractivity. Out of all possible combinations of these selected parameters and descriptors, the tree with the strongest coefficient of determination is selected to be our prediction model. Predicted K(i) values from the proposed model show a strong coefficient of determination, R(2) = 0.996, to experimental K(i) values. From the structure of the regression tree, compounds with high accessible surface area of all hydrophobic atoms (ASA_H) and low aqueous solubility of inhibitors (Log S) generally possess low K(i) values. Our prediction model can also be utilized as a screening test for new antimalarial drug compounds which may reduce the time and expenses for new drug development. New compounds with high predicted K(i) should be excluded from further drug development. It is also our inference that a threshold of ASA_H greater than 575.80 and Log S less than or equal to -4.36 is a sufficient condition for a new compound to possess a low K(i). Copyright © 2016 Elsevier Inc. All rights reserved.
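
    The sufficient condition stated at the end of this abstract can be written as a one-line screening rule. The Python sketch below uses the two thresholds quoted in the abstract; the function and constant names are hypothetical, and the descriptor values would have to come from whatever software computes ASA_H and Log S. Because the condition is sufficient rather than necessary, a False result is inconclusive and does not imply a high inhibition constant.

```python
# Thresholds quoted in the abstract above.
ASA_H_THRESHOLD = 575.80   # accessible surface area of all hydrophobic atoms
LOG_S_THRESHOLD = -4.36    # log of the aqueous solubility


def likely_low_inhibition_constant(asa_h, log_s):
    """Return True when the abstract's sufficient condition is met.

    True  -> the compound is expected to have a low inhibition constant.
    False -> inconclusive; the rule makes no statement about the compound.
    """
    return asa_h > ASA_H_THRESHOLD and log_s <= LOG_S_THRESHOLD
```

    In a screening setting, such a rule would serve only as a cheap first filter before the full regression tree is evaluated.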

  7. Determination of Abraham model solute descriptors for the monomeric and dimeric forms of trans-cinnamic acid using measured solubilities from the Open Notebook Science Challenge.

    PubMed

    Bradley, Jean-Claude; Abraham, Michael H; Acree, William E; Lang, Andrew Sid; Beck, Samantha N; Bulger, David A; Clark, Elizabeth A; Condron, Lacey N; Costa, Stephanie T; Curtin, Evan M; Kurtu, Sozit B; Mangir, Mark I; McBride, Matthew J

    2015-01-01

    Calculating Abraham descriptors from solubility values requires that the solute have the same form when dissolved in all solvents. However, carboxylic acids can form dimers when dissolved in non-polar solvents. For such compounds Abraham descriptors can be calculated for both the monomeric and dimeric forms by treating the polar and non-polar systems separately. We illustrate the method of how this can be done by calculating the Abraham descriptors for both the monomeric and dimeric forms of trans-cinnamic acid, the first time that descriptors for a carboxylic acid dimer have been obtained. Abraham descriptors were calculated for the monomeric form of trans-cinnamic acid using experimental solubility measurements in polar solvents from the Open Notebook Science Challenge together with a number of water-solvent partition coefficients from the literature. Similarly, experimental solubility measurements in non-polar solvents were used to determine Abraham descriptors for the trans-cinnamic acid dimer. Abraham descriptors were calculated for both the monomeric and dimeric forms of trans-cinnamic acid. This allows for the prediction of further solubilities of trans-cinnamic acid in both polar and non-polar solvents with an error of about 0.10 log units. Graphical abstract: Molar concentration of trans-cinnamic acid in various polar and non-polar solvents.

  8. Synthesis and quantitative structure-activity relationship (QSAR) study of novel isoxazoline and oxime derivatives of podophyllotoxin as insecticidal agents.

    PubMed

    Wang, Yi; Shao, Yonghua; Wang, Yangyang; Fan, Lingling; Yu, Xiang; Zhi, Xiaoyan; Yang, Chun; Qu, Huan; Yao, Xiaojun; Xu, Hui

    2012-08-29

    In continuation of our program aimed at the discovery and development of natural-product-based insecticidal agents, 33 isoxazoline and oxime derivatives of podophyllotoxin modified in the C and D rings were synthesized and their structures were characterized by proton nuclear magnetic resonance ((1)H NMR), high-resolution mass spectrometry (HRMS), electrospray ionization-mass spectrometry (ESI-MS), optical rotation, melting point (mp), and infrared (IR) spectroscopy. The stereochemical configurations of compounds 5e, 5f, and 9f were unambiguously determined by X-ray crystallography. Their insecticidal activity was evaluated against the pre-third-instar larvae of northern armyworm, Mythimna separata (Walker), in vivo. Compounds 5e, 9c, 11g, and 11h especially exhibited more promising insecticidal activity than toosendanin, a commercial botanical insecticide extracted from Melia azedarach. A genetic algorithm combined with multiple linear regression (GA-MLR) calculation was performed with the MOBY DIGS package. The five selected descriptors are as follows: one two-dimensional (2D) autocorrelation descriptor (GATS4e), one edge adjacency index (EEig06x), one RDF descriptor (RDF080v), one three-dimensional (3D) MoRSE descriptor (Mor09v), and one atom-centered fragment (H-052) descriptor. Quantitative structure-activity relationship studies demonstrated that the insecticidal activity of these compounds was mainly influenced by factors such as electronic distribution and steric effects. For this model, the standard deviation error in prediction (SDEP) is 0.0592, the correlation coefficient (R(2)) is 0.861, and the leave-one-out cross-validation correlation coefficient (Q(2)loo) is 0.797.
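
    The three statistics reported at the end of this abstract can be illustrated with a short Python sketch. It assumes the common definitions R(2) = 1 - RSS/TSS, Q(2)loo = 1 - PRESS/TSS and SDEP = sqrt(PRESS/n), and it uses a single-descriptor least-squares line for brevity, whereas the study fitted a five-descriptor model with the MOBY DIGS package. All function names are hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loo_statistics(xs, ys):
    """Return R2, leave-one-out Q2 and SDEP for a one-descriptor model."""
    n = len(ys)
    my = sum(ys) / n
    tss = sum((y - my) ** 2 for y in ys)          # total sum of squares
    slope, intercept = fit_line(xs, ys)
    rss = sum((y - (slope * x + intercept)) ** 2  # residual sum of squares
              for x, y in zip(xs, ys))
    press = 0.0                                   # predictive residual sum of squares
    for i in range(n):
        # Refit without compound i, then predict compound i.
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + b)) ** 2
    return {"r2": 1 - rss / tss,
            "q2_loo": 1 - press / tss,
            "sdep": sqrt(press / n)}
```

    For a least-squares model PRESS is never smaller than RSS, so Q(2)loo cannot exceed R(2); the gap between the two (0.861 versus 0.797 in the abstract) indicates how much of the fit depends on individual compounds.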

  9. Numeric promoter description - A comparative view on concepts and general application.

    PubMed

    Beier, Rico; Labudde, Dirk

    2016-01-01

    Nucleic acid molecules play a key role in a variety of biological processes. Starting from storage and transfer tasks, this also comprises the triggering of biological processes, regulatory effects and the active influence gained by target binding. Based on the experimental output (in this case promoter sequences), further in silico analyses aid in gaining new insights into these processes and interactions. The numerical description of nucleic acids thereby constitutes a bridge between the concrete biological issues and the analytical methods. Hence, this study compares 26 descriptor sets obtained by applying well-known numerical description concepts to an established dataset of 38 DNA promoter sequences. The suitability of the description sets was evaluated by computing partial least squares regression models and assessing the model accuracy. We conclude that the major importance regarding the descriptive power is attached to positional information rather than to explicitly incorporated physico-chemical information, since a sufficient amount of implicit physico-chemical information is already encoded in the nucleobase classification. The regression models especially benefited from employing the information that is encoded in the sequential and structural neighborhood of the nucleobases. Thus, the analyses of n-grams (short fragments of length n) suggested that they are valuable descriptors for DNA target interactions. A mixed n-gram descriptor set thereby yielded the best description of the promoter sequences. The corresponding regression model was checked and found to be plausible as it was able to reproduce the characteristic binding motifs of promoter sequences to a reasonable degree. As most functional nucleic acids are based on the principle of molecular recognition, the findings are not restricted to promoter sequences, but can rather be transferred to other kinds of functional nucleic acids. Thus, the concepts presented in this study could provide advantages for future nucleic acid-based technologies, like biosensing, therapeutics and molecular imaging. Copyright © 2015 Elsevier Inc. All rights reserved.
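
    A mixed n-gram descriptor of the kind discussed in this abstract could look like the following Python sketch. It is a simplified assumption rather than the study's descriptor set: it counts overlapping n-grams over the whole sequence, so it captures neighborhood information but not the position of each n-gram, and the function names and the choice of n = 1, 2, 3 are hypothetical.

```python
from collections import Counter
from itertools import product


def ngram_descriptor(seq, n, alphabet="ACGT"):
    """Count every overlapping n-gram and return the counts in a fixed order."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # A fixed enumeration of all possible n-grams keeps vectors comparable.
    return [counts["".join(p)] for p in product(alphabet, repeat=n)]


def mixed_ngram_descriptor(seq, sizes=(1, 2, 3)):
    """Concatenate the n-gram count vectors for several fragment lengths."""
    vector = []
    for n in sizes:
        vector.extend(ngram_descriptor(seq, n))
    return vector
```

    Each sequence then maps to a numeric vector of the same length (4 + 16 + 64 entries for the default sizes), which is the form a partial least squares regression expects as input.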

  10. Spiking Cortical Model Based Multimodal Medical Image Fusion by Combining Entropy Information with Weber Local Descriptor

    PubMed Central

    Zhang, Xuming; Ren, Jinxia; Huang, Zhiwen; Zhu, Fei

    2016-01-01

    Multimodal medical image fusion (MIF) plays an important role in clinical diagnosis and therapy. Existing MIF methods tend to introduce artifacts, lead to loss of image details or produce low-contrast fused images. To address these problems, a novel spiking cortical model (SCM) based MIF method has been proposed in this paper. The proposed method can generate high-quality fused images using the weighting fusion strategy based on the firing times of the SCM. In the weighting fusion scheme, the weight is determined by combining the entropy information of pulse outputs of the SCM with the Weber local descriptor operating on the firing mapping images produced from the pulse outputs. The extensive experiments on multimodal medical images show that compared with the numerous state-of-the-art MIF methods, the proposed method can preserve image details very well and avoid the introduction of artifacts effectively, and thus it significantly improves the quality of fused images in terms of human vision and objective evaluation criteria such as mutual information, edge preservation index, structural similarity based metric, fusion quality index, fusion similarity metric and standard deviation. PMID:27649190

  11. Spiking Cortical Model Based Multimodal Medical Image Fusion by Combining Entropy Information with Weber Local Descriptor.

    PubMed

    Zhang, Xuming; Ren, Jinxia; Huang, Zhiwen; Zhu, Fei

    2016-09-15

    Multimodal medical image fusion (MIF) plays an important role in clinical diagnosis and therapy. Existing MIF methods tend to introduce artifacts, lead to loss of image details or produce low-contrast fused images. To address these problems, a novel spiking cortical model (SCM) based MIF method has been proposed in this paper. The proposed method can generate high-quality fused images using the weighting fusion strategy based on the firing times of the SCM. In the weighting fusion scheme, the weight is determined by combining the entropy information of pulse outputs of the SCM with the Weber local descriptor operating on the firing mapping images produced from the pulse outputs. The extensive experiments on multimodal medical images show that compared with the numerous state-of-the-art MIF methods, the proposed method can preserve image details very well and avoid the introduction of artifacts effectively, and thus it significantly improves the quality of fused images in terms of human vision and objective evaluation criteria such as mutual information, edge preservation index, structural similarity based metric, fusion quality index, fusion similarity metric and standard deviation.

  12. EMD-Based Symbolic Dynamic Analysis for the Recognition of Human and Nonhuman Pyroelectric Infrared Signals.

    PubMed

    Zhao, Jiaduo; Gong, Weiguo; Tang, Yuzhen; Li, Weihong

    2016-01-20

    In this paper, we propose an effective human and nonhuman pyroelectric infrared (PIR) signal recognition method to reduce PIR detector false alarms. First, using the mathematical model of the PIR detector, we analyze the physical characteristics of the human and nonhuman PIR signals; second, based on the analysis results, we propose an empirical mode decomposition (EMD)-based symbolic dynamic analysis method for the recognition of human and nonhuman PIR signals. In the proposed method, we first extract the detailed features of a PIR signal into five symbol sequences using an EMD-based symbolization method; then we generate five feature descriptors for each PIR signal by constructing five probabilistic finite state automata from the symbol sequences. Finally, we use a weighted voting classification strategy to classify the PIR signals with their feature descriptors. Comparative experiments show that the proposed method can effectively classify the human and nonhuman PIR signals and reduce the PIR detector's false alarms.

  13. Predictive Signatures from ToxCast Data for Chronic, Developmental and Reproductive Toxicity Endpoints

    EPA Science Inventory

    The EPA ToxCast program is using in vitro assay data and chemical descriptors to build predictive models for in vivo toxicity endpoints. In vitro assays measure activity of chemicals against molecular targets such as enzymes and receptors (measured in cell-free and cell-based sys...

  14. A macrophysical life cycle description for precipitating systems

    NASA Astrophysics Data System (ADS)

    Evaristo, Raquel; Xie, Xinxin; Troemel, Silke; Diederich, Malte; Simon, Juergen; Simmer, Clemens

    2014-05-01

    The lack of understanding of cloud and precipitation processes is still the overarching problem of climate simulation and prediction. The work presented is part of the HD(CP)2 project (High Definition Clouds and Precipitation for Advancing Climate Predictions), which aims at building a very high resolution model in order to evaluate and exploit regional hindcasts for the purpose of parameterization development. To this end, an observational object-based climatology for precipitation systems will be built, and shall later be compared with a twin model-based climatological data base for pseudo precipitation events within an event-based model validation approach. This is done by identifying internal structures, described by means of macrophysical descriptors used to characterize the temporal development of tracked rain events. Two prerequisites are necessary for this: 1) a tracking algorithm, and 2) a 3D radar/satellite composite. Both prerequisites are ready to be used, and have already been applied to a few case studies. Some examples of these macrophysical descriptors are differential reflectivity columns, bright band fraction and trend, cloud top heights, the spatial extent of updrafts or downdrafts, and the ice content. We will show one case study from August 5th 2012, when convective precipitation was observed simultaneously by the BOXPOL and JUXPOL X-band polarimetric radars. We will follow the main paths identified by the tracking algorithm during this event and identify in the 3D composite the descriptors that characterize precipitation development, their temporal evolution, and the different macrophysical processes that are ultimately related to the precipitation observed. In a later stage these observations will be compared to the results of a hydrometeor classification algorithm, in order to link the macrophysical and microphysical aspects of the storm evolution. The detailed microphysical processes are the subject of a closely related work also presented in this session: Microphysical processes observed by X band polarimetric radars during the evolution of storm systems, by Xinxin Xie et al.

  15. Microscopic structural descriptor of liquid water

    NASA Astrophysics Data System (ADS)

    Shi, Rui; Tanaka, Hajime

    2018-03-01

    The microscopic structure of liquid water has been believed to be the key to the understanding of the unique properties of this extremely important substance. Many structural descriptors have been developed for revealing local structural order in water, but their properties are still not well understood. The essential difficulty comes from structural fluctuations due to thermal noise, which are intrinsic to the liquid state. The most popular and widely used descriptors are the local structure index (LSI) and d5. Recently, Russo and Tanaka [Nat. Commun. 5, 3556 (2014)] introduced a new descriptor ζ which measures the translational order between the first and second shells considering hydrogen bonding (H-bonding) in the first shell. In this work, we compare the performance of these three structural descriptors for a popular water model known as TIP5P water. We show that local structural ordering can be properly captured only by the structural descriptor ζ, but not by the other two descriptors particularly at a high temperature, where thermal noise effects are severe. The key difference of ζ from LSI and d5 is that only ζ considers H-bonding which is crucial to detect high translational and tetrahedral order of not only oxygen but also hydrogen atoms. The importance of H-bonding is very natural, considering the fact that the locally favored structures are stabilized by energy gain due to the formation of four hydrogen bonds between the central water molecule and its neighboring ones in the first shell. Our analysis of the water structure by using ζ strongly supports the two-state model of water: water is a dynamic mixture of locally favored (ordered) and normal-liquid (disordered) structures. This work demonstrates the importance of H-bonding in the characterization of water's structures and provides a useful structural descriptor for water-type tetrahedral liquids to study their structure and dynamics.

  16. Learning structure-property relationship in crystalline materials: A study of lanthanide-transition metal alloys

    NASA Astrophysics Data System (ADS)

    Pham, Tien-Lam; Nguyen, Nguyen-Duong; Nguyen, Van-Doan; Kino, Hiori; Miyake, Takashi; Dam, Hieu-Chi

    2018-05-01

    We have developed a descriptor named Orbital Field Matrix (OFM) for representing material structures in datasets of multi-element materials. The descriptor is based on the information regarding atomic valence shell electrons and their coordination. In this work, we develop an extension of OFM called OFM1. We have shown that these descriptors are highly applicable in predicting the physical properties of materials and in providing insights on the materials space by mapping into a low-dimensional embedded space. Our experiments with transition metal/lanthanide metal alloys show that the local magnetic moments and formation energies can be accurately reproduced using simple nearest-neighbor regression, thus confirming the relevance of our descriptors. Using kernel ridge regressions, we could accurately reproduce local magnetic moments and formation energies calculated based on first-principles, with mean absolute errors of 0.03 μB and 0.10 eV/atom, respectively. We show that meaningful low-dimensional representations can be extracted from the original descriptor using descriptive learning algorithms. Intuitive comprehension of the materials space, qualitative evaluation of the similarities in local structures or crystalline materials, and inference in the designing of new materials by element substitution can be performed effectively based on these low-dimensional representations.

  17. Molecular Descriptors

    NASA Astrophysics Data System (ADS)

    Consonni, Viviana; Todeschini, Roberto

    In the last decades, much scientific research has focused on studying how to encompass and convert - by a theoretical pathway - the information encoded in the molecular structure into one or more numbers used to establish quantitative relationships between structures and properties, biological activities, or other experimental properties. Molecular descriptors are formally mathematical representations of a molecule obtained by a well-specified algorithm applied to a defined molecular representation or a well-specified experimental procedure. They play a fundamental role in chemistry, pharmaceutical sciences, environmental protection policy, toxicology, ecotoxicology, health research, and quality control. Evidence of the interest of the scientific community in the molecular descriptors is provided by the huge number of descriptors proposed to date: more than 5000 descriptors derived from different theories and approaches are defined in the literature and most of them can be calculated by means of dedicated software applications. Molecular descriptors are of outstanding importance in the research fields of quantitative structure-activity relationships (QSARs) and quantitative structure-property relationships (QSPRs), where they are the independent chemical information used to predict the properties of interest. Along with the definition of appropriate molecular descriptors, the molecular structure representation and the mathematical tools for deriving and assessing models are other fundamental components of the QSAR/QSPR approach. The remarkable progress during the last few years in chemometrics and chemoinformatics has led to new strategies for finding mathematically meaningful relationships between the molecular structure and biological activities, physico-chemical, toxicological, and environmental properties of chemicals. Different approaches for deriving molecular descriptors are reviewed here, and some of the most relevant descriptors are presented in detail with numerical examples.
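
    As a small numerical example of a molecular descriptor in the sense defined above (a number obtained by a well-specified algorithm applied to a defined molecular representation), the Python sketch below computes the Wiener index, a classic topological descriptor. The choice of the Wiener index and the adjacency-list representation of the hydrogen-depleted graph are assumptions made for illustration; the abstract does not say which descriptors the chapter works through.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered pairs of atoms.

    adjacency: dict mapping each heavy atom to a list of bonded heavy atoms
    (the hydrogen-depleted molecular graph, assumed connected).
    """
    total = 0
    for start in adjacency:
        # Breadth-first search gives topological distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # Every pair was counted twice, once from each end.
    return total // 2


# Carbon skeletons of two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

    For these two isomers the index is 10 for n-butane and 9 for isobutane, showing how a single number can encode branching even when the molecular formula is identical.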

  18. Quantitative structure-toxicity relationship (QSTR) studies on the organophosphate insecticides.

    PubMed

    Can, Alper

    2014-11-04

    Organophosphate insecticides are the most commonly used pesticides in the world. In this study, quantitative structure-toxicity relationship (QSTR) models were derived for estimating the acute oral toxicity of organophosphate insecticides to male rats. The 20 chemicals of the training set and the seven compounds of the external testing set were characterized by molecular descriptors. Descriptors for lipophilicity, polarity and molecular geometry, as well as quantum chemical descriptors for energy were calculated. Model development to predict toxicity of organophosphate insecticides in different matrices was carried out using multiple linear regression. The model was validated internally and externally. In the present study, a QSTR model was used for the first time to understand the inherent relationships between the organophosphate insecticide molecules and their toxicity behavior. Such studies provide mechanistic insight about structure-toxicity relationship and help in the design of less toxic insecticides. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  19. Modeling of the relationship between dipeptide structure and dipeptide stability, permeability, and ACE inhibitory activity.

    PubMed

    Foltz, Martin; van Buren, Leo; Klaffke, Werner; Duchateau, Guus S M J E

    2009-09-01

    Selected di- and tripeptides exhibit angiotensin-I converting enzyme (ACE) inhibitory activity in vitro. However, the efficacy in vivo is most likely limited for most peptides due to low bioavailability. The purpose of this study was to identify descriptors of intestinal stability, permeability, and ACE inhibitory activity of dipeptides. A total of 228 dipeptides were synthesized; intestinal stability was obtained by in vitro digestion, intestinal permeability using Caco-2 cells and ACE inhibitory activity by an in vitro assay. Databases were constructed to study the relationship between structure and activity, permeability, and stability. Quantitative structure-activity relationship (QSAR) modeling was performed based on computed models using partial least squares regression based on 400 molecular descriptors. QSAR modeling of dipeptide stability revealed high correlation coefficients (R > 0.65) for models based on Z and X scales. However, amino acid (AA) clustering showed the best results in describing stability of dipeptides. The N-terminal AA residues Asp, Gly, and Pro as well as the C-terminal residues Pro, Ser, Thr, and Asp stabilize dipeptides toward luminal enzymatic peptide hydrolysis. QSAR modeling did not reveal significant correlation models for intestinal permeability. 2D-fingerprint models were identified describing ACE inhibitory activity of dipeptides. The intestinal stability of 12 peptides was predicted. Peptides were synthesized and stability was confirmed in simulated digestion experiments. Based on the results, specific dipeptides can be designed to meet both stability and activity criteria. However, postabsorptive ACE inhibitory activities of dipeptides in vivo are most likely limited due to the very low intestinal permeability of dipeptides.

  20. Approaching Pharmacological Space: Events and Components.

    PubMed

    Vistoli, Giulio; Pedretti, Alessandro; Mazzolari, Angelica; Testa, Bernard

    2018-01-01

    With a view to introducing the concept of pharmacological space and its potential applications in investigating and predicting the toxic mechanisms of xenobiotics, this opening chapter describes the logical relations between conformational behavior, physicochemical properties and binding spaces, which are seen as the three key elements composing the pharmacological space. While the concept of conformational space is routinely used to encode molecular flexibility, the concepts of property spaces and, particularly, of binding spaces are more innovative. Indeed, their descriptors can find fruitful applications (a) in describing the dynamic adaptability a given ligand experiences when inserted into a specific environment, and (b) in parameterizing the flexibility a ligand retains when bound to a biological target. Overall, these descriptors can conveniently account for the often disregarded entropic factors and as such they prove successful when inserted in ligand- or structure-based predictive models. Notably, and although binding space parameters can clearly be derived from MD simulations, the chapter will illustrate how docking calculations, despite their static nature, are able to evaluate ligand's flexibility by analyzing several poses for each ligand. Such an approach, which represents the founding core of the binding space concept, can find various applications in which the related descriptors show an impressive enhancing effect on the statistical performances of the resulting predictive models.

  1. Shape model of the maxillary dental arch using Fourier descriptors with an application in the rehabilitation for edentulous patient.

    PubMed

    Rijal, Omar M; Abdullah, Norli A; Isa, Zakiah M; Noor, Norliza M; Tawfiq, Omar F

    2013-01-01

    The knowledge of teeth positions on the maxillary arch is useful in the rehabilitation of the edentulous patient. A combination of angular (θ), and linear (l) variables representing position of four teeth were initially proposed as the shape descriptor of the maxillary dental arch. Three categories of shape were established, each having a multivariate normal distribution. It may be argued that 4 selected teeth on the standardized digital images of the dental casts could be considered as insufficient with respect to representing shape. However, increasing the number of points would create problems with dimensions and proof of existence of the multivariate normal distribution is extremely difficult. This study investigates the ability of Fourier descriptors (FD) using all maxillary teeth to find alternative shape models. Eight FD terms were sufficient to represent 21 points on the arch. Using these 8 FD terms as an alternative shape descriptor, three categories of shape were verified, each category having the complex normal distribution.
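
    The idea of summarizing 21 arch points with 8 Fourier descriptor terms can be illustrated with a minimal sketch. It assumes each point is treated as a complex number x + iy and that the 8 lowest-frequency terms are the ones kept; the abstract does not say which terms the authors retained, and the half-ellipse "arch" below is hypothetical illustration data, not dental measurements.

```python
import cmath
import math

def dft(points):
    """Discrete Fourier transform (normalized by n) of a list of complex points."""
    n = len(points)
    return [sum(z * cmath.exp(-2j * math.pi * k * t / n) for t, z in enumerate(points)) / n
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse of dft() above: rebuilds the points from the coefficients."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs))
            for t in range(n)]

def keep_low_frequencies(coeffs, n_terms):
    """Zero every coefficient except the n_terms with the lowest |frequency|."""
    n = len(coeffs)
    # Index k and index n-k carry the same absolute frequency min(k, n-k).
    order = sorted(range(n), key=lambda k: (min(k, n - k), k))
    kept = set(order[:n_terms])
    return [c if k in kept else 0j for k, c in enumerate(coeffs)]

# Hypothetical arch: 21 points on a half-ellipse (illustrative only).
arch = [complex(25 * math.cos(a), 40 * math.sin(a))
        for a in (math.pi * i / 20 for i in range(21))]

coeffs = dft(arch)                          # 21 Fourier descriptor terms
truncated = keep_low_frequencies(coeffs, 8) # assumed choice of 8 terms
approx = inverse_dft(truncated)             # shape rebuilt from 8 terms
max_error = max(abs(a - b) for a, b in zip(arch, approx))
```

    Inspecting `max_error` shows how much shape detail is lost by the truncation; the 8 retained complex terms would then serve as the shape descriptor in place of the raw coordinates.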

  2. Non-linear quantitative structure-activity relationship for adenine derivatives as competitive inhibitors of adenosine deaminase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadat Hayatshahi, Sayyed Hamed; Abdolmaleki, Parviz; Safarian, Shahrokh

    2005-12-16

    Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and biochemical activity of adenosine-based competitive inhibitors toward adenosine deaminase. The training set included 24 compounds with known K_i values. The models were trained to solve two-class problems. Unlike the previous work in which multiple linear regression was used, the highest positive charge on the molecules was recognized to be in close relation with their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, the previously developed equation was improved and the newly formed one could predict the class of 91.66% of compounds correctly. Also, optimized 2-3-1 and 3-4-1 neural networks could increase this rate to 95.83%.

  3. Texture Descriptors Ensembles Enable Image-Based Classification of Maturation of Human Stem Cell-Derived Retinal Pigmented Epithelium

    PubMed Central

    Caetano dos Santos, Florentino Luciano; Skottman, Heli; Juuti-Uusitalo, Kati; Hyttinen, Jari

    2016-01-01

    Aims: A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells by using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods: For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. Also, three new methods were tested. The first approach was an ensemble of preprocessing methods, to create an additional set of images. The second was the region-based approach, where saliency detection and wavelet decomposition divide each image into two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results: The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under ROC curve (AUC) of 86.49% with the 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions: Here we showed that the developed ensembles of texture descriptors are able to classify the RPE cell maturation stage. Moreover, we proved that preprocessing and region-based decomposition improve many descriptors’ accuracy in biological dataset classification. Finally, we built the first public dataset of stem cell-derived RPE cells, which is publicly available to the scientific community for classification studies. The proposed tool is available at https://www.dei.unipd.it/node/2357 and the RPE dataset at http://www.biomeditech.fi/data/RPE_dataset/. Both are available at https://figshare.com/s/d6fb591f1beb4f8efa6f. PMID:26895509
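
    Combining a set of classifiers "by sum rule" can be sketched as follows. This is a minimal illustration, assuming each classifier (for example, one SVM per descriptor) outputs one score per class on a comparable scale; the function name and the score values are hypothetical and are not taken from the paper.

```python
def sum_rule(score_sets):
    """Fuse classifiers by adding their per-class scores.

    score_sets: list of score lists, one list per classifier,
    each holding one score per class in the same class order.
    Returns (index of winning class, list of summed scores).
    """
    n_classes = len(score_sets[0])
    totals = [sum(scores[c] for scores in score_sets) for c in range(n_classes)]
    winner = max(range(n_classes), key=totals.__getitem__)
    return winner, totals

# Hypothetical scores from three classifiers over three maturation stages.
scores = [
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
]
predicted_class, summed = sum_rule(scores)
```

    In this example the second class wins for the first classifier alone, but the summed scores favor the first class, which is the kind of disagreement the fusion is meant to resolve.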

  4. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

    PubMed

    Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

    2005-01-01

    Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r²) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has an r² of 0.911 and an RMSE of 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation was done, it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient lead optimization.
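
    The two statistics quoted for the solubility models can be computed as in the sketch below. It assumes r² means the squared Pearson correlation between observed and predicted values (the abstract does not say whether the coefficient of determination was used instead), and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def squared_pearson(observed, predicted):
    """Squared Pearson correlation coefficient (assumed meaning of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Hypothetical log S(w) values, for illustration only.
observed_log_s = [-1.0, -2.0, -3.0, -4.0]
predicted_log_s = [-1.2, -1.9, -3.3, -3.8]
fit_r2 = squared_pearson(observed_log_s, predicted_log_s)
fit_rmse = rmse(observed_log_s, predicted_log_s)
```

    Note that a squared correlation ignores any constant offset in the predictions, whereas RMSE does not, which is one reason both numbers are usually reported together.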

  5. A comparative study for chest radiograph image retrieval using binary texture and deep learning classification.

    PubMed

    Anavi, Yaron; Kogan, Ilya; Gelbart, Elad; Geva, Ofer; Greenspan, Hayit

    2015-08-01

    In this work, various approaches are investigated for X-ray image retrieval and specifically chest pathology retrieval. Given a query image taken from a data set of 443 images, the objective is to rank images according to similarity. Different features, including binary features, texture features, and deep learning (CNN) features, are examined. In addition, two approaches are investigated for the retrieval task. One approach is based on the distance of image descriptors using the above features (hereafter termed the "descriptor"-based approach); the second approach ("classification"-based approach) is based on a probability descriptor, generated by a pair-wise classification of each pair of classes (pathologies) and their decision values using an SVM classifier. Best results are achieved using deep learning features in a classification scheme.
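
    The "descriptor"-based approach, ranking images by descriptor distance to a query, can be sketched as below. The Euclidean metric, the image identifiers and the three-element descriptors are all assumptions made for illustration; the abstract does not state which distance was used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query, database):
    """Return image ids ordered from most to least similar to the query.

    database: dict mapping image id -> descriptor vector.
    """
    return sorted(database, key=lambda image_id: euclidean(query, database[image_id]))

# Hypothetical descriptors for three stored images.
stored = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.7, 0.2, 0.1],
}
ranking = rank_by_distance([1.0, 0.0, 0.0], stored)
```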

  6. Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

    NASA Astrophysics Data System (ADS)

    Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

    2017-12-01

    The multivariate image analysis (MIA) descriptors used in quantitative structure-activity relationships are direct representations of chemical structures, as they are simply numerical decodifications of pixels forming the 2D chemical images. These descriptors have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components, e.g. Partial Least Squares (PLS), have generally been used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity, has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.

  7. Extended Graph-Based Models for Enhanced Similarity Search in Cavbase.

    PubMed

    Krotzky, Timo; Fober, Thomas; Hüllermeier, Eyke; Klebe, Gerhard

    2014-01-01

    To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.

  8. 3D molecular descriptors important for clinical success.

    PubMed

    Kombo, David C; Tallapragada, Kartik; Jain, Rachit; Chewning, Joseph; Mazurov, Anatoly A; Speake, Jason D; Hauser, Terry A; Toler, Steve

    2013-02-25

    The pharmacokinetic and safety profiles of clinical drug candidates are greatly influenced by their requisite physicochemical properties. In particular, it has been shown that 2D molecular descriptors such as fraction of Sp3 carbon atoms (Fsp3) and number of stereo centers correlate with clinical success. Using the proteomic off-target hit rate of nicotinic ligands, we found that shape-based 3D descriptors such as the radius of gyration and shadow indices discriminate off-target promiscuity better than do Fsp3 and the number of stereo centers. We have deduced the relevant descriptor values required for a ligand to be nonpromiscuous. Investigating the MDL Drug Data Report (MDDR) database as compounds move from the preclinical stage toward the market, we have found that these shape-based 3D descriptors predict clinical success of compounds at preclinical and phase 1 stages vs compounds withdrawn from the market better than do Fsp3 and LogD. Further, these computed 3D molecular descriptors correlate well with experimentally observed solubility, which is among well-known physicochemical properties that drive clinical success. We also found that about 84% of launched drugs satisfy either Shadow index or Fsp3 criteria, whereas withdrawn and discontinued compounds fail to meet the same criteria. Our studies suggest that spherical compounds (rather than their elongated counterparts) with a minimal number of aromatic rings may exhibit a high propensity to advance from clinical trials to market.
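
    One of the shape-based 3D descriptors named above, the radius of gyration, has a simple closed form. The sketch below uses the standard mass-weighted definition; whether the authors' software weighted atoms by mass is not stated, and the coordinates are hypothetical.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D atom positions.

    Rg = sqrt( sum_i m_i * |r_i - r_cm|^2 / sum_i m_i )
    With masses=None every atom gets unit mass.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[axis] for m, c in zip(masses, coords)) / total
              for axis in range(3)]
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)

# Hypothetical coordinates (angstroms): an elongated and a compact arrangement.
elongated = [(-3.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
compact = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rg_elongated = radius_of_gyration(elongated)
rg_compact = radius_of_gyration(compact)
```

    For the same number of atoms, the elongated arrangement gives the larger value, which is the sense in which the descriptor separates spherical from elongated compounds.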

  9. Predicting hepatotoxicity using ToxCast in vitro bioactivity and ...

    EPA Pesticide Factsheets

    Background: The U.S. EPA ToxCast™ program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors then used supervised machine learning to predict their hepatotoxic effects. Results: A set of 677 chemicals were represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4,376 chemical structure descriptors (from QikProp, OpenBabel, PADEL, and PubChem), and three hepatotoxicity categories (from animal studies). Hepatotoxicants were defined by rat liver histopathology observed after chronic chemical testing and grouped into hypertrophy (161), injury (101) and proliferative lesions (99). Classifiers were built using six machine learning algorithms: linear discriminant analysis (LDA), Naïve Bayes (NB), support vector classification (SVM), classification and regression trees (CART), k-nearest neighbors (KNN) and an ensemble of classifiers (ENSMB). Classifiers of hepatotoxicity were built using chemical structure, ToxCast bioactivity, and a hybrid representation. Predictive performance was evaluated using 10-fold cross-validation testing and in-loop, filter-based, feature subset selection. Hybrid classifiers had the best balanced accuracy for predicting hypertrophy (0.78±0.08), injury (0.73±0.10) and proliferative lesions (0.72±0.09). Though chemical and bioactivity class

  10. A Viral-Human Interactome Based on Structural Motif-Domain Interactions Captures the Human Infectome

    PubMed Central

    Guo, Xianwu; Rodríguez-Pérez, Mario A.

    2013-01-01

    Protein interactions between a pathogen and its host are fundamental in the establishment of the pathogen and underlie the infection mechanism. In the present work, we developed a single predictive model for building a host-viral interactome based on the identification of structural descriptors from motif-domain interactions of protein complexes deposited in the Protein Data Bank (PDB). The structural descriptors were used for searching in a database of protein sequences of human and five clinically important viruses; viral and human proteins sharing a descriptor were therefore predicted as interacting proteins. The analysis of the host-viral interactome allowed the identification of a set of new interactions that further explain the molecular mechanisms associated with viral infections and showed that it was able to capture human proteins already associated with viral infections (human infectome) and non-infectious diseases (human diseasome). The analysis of human proteins targeted by viral proteins in the context of a human interactome showed that their neighbors are enriched in proteins reported with differential expression under infection and disease conditions. It is expected that the findings of this work will contribute to the development of systems biology for infectious diseases, and help guide the rational identification and prioritization of novel drug targets. PMID:23951184

  11. A neurally inspired musical instrument classification system based upon the sound onset.

    PubMed

    Newton, Michael J; Smith, Leslie S

    2012-06-01

    Physiological evidence suggests that sound onset detection in the auditory system may be performed by specialized neurons as early as the cochlear nucleus. Psychoacoustic evidence shows that the sound onset can be important for the recognition of musical sounds. Here the sound onset is used in isolation to form tone descriptors for a musical instrument classification task. The task involves 2085 isolated musical tones from the McGill dataset across five instrument categories. A neurally inspired tone descriptor is created using a model of the auditory system's response to sound onset. A gammatone filterbank and spiking onset detectors, built from dynamic synapses and leaky integrate-and-fire neurons, create parallel spike trains that emphasize the sound onset. These are coded as a descriptor called the onset fingerprint. Classification uses a time-domain neural network, the echo state network. Reference strategies, based upon mel-frequency cepstral coefficients, evaluated either over the whole tone or only during the sound onset, provide context to the method. Classification success rates for the neurally-inspired method are around 75%. The cepstral methods perform between 73% and 76%. Further testing with tones from the Iowa MIS collection shows that the neurally inspired method is considerably more robust when tested with data from an unrelated dataset.
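
    The leaky integrate-and-fire neurons mentioned above can be illustrated with a generic textbook model. The sketch below is not the authors' onset detector: the time constant, threshold, reset value and forward-Euler integration are all assumptions chosen for illustration.

```python
def lif_spike_times(current, dt=1e-3, tau=0.02, threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron with forward Euler steps.

    current: input drive per time step (same units as the membrane state v).
    The state follows dv/dt = (-v + input) / tau and is reset after a spike.
    Returns the list of spike times in seconds.
    """
    v = 0.0
    spikes = []
    for step, drive in enumerate(current):
        v += dt * (-v + drive) / tau
        if v >= threshold:
            spikes.append(step * dt)
            v = v_reset
    return spikes

# Hypothetical input: silence for 50 ms, then a sudden strong onset for 50 ms.
onset_input = [0.0] * 50 + [2.0] * 50
spike_times = lif_spike_times(onset_input)
```

    With this input the neuron stays silent during the first 50 ms and starts firing shortly after the step, so the first spike time marks the onset.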

  12. Violence detection based on histogram of optical flow orientation

    NASA Astrophysics Data System (ADS)

    Yang, Zhijie; Zhang, Tao; Yang, Jie; Wu, Qiang; Bai, Li; Yao, Lixiu

    2013-12-01

    In this paper, we propose a novel approach for violence detection and localization in a public scene. Currently, violence detection is considerably under-researched compared with common action recognition. Although existing methods can detect the presence of violence in a video, they cannot precisely locate the regions in the scene where violence is happening. This paper tackles that challenge and proposes a novel method to localize violence in the scene, which is important for public surveillance. The Gaussian Mixture Model is extended into the optical flow domain in order to detect candidate violence regions. In each region, a new descriptor, Histogram of Optical Flow Orientation (HOFO), is proposed to measure the spatial-temporal features. A linear SVM is trained based on the descriptor. The performance of the method is demonstrated on the publicly available data sets BEHAVE and CAVIAR.
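
    A histogram of optical flow orientations can be sketched generically as below. The exact construction of HOFO is not given in the abstract, so the bin count, the magnitude weighting and the normalization are assumptions, and the flow vectors are hypothetical.

```python
import math

def flow_orientation_histogram(flow, n_bins=8):
    """Magnitude-weighted histogram of flow directions over [0, 2*pi).

    flow: iterable of (dx, dy) displacement vectors for one region.
    Returns n_bins values that sum to 1 (or all zeros if there is no motion).
    """
    hist = [0.0] * n_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # a zero vector has no defined direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        bin_index = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
        hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Hypothetical flow vectors from one candidate region.
region_flow = [(1.0, 0.1), (-0.1, 1.0), (-0.2, 2.0)]
descriptor = flow_orientation_histogram(region_flow)
```

    The resulting fixed-length vector is the kind of input a linear SVM could be trained on, one vector per candidate region.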

  13. Calculating the dermal flux of chemicals with OELs based on their molecular structure: An attempt to assign the skin notation.

    PubMed

    Kupczewska-Dobecka, Małgorzata; Jakubowski, Marek; Czerczak, Sławomir

    2010-09-01

    Our objectives included calculating the permeability coefficient and dermal penetration rates (flux value) for 112 chemicals with occupational exposure limits (OELs) according to the LFER (linear free-energy relationship) model developed using published methods. We also attempted to assign skin notations based on each chemical's molecular structure. There are many studies available where formulae for coefficients of permeability from saturated aqueous solutions (K(p)) have been related to physicochemical characteristics of chemicals. The LFER model is based on the solvation equation, which contains five main descriptors predicted from chemical structure: solute excess molar refractivity, dipolarity/polarisability, summation hydrogen bond acidity and basicity, and the McGowan characteristic volume. Descriptor values, available for about 5000 compounds in the Pharma Algorithms Database, were used to calculate permeability coefficients. Dermal penetration rate was estimated as the product of the permeability coefficient and the concentration of the chemical in saturated aqueous solution. Finally, estimated dermal penetration rates were used to assign the skin notation to chemicals. Critical fluxes defined from the literature were recommended as reference values for skin notation. The application of Abraham descriptors predicted from chemical structure and LFER analysis in calculation of permeability coefficients and flux values for chemicals with OELs was successful. Comparison of calculated K(p) values with data obtained earlier from other models showed that LFER predictions were comparable to those obtained by some previously published models, but the differences were much more significant for others. It seems reasonable to conclude that skin should not be characterised as a simple lipophilic barrier alone. Both lipophilic and polar pathways of permeation exist across the stratum corneum. It is feasible to predict skin notation on the basis of the LFER and other published models; of the 112 chemicals, 94 (84%) should have the skin notation in the OEL list based on the LFER calculations. The skin notation had been estimated by other published models for almost 94% of the chemicals. Twenty-nine (25.8%) chemicals were identified to have significant absorption and 65 (58%) the potential for dermal toxicity. We found major differences between alternative published analytical models and their ability to determine whether particular chemicals were potentially dermotoxic. Copyright © 2010 Elsevier B.V. All rights reserved.
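
    The calculation chain described above can be sketched in two steps: a solvation-equation estimate of log K(p) from the five Abraham descriptors, then a flux estimate. The general form log K(p) = c + eE + sS + aA + bB + vV is the standard Abraham solvation equation, but the coefficient values below are placeholders, not the fitted values used in the paper. The flux is taken here as K(p) multiplied by the saturated aqueous concentration, the usual Fick's-law form, which is stated as an assumption.

```python
def abraham_log_kp(descriptors, coefficients):
    """log10 Kp from the general solvation equation c + eE + sS + aA + bB + vV.

    descriptors: dict with keys E, S, A, B, V (solute properties).
    coefficients: dict with keys c, e, s, a, b, v (system constants).
    """
    return coefficients["c"] + sum(
        coefficients[name.lower()] * descriptors[name] for name in ("E", "S", "A", "B", "V")
    )

def dermal_flux(kp_cm_per_h, saturated_conc_mg_per_cm3):
    """Steady-state flux in mg/cm^2/h, assumed to be Kp times concentration."""
    return kp_cm_per_h * saturated_conc_mg_per_cm3

# Placeholder coefficients and descriptors (hypothetical, illustration only).
placeholder_coefficients = {"c": -1.0, "e": 0.1, "s": -0.2, "a": -0.3, "b": -0.4, "v": 0.5}
placeholder_solute = {"E": 1.0, "S": 1.0, "A": 1.0, "B": 1.0, "V": 1.0}

log_kp = abraham_log_kp(placeholder_solute, placeholder_coefficients)
kp = 10 ** log_kp                     # cm/h
flux = dermal_flux(kp, 2.0)           # 2.0 mg/cm^3 is a made-up solubility
```

    A skin notation decision would then compare `flux` against a critical flux taken from the literature, a threshold the abstract mentions but does not quantify.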

  14. Lagrangian descriptors of driven chemical reaction manifolds.

    PubMed

    Craven, Galen T; Junginger, Andrej; Hernandez, Rigoberto

    2017-08-01

    The persistence of a transition state structure in systems driven by time-dependent environments allows the application of modern reaction rate theories to solution-phase and nonequilibrium chemical reactions. However, identifying this structure is problematic in driven systems and has been limited by theories built on series expansion about a saddle point. Recently, it has been shown that to obtain formally exact rates for reactions in thermal environments, a transition state trajectory must be constructed. Here, using optimized Lagrangian descriptors [G. T. Craven and R. Hernandez, Phys. Rev. Lett. 115, 148301 (2015), doi:10.1103/PhysRevLett.115.148301], we obtain this so-called distinguished trajectory and the associated moving reaction manifolds on model energy surfaces subject to various driving and dissipative conditions. In particular, we demonstrate that this is exact for harmonic barriers in one dimension and this verification gives impetus to the application of Lagrangian descriptor-based methods in diverse classes of chemical reactions. The development of these objects is paramount in the theory of reaction dynamics as the transition state structure and its underlying network of manifolds directly dictate reactivity and selectivity.

  15. Scene text recognition in mobile applications by character descriptor and structure configuration.

    PubMed

    Yi, Chucai; Tian, Yingli

    2014-07-01

    Text characters and strings in natural scenes can provide valuable information for many applications. Extracting text directly from natural scene images or videos is a challenging task because of diverse text patterns and variant background interferences. This paper proposes a method of scene text recognition from detected text regions. In text detection, our previously proposed algorithms are applied to obtain text regions from scene images. First, we design a discriminative character descriptor by combining several state-of-the-art feature detectors and descriptors. Second, we model character structure at each character class by designing stroke configuration maps. Our algorithm design is compatible with the application of scene text extraction in smart mobile devices. An Android-based demo system is developed to show the effectiveness of our proposed method on scene text information extraction from nearby objects. The demo system also provides us some insight into algorithm design and performance improvement of scene text extraction. The evaluation results on benchmark data sets demonstrate that our proposed scheme of text recognition is comparable with the best existing methods.

  16. Learning Rotation-Invariant Local Binary Descriptor.

    PubMed

    Duan, Yueqi; Lu, Jiwen; Feng, Jianjiang; Zhou, Jie

    2017-08-01

    In this paper, we propose a rotation-invariant local binary descriptor (RI-LBD) learning method for visual recognition. Compared with hand-crafted local binary descriptors, such as local binary pattern and its variants, which require strong prior knowledge, local binary feature learning methods are more efficient and data-adaptive. Unlike existing learning-based local binary descriptors, such as compact binary face descriptor and simultaneous local binary feature learning and encoding, which are susceptible to rotations, our RI-LBD first categorizes each local patch into a rotational binary pattern (RBP), and then jointly learns the orientation for each pattern and the projection matrix to obtain RI-LBDs. As all the rotation variants of a patch belong to the same RBP, they are rotated into the same orientation and projected into the same binary descriptor. Then, we construct a codebook by a clustering method on the learned binary codes, and obtain a histogram feature for each image as the final representation. In order to exploit higher order statistical information, we extend our RI-LBD to the triple rotation-invariant co-occurrence local binary descriptor (TRICo-LBD) learning method, which learns a triple co-occurrence binary code for each local patch. Extensive experimental results on four different visual recognition tasks, including image patch matching, texture classification, face recognition, and scene classification, show that our RI-LBD and TRICo-LBD outperform most existing local descriptors.
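
    For context, the hand-crafted local binary pattern that the abstract contrasts with its learned descriptor can be made rotation-invariant by taking the minimum over all circular bit rotations. The sketch below shows that classic baseline only; it is not the learned RI-LBD method, and the pixel values are hypothetical.

```python
def lbp_code(center, neighbors):
    """Classic local binary pattern: one bit per neighbor, set if neighbor >= center."""
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code

def rotation_invariant(code, bits=8):
    """Map a code to the smallest value among all its circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> r) | (code << (bits - r))) & mask for r in range(bits))

# Hypothetical 3x3 patch: center pixel 4 and its 8 neighbors in circular order.
patch_neighbors = [5, 9, 1, 7, 3, 8, 2, 6]
code = lbp_code(4, patch_neighbors)
canonical = rotation_invariant(code)

# Rotating the patch by one neighbor position changes the raw code
# but leaves the canonical (rotation-invariant) code unchanged.
rotated_neighbors = patch_neighbors[1:] + patch_neighbors[:1]
canonical_rotated = rotation_invariant(lbp_code(4, rotated_neighbors))
```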

  17. Inductive electronegativity scale. Iterative calculation of inductive partial charges.

    PubMed

    Cherkasov, Artem

    2003-01-01

    A number of novel QSAR descriptors have been introduced on the basis of the previously elaborated models for steric and inductive effects. The developed "inductive" parameters include absolute and effective electronegativity, atomic partial charges, and local and global chemical hardness and softness. Being based on traditional inductive and steric substituent constants these 3D descriptors provide a valuable insight into intramolecular steric and electronic interactions and can find broad application in structure-activity studies. Possible interpretation of physical meaning of the inductive descriptors has been suggested by considering a neutral molecule as an electrical capacitor formed by charged atomic spheres. This approximation relates inductive chemical softness and hardness of bound atom(s) with the total area of the facings of electrical capacitor formed by the atom(s) and the rest of the molecule. The derived full electronegativity equalization scheme allows iterative calculation of inductive partial charges on the basis of atomic electronegativities, covalent radii, and intramolecular distances. A range of inductive descriptors has been computed for a variety of organic compounds. The calculated inductive charges in the studied molecules have been validated by experimental C-1s Electron Core Binding Energies and molecular dipole moments. Several semiempirical chemical rules, such as equalized electronegativity's arithmetic mean, principle of maximum hardness, and principle of hardness borrowing could be explicitly illustrated in the framework of the developed approach.

  18. Automatic orientation and 3D modelling from markerless rock art imagery

    NASA Astrophysics Data System (ADS)

    Lerma, J. L.; Navarro, S.; Cabrelles, M.; Seguí, A. E.; Hernández, D.

    2013-02-01

    This paper investigates the use of two detectors and descriptors on image pyramids for automatic image orientation and generation of 3D models. The detectors and descriptors replace manual measurements and are used to detect, extract and match features across multiple images. The Scale-Invariant Feature Transform (SIFT) and the Speeded Up Robust Features (SURF) will be assessed based on speed, number of features, matched features, and precision in image and object space depending on the adopted hierarchical matching scheme. The influence of additionally applying Area Based Matching (ABM) with normalised cross-correlation (NCC) and least squares matching (LSM) is also investigated. The pipeline makes use of photogrammetric and computer vision algorithms, aiming at minimum interaction and maximum accuracy from a calibrated camera. Both the exterior orientation parameters and the 3D coordinates in object space are sequentially estimated by combining relative orientation, single space resection and bundle adjustment. The fully automatic image-based pipeline presented herein to automate the image orientation step of a sequence of terrestrial markerless imagery is compared with manual bundle block adjustment and terrestrial laser scanning (TLS), which serves as ground truth. The benefits of applying ABM after feature-based matching (FBM) will be assessed both in image and object space for the 3D modelling of a complex rock art shelter.
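
    As an illustration of the area-based matching step named in this abstract, the following is a minimal sketch of normalised cross-correlation between a template and every window of a search image. It assumes small grey-value patches stored as nested lists and an exhaustive search; the function names are hypothetical and this is not the pipeline used by the authors.

```python
import math


def ncc(patch_a, patch_b):
    """Normalised cross-correlation of two equally sized grey-value patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        return 0.0  # a flat patch has no defined correlation; 0 is a convention
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def best_match(template, image):
    """Slide the template over the image; return (row, col, score) of the best NCC."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so -2 is below any real score
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(template, window)
            if score > best[2]:
                best = (r, c, score)
    return best
```

    Because the mean is subtracted and the result is divided by both patch norms, the score is unchanged by linear brightness and contrast changes between the two images, which is why NCC is a common choice for refining feature matches.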

  19. Multistressor predictive models of invertebrate condition in the Corn Belt, USA

    USGS Publications Warehouse

    Waite, Ian R.; Van Metre, Peter C.

    2017-01-01

    Understanding the complex relations between multiple environmental stressors and ecological conditions in streams can help guide resource-management decisions. During 14 weeks in spring/summer 2013, personnel from the US Geological Survey and the US Environmental Protection Agency sampled 98 wadeable streams across the Midwest Corn Belt region of the USA for water and sediment quality, physical and habitat characteristics, and ecological communities. We used these data to develop independent predictive disturbance models for 3 macroinvertebrate metrics and a multimetric index. We developed the models based on boosted regression trees (BRT) for 3 stressor categories, land use/land cover (geographic information system [GIS]), all in-stream stressors combined (nutrients, habitat, and contaminants), and for GIS plus in-stream stressors. The GIS plus in-stream stressor models had the best overall performance with an average cross-validation R2 across all models of 0.41. The models were generally consistent in the explanatory variables selected within each stressor group across the 4 invertebrate metrics modeled. Variables related to riparian condition, substrate size or embeddedness, velocity and channel shape, nutrients (primarily NH3), and contaminants (pyrethroid degradates) were important descriptors of the invertebrate metrics. Models based on all measured in-stream stressors performed comparably to models based on GIS landscape variables, suggesting that the in-stream stressor characterization reasonably represents the dominant factors affecting invertebrate communities and that GIS variables are acting as surrogates for in-stream stressors that directly affect in-stream biota.

  20. The Use of Descriptors with Exemplar and Model Answers to Improve Quality of Students' Narrative Writing in English French and Arabic

    ERIC Educational Resources Information Center

    Somba, Anne W.; Obura, Ger; Njuguna, Margaret; Itevete, Boniface; Mulwa, Jones; Wandera, Nooh

    2015-01-01

    The importance of writing skills in enhancing student performance in language exams and even other subject areas is widely acknowledged. At Jaffery secondary, the approach to the teaching of writing has generally been to use three approaches: product-based approach with focus on what the students composed; process-based approach that is focused on…

  1. [The use of complex interval models for predicting activity of non-nucleoside reverse transcriptase activity].

    PubMed

    Burliaeva, E V; Tarkhov, A E; Burliaev, V V; Iurkevich, A M; Shvets, V I

    2002-01-01

    The search for new anti-HIV agents remains crucial. In general, researchers look for inhibitors of certain vital HIV enzymes, especially reverse transcriptase (RT) inhibitors. The modern generation of anti-HIV agents is represented by non-nucleoside reverse transcriptase inhibitors (NNRTIs). They are much less toxic than nucleoside analogues and more chemically stable, and are therefore metabolized and eliminated from the human body more slowly. Thus, the search for new NNRTIs is a pressing task today. Synthesis and study of new anti-HIV drugs is very expensive, so employing activity prediction techniques in such a search is very beneficial. These techniques allow the activities of newly proposed structures to be predicted. They are based on a property model built by investigating a series of known compounds with measured activity. This paper presents an approach to activity prediction based on "structure-activity" models designed to form a hypothesis about the probable activity interval estimate. The hypothesis is based on structure descriptor domains calculated for all energetically allowed conformers of each compound in the studied set. Tetrahydroimidazobenzodiazepinone (TIBO) derivatives and phenylethylthiazolylthiourea (PETT) derivatives illustrated the predictive power of this method. The results are consistent with experimental data and allow the inhibitory activity of compounds that were not included in the training set to be predicted.

  2. Assessment of tautomer distribution using the condensed reaction graph approach

    NASA Astrophysics Data System (ADS)

    Gimadiev, T. R.; Madzhidov, T. I.; Nugmanov, R. I.; Baskin, I. I.; Antipin, I. S.; Varnek, A.

    2018-03-01

    We report the first direct QSPR modeling of equilibrium constants of tautomeric transformations (logK_T) in different solvents and at different temperatures, which does not require intermediate assessment of acidity (basicity) constants for all tautomeric forms. The key step of the modeling consisted in merging two tautomers into a single molecular graph ("condensed reaction graph"), which makes it possible to compute molecular descriptors characterizing the entire equilibrium. The support vector regression method was used to build the models. The training set consisted of 785 transformations belonging to 11 types of tautomeric reactions with equilibrium constants measured in different solvents and at different temperatures. The models obtained perform well both in cross-validation (Q² = 0.81, RMSE = 0.7 logK_T units) and on two external test sets. Benchmarking studies demonstrate that our models outperform results obtained with DFT B3LYP/6-311++G(d,p) and ChemAxon Tautomerizer, which are applicable only in water at room temperature.

  3. Pairwise registration of TLS point clouds using covariance descriptors and a non-cooperative game

    NASA Astrophysics Data System (ADS)

    Zai, Dawei; Li, Jonathan; Guo, Yulan; Cheng, Ming; Huang, Pengdi; Cao, Xiaofei; Wang, Cheng

    2017-12-01

    It is challenging to automatically register TLS point clouds with noise, outliers and varying overlap. In this paper, we propose a new method for pairwise registration of TLS point clouds. We first generate covariance matrix descriptors with an adaptive neighborhood size from point clouds to find candidate correspondences. We then construct a non-cooperative game to isolate mutually compatible correspondences, which are considered as true positives. The method was tested on three models acquired by two different TLS systems. Experimental results demonstrate that our proposed adaptive covariance (ACOV) descriptor is invariant to rigid transformation and robust to noise and varying resolutions. The average registration errors achieved on the three models are 0.46 cm, 0.32 cm and 1.73 cm, respectively. The computation times on these models are about 288 s, 184 s and 903 s, respectively. In addition, our registration framework using ACOV descriptors and a game theoretic method is superior to the state-of-the-art methods in terms of both registration error and computational time. The experiment on a large outdoor scene further demonstrates the feasibility and effectiveness of our proposed pairwise registration framework.

  4. QSAR modeling of acute toxicity on mammals caused by aromatic compounds: the case study using oral LD50 for rats.

    PubMed

    Rasulev, Bakhtiyor; Kusić, Hrvoje; Leszczynska, Danuta; Leszczynski, Jerzy; Koprivanac, Natalija

    2010-05-01

    The goal of the study was to predict toxicity in vivo caused by aromatic compounds structured with a single benzene ring and the presence or absence of different substituent groups such as hydroxyl-, nitro-, amino-, methyl-, methoxy-, etc., by using QSAR/QSPR tools. A Genetic Algorithm and multiple regression analysis were applied to select the descriptors and to generate the correlation models. The most predictive model is shown to be the 3-variable model which also has a good ratio of the number of descriptors and their predictive ability to avoid overfitting. The main contributions to the toxicity were shown to be the polarizability weighted MATS2p and the number of certain groups C-026 descriptors. The GA-MLRA approach showed good results in this study, which allows the building of a simple, interpretable and transparent model that can be used for future studies of predicting toxicity of organic compounds to mammals.

  5. Graph Theoretical Representation of Atomic Asymmetry and Molecular Chirality of Benzenoids in Two-Dimensional Space

    PubMed Central

    Zhao, Tanfeng; Zhang, Qingyou; Long, Hailin; Xu, Lu

    2014-01-01

    In order to explore atomic asymmetry and molecular chirality in 2D space, benzenoids composed of 3 to 11 hexagons in 2D space were enumerated in our laboratory. These benzenoids are regarded as planar connected polyhexes and have no internal holes; that is, their internal regions are filled with hexagons. The produced dataset was composed of 357,968 benzenoids, including more than 14 million atoms. Rather than simply labeling the huge number of atoms as being either symmetric or asymmetric, this investigation aims at exploring a quantitative graph theoretical descriptor of atomic asymmetry. Based on the particular characteristics in the 2D plane, we suggested the weighted atomic sum as the descriptor of atomic asymmetry. This descriptor is measured by circulating around the molecule going in opposite directions. The investigation demonstrates that the weighted atomic sums are superior to the previously reported quantitative descriptor, atomic sums. The investigation of quantitative descriptors also reveals that the most asymmetric atom is in a structure with a spiral ring with the convex shape going in clockwise direction and concave shape going in anticlockwise direction from the atom. Based on weighted atomic sums, a weighted F index is introduced to quantitatively represent molecular chirality in the plane, rather than merely regarding benzenoids as being either chiral or achiral. By validating with enumerated benzenoids, the results indicate that the weighted F indexes were in accordance with their chiral classification (achiral or chiral) over the whole benzenoids dataset. Furthermore, weighted F indexes were superior to previously available descriptors. Benzenoids possess a variety of shapes and can be extended to practically represent any shape in 2D space—our proposed descriptor has thus the potential to be a general method to represent 2D molecular chirality based on the difference between clockwise and anticlockwise sums around a molecule. 
    PMID:25032832

  6. Automatic summarization of soccer highlights using audio-visual descriptors.

    PubMed

    Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc

    2015-01-01

    The automatic generation of summaries of sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that yield quite limited results due to the complexity of the problem and the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatically generating highlight summaries of soccer videos using audio-visual descriptors is presented. The approach is based on segmenting the video sequence into shots that are further analyzed to determine their relevance and interest. Of special interest in the approach is the use of audio information, which provides additional robustness to the overall performance of the summarization system. For every video shot a set of low- and mid-level audio-visual descriptors is computed and later suitably combined in order to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting the shots with the highest interest according to the specifications of the user and the results of the relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.

  7. Plant Identification Based on Leaf Midrib Cross-Section Images Using Fractal Descriptors.

    PubMed

    da Silva, Núbia Rosa; Florindo, João Batista; Gómez, María Cecilia; Rossatto, Davi Rodrigo; Kolb, Rosana Marta; Bruno, Odemir Martinez

    2015-01-01

    The correct identification of plants is a common necessity not only for researchers but also for the lay public. Recently, computational methods have been employed to facilitate this task; however, there are few studies given the wide diversity of plants occurring in the world. This study proposes to analyse images obtained from cross-sections of the leaf midrib using fractal descriptors. These descriptors are obtained from the fractal dimension of the object computed at a range of scales. In this way, they provide rich information regarding the spatial distribution of the analysed structure and, as a consequence, they measure the multiscale morphology of the object of interest. In Biology, such morphology is of great importance because it is related to evolutionary aspects and is successfully employed to characterize and discriminate among different biological structures. Here, the fractal descriptors are used to identify the species of plants based on the image of their leaves. A large number of samples are examined, comprising 606 leaf samples of 50 species from Brazilian flora. The results are compared to other imaging methods in the literature and demonstrate that fractal descriptors are precise and reliable in the taxonomic process of plant species identification.

  8. The Relevance Aura of Bibliographic Records.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.

    1997-01-01

    Analyzes relevance assessments of topical descriptors for bibliographic records for two dimensions: (1) a vertical conceptual hierarchy of broad to narrow descriptors, and (2) a horizontal linkage of related terms. The data were analyzed for a semantic distance and semantic direction effect as postulated by the Semantic Distance Model. (Author/LRW)

  9. 2D-QSAR study of fullerene nanostructure derivatives as potent HIV-1 protease inhibitors

    NASA Astrophysics Data System (ADS)

    Barzegar, Abolfazl; Jafari Mousavi, Somaye; Hamidi, Hossein; Sadeghi, Mehdi

    2017-09-01

    The protease of human immunodeficiency virus 1 (HIV-PR) is an essential enzyme for antiviral treatments. Fullerene-derivative carbon nanostructures have nanoscale dimensions, with a diameter comparable to that of the HIV-PR active site, which would in turn inhibit HIV. In this research, two-dimensional quantitative structure-activity relationships (2D-QSAR) of fullerene derivatives against HIV-PR activity were employed as a powerful tool for elucidating the relationships between structure and experimental observations. A QSAR study of 49 fullerene derivatives was performed by employing stepwise-MLR, GAPLS-MLR, and PCA-MLR models for variable (descriptor) selection and model construction. QSAR models with a higher ability to predict the activity of the fullerene derivatives against HIV-PR were obtained, with correlation coefficients (R²training) of 0.942, 0.89, and 0.87 as well as R²test values of 0.791, 0.67 and 0.674 for the stepwise-MLR, GAPLS-MLR, and PCA-MLR models, respectively. The leave-one-out cross-validated correlation coefficient (R²CV) and Y-randomization methods confirmed the robustness of the models. The descriptors indicated that HIV-PR inhibition depends on the van der Waals volumes, polarizability, bond order between two atoms and electronegativities of the fullerene derivatives. The 2D-QSAR simulation, which does not need the geometry of the receptor's active site, resulted in useful descriptors mainly denoting "C60 backbone-functional groups" and "C60 functional groups" properties. Both properties in fullerene refer to the ligand fitness and improved van der Waals interactions with the HIV-PR active site. Therefore, the QSAR models can be used in the search for novel HIV-PR inhibitors based on fullerene derivatives.
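
    The leave-one-out cross-validated coefficient mentioned in this abstract can be sketched as follows. This is a simplified illustration, assuming a single descriptor and ordinary least squares rather than the multi-descriptor MLR models of the study, and assuming the common convention Q² = 1 − PRESS / SS_tot with SS_tot taken about the mean of all observed activities. The function names and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS_tot."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities, for illustration only.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
activity = [1.0, 3.0, 5.0, 7.0, 20.0]
q2 = loo_q2(descriptor, activity)
```

    Unlike the training R², each prediction here comes from a model that never saw the compound being predicted, so a large gap between the two values is a typical sign of overfitting.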

  10. Use of in Vitro HTS-Derived Concentration-Response Data as ...

    EPA Pesticide Factsheets

    Background: Quantitative high-throughput screening (qHTS) assays are increasingly being employed to inform chemical hazard identification. Hundreds of chemicals have been tested in dozens of cell lines across extensive concentration ranges by the National Toxicology Program in collaboration with the NIH Chemical Genomics Center. Objectives: To test a hypothesis that dose-response data points of the qHTS assays can serve as biological descriptors of assayed chemicals and, when combined with conventional chemical descriptors, may improve the accuracy of Quantitative Structure-Activity Relationship (QSAR) models applied to prediction of in vivo toxicity endpoints. Methods and Results: The cell viability qHTS concentration-response data for 1,408 substances assayed in 13 cell lines were obtained from PubChem; for a subset of these compounds rodent acute toxicity LD50 data were also available. The classification k Nearest Neighbor and Random Forest QSAR methods were employed for modeling LD50 data using either chemical descriptors alone (conventional models) or in combination with biological descriptors derived from the concentration-response qHTS data (hybrid models). Critical to our approach was the use of a novel noise-filtering algorithm to treat qHTS data. We show that both the external classification accuracy and coverage (i.e., fraction of compounds in the external set that fall within the applicability domain) of the hybrid QSAR models was superior to convent

  11. Texture classification using non-Euclidean Minkowski dilation

    NASA Astrophysics Data System (ADS)

    Florindo, Joao B.; Bruno, Odemir M.

    2018-03-01

    This study presents a new method to extract meaningful descriptors of gray-scale texture images using Minkowski morphological dilation based on the Lp metric. The proposed approach is motivated by the success previously achieved by Bouligand-Minkowski fractal descriptors on texture classification. In essence, such descriptors are directly derived from the morphological dilation of a three-dimensional representation of the gray-level pixels using the classical Euclidean metric. In this way, we generalize the dilation for different values of p in the Lp metric (Euclidean is a particular case when p = 2) and obtain the descriptors from the cumulated distribution of the distance transform computed over the texture image. The proposed method is compared to other state-of-the-art approaches (such as local binary patterns and textons for example) in the classification of two benchmark data sets (UIUC and Outex). The proposed descriptors outperformed all the other approaches in terms of rate of images correctly classified. The interesting results suggest the potential of these descriptors in this type of task, with a wide range of possible applications to real-world problems.
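
    The effect of changing p in the Lp metric on a morphological dilation can be shown with a brute-force sketch. It assumes a small integer grid and a point set in three dimensions (for a texture, the third coordinate would be the grey level) and simply counts the grid cells lying within a given radius of any point. The function names are hypothetical, and this is not the distance-transform implementation used in the study.

```python
from itertools import product


def lp_distance(u, v, p):
    """Minkowski (Lp) distance between two points; p = 2 is the Euclidean case."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def dilated_volume(points, radius, p, grid):
    """Count grid cells within `radius` (in the Lp metric) of any input point."""
    return sum(
        1
        for cell in grid
        if any(lp_distance(cell, point, p) <= radius for point in points)
    )


# A 5 x 5 x 5 grid centred on a single point at the origin.
grid = list(product(range(-2, 3), repeat=3))
volume_l1 = dilated_volume([(0, 0, 0)], 2, 1, grid)  # octahedron-shaped ball
volume_l2 = dilated_volume([(0, 0, 0)], 2, 2, grid)  # Euclidean ball
```

    With these inputs the L1 ball covers 25 cells and the L2 ball 33, so the growth of the dilated volume with radius depends on p, and that dependence is the quantity from which descriptors of this kind are derived.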

  12. An Effective 3D Shape Descriptor for Object Recognition with RGB-D Sensors

    PubMed Central

    Liu, Zhong; Zhao, Changchen; Wu, Xingming; Chen, Weihai

    2017-01-01

    RGB-D sensors have been widely used in various areas of computer vision and graphics. A good descriptor will effectively improve the performance of operation. This article further analyzes the recognition performance of shape features extracted from multi-modality source data using RGB-D sensors. A hybrid shape descriptor is proposed as a representation of objects for recognition. We first extracted five 2D shape features from contour-based images and five 3D shape features over point cloud data to capture the global and local shape characteristics of an object. The recognition performance was tested for category recognition and instance recognition. Experimental results show that the proposed shape descriptor outperforms several common global-to-global shape descriptors and is comparable to some partial-to-global shape descriptors that achieved the best accuracies in category and instance recognition. Contribution of partial features and computational complexity were also analyzed. The results indicate that the proposed shape features are strong cues for object recognition and can be combined with other features to boost accuracy. PMID:28245553

  13. An object-based approach to weather analysis and its applications

    NASA Astrophysics Data System (ADS)

    Troemel, Silke; Diederich, Malte; Horvath, Akos; Simmer, Clemens; Kumjian, Matthew

    2013-04-01

    The research group 'Object-based Analysis and SEamless prediction' (OASE) within the Hans Ertel Centre for Weather Research programme (HErZ) pursues an object-based approach to weather analysis. The object-based tracking approach adopts the Lagrange perspective by identifying and following the development of convective events over the course of their lifetime. Prerequisites of the object-based analysis are a high-resolved observational data base and a tracking algorithm. A near real-time radar and satellite remote sensing-driven 3D observation-microphysics composite covering Germany, currently under development, contains gridded observations and estimated microphysical quantities. A 3D scale-space tracking identifies convective rain events in the dual-composite and monitors the development over the course of their lifetime. The OASE-group exploits the object-based approach in several fields of application: (1) For a better understanding and analysis of precipitation processes responsible for extreme weather events, (2) in nowcasting, (3) as a novel approach for validation of meso-γ atmospheric models, and (4) in data assimilation. Results from the different fields of application will be presented. The basic idea of the object-based approach is to identify a small set of radar- and satellite derived descriptors which characterize the temporal development of precipitation systems which constitute the objects. So-called proxies of the precipitation process are e.g. the temporal change of the brightband, vertically extensive columns of enhanced differential reflectivity ZDR or the cloud top temperature and heights identified in the 4D field of ground-based radar reflectivities and satellite retrievals generated by a cell during its life time. They quantify (micro-) physical differences among rain events and relate to the precipitation yield. 
Analyses on the informative content of ZDR columns as precursor for storm evolution for example will be presented to demonstrate the use of such system-oriented predictors for nowcasting. Columns of differential reflectivity ZDR measured by polarimetric weather radars are prominent signatures associated with thunderstorm updrafts. Since greater vertical velocities can loft larger drops and water-coated ice particles to higher altitudes above the environmental freezing level, the integrated ZDR column above the freezing level increases with increasing updraft intensity. Validation of atmospheric models concerning precipitation representation or prediction is usually confined to comparisons of precipitation fields or their temporal and spatial statistics. A comparison of the rain rates alone, however, does not immediately explain discrepancies between models and observations, because similar rain rates might be produced by different processes. Within the event-based approach for validation of models both observed and modeled rain events are analyzed by means of proxies of the precipitation process. Both sets of descriptors represent the basis for model validation since different leading descriptors - in a statistical sense- hint at process formulations potentially responsible for model failures.

  14. The effect of cigarillo packaging elements on young adult perceptions of product flavor, taste, smell, and appeal.

    PubMed

    Meernik, Clare; Ranney, Leah M; Lazard, Allison J; Kim, KyungSu; Queen, Tara L; Avishai, Aya; Boynton, Marcella H; Sheeran, Paschal J; Goldstein, Adam O

    2018-01-01

    Product packaging has long been used by the tobacco industry to target consumers and manipulate product perceptions. This study examines the extent to which cigarillo packaging influences perceptions of product flavor, taste, smell, and appeal. A web-based experiment was conducted among young adults. Participants viewed three randomly selected cigarillo packs, varying on pack flavor descriptor, color, type, branding, and warning, totaling 180 pack images. Mixed-effects models were used to estimate the effect of pack elements on product perceptions. A total of 2,664 current, ever, and never little cigar and cigarillo users participated. Cigarillo packs with a flavor descriptor were perceived as having a more favorable taste (β = 0.21, p < .001) and smell (β = 0.14, p < .001) compared to packs with no flavor descriptor. Compared to packs with no color, pink and purple packs were more likely to be perceived as containing a flavor (β = 0.11, p < .001), and were rated more favorably on taste (β = 0.17, p < .001), smell (β = 0.15, p < .001), and appeal (β = 0.16, p < .001). While warnings on packs decreased favorable perceptions of product taste (pictorial: β = -0.07, p = .03) and smell (text-only: β = -0.08, p = .01; pictorial: β = -0.09, p = .007), warnings did not moderate the effects of flavor descriptor or color. To our knowledge, this study provides the first quantitative evidence that cigarillo packaging alters consumers' cognitive responses, and warnings on packs do not suffice to overcome the effects of product packaging. The findings support efforts at federal, state, and local levels to prohibit flavor descriptors and their associated product flavoring in non-cigarette products such as cigarillos, along with new data that supports restrictions on flavor cues and colors.

  15. Arabic sign language recognition based on HOG descriptor

    NASA Astrophysics Data System (ADS)

    Ben Jmaa, Ahmed; Mahdi, Walid; Ben Jemaa, Yousra; Ben Hamadou, Abdelmajid

    2017-02-01

    We present in this paper a new approach for Arabic sign language (ArSL) alphabet recognition using hand gesture analysis. This analysis consists of extracting histogram of oriented gradients (HOG) features from a hand image and then using them to train SVM models, which are used to recognize the ArSL alphabet in real time from hand gestures captured with a Microsoft Kinect camera. Our approach involves three steps: (i) hand detection and localization using a Microsoft Kinect camera, (ii) hand segmentation and (iii) feature extraction using Arabic alphabet recognition. On each input image, first obtained by using a depth sensor, we apply our method based on hand anatomy to segment the hand and eliminate all erroneous pixels. This approach is invariant to scale, rotation and translation of the hand. Experimental results show the effectiveness of our new approach: the proposed ArSL system is able to recognize the ArSL alphabet with an accuracy of 90.12%.

  16. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    PubMed

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed using the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
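
    The leverage approach to the applicability domain can be sketched as below. The leverage of a compound with descriptor vector x is h = xᵀ(XᵀX)⁻¹x, where X is the training descriptor matrix. The sketch assumes an added intercept column and the frequently used warning threshold h* = 3p/n (p model parameters, n training compounds); the abstract does not state which threshold the authors used, and the function names and data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverages(train, query):
    """Leverage h = x^T (X^T X)^-1 x of each query row against the training set."""
    design = [[1.0] + list(row) for row in train]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    result = []
    for row in query:
        x = [1.0] + list(row)
        z = solve(xtx, x)  # z = (X^T X)^-1 x, without forming the inverse
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


def warning_leverage(train):
    """Assumed threshold h* = 3p/n, with p counting the intercept."""
    return 3.0 * (len(train[0]) + 1) / len(train)


# Hypothetical one-descriptor training set and one far-away query compound.
train = [[0.0], [1.0], [2.0], [3.0]]
outside = leverages(train, [[10.0]])[0] > warning_leverage(train)
```

    A query compound whose leverage exceeds the threshold lies far from the centre of the training descriptors, so a prediction for it is an extrapolation and is usually flagged as unreliable.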

  17. Detection and classification of retinal lesions for grading of diabetic retinopathy.

    PubMed

    Usman Akram, M; Khalid, Shehzad; Tariq, Anam; Khan, Shoab A; Azam, Farooque

    2014-02-01

    Diabetic Retinopathy (DR) is an eye abnormality in which the human retina is affected due to an increasing amount of insulin in blood. The early detection and diagnosis of DR is vital to save the vision of diabetes patients. The early signs of DR which appear on the surface of the retina are microaneurysms, haemorrhages, and exudates. In this paper, we propose a system consisting of a novel hybrid classifier for the detection of retinal lesions. The proposed system consists of preprocessing, extraction of candidate lesions, feature set formulation, and classification. In preprocessing, the system eliminates background pixels and extracts the blood vessels and optic disc from the digital retinal image. The candidate lesion detection phase extracts, using filter banks, all regions which may possibly have any type of lesion. A feature set based on different descriptors, such as shape, intensity, and statistics, is formulated for each possible candidate region: this further helps in classifying that region. This paper presents an extension of the m-Mediods based modeling approach, and combines it with a Gaussian Mixture Model in an ensemble to form a hybrid classifier to improve the accuracy of the classification. The proposed system is assessed using standard fundus image databases with the help of performance parameters, such as, sensitivity, specificity, accuracy, and the Receiver Operating Characteristics curves for statistical analysis. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Finding Chemical Structures Corresponding to a Set of Coordinates in Chemical Descriptor Space.

    PubMed

    Miyao, Tomoyuki; Funatsu, Kimito

    2017-08-01

    When chemical structures are searched based on descriptor values, or descriptors are interpreted based on values, it is important that corresponding chemical structures actually exist. In order to consider the existence of chemical structures located in a specific region in the chemical space, we propose to search them inside training data domains (TDDs), which are dense areas of a training dataset in the chemical space. We investigated TDDs' features using diverse and local datasets, assuming that GDB11 is the chemical universe. These two analyses showed that considering TDDs gives higher chance of finding chemical structures than a random search-based method, and that novel chemical structures actually exist inside TDDs. In addition to those findings, we tested the hypothesis that chemical structures were distributed on the limited areas of chemical space. This hypothesis was confirmed by the fact that distances among chemical structures in several descriptor spaces were much shorter than those among randomly generated coordinates in the training data range. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Text Extraction from Scene Images by Character Appearance and Structure Modeling

    PubMed Central

    Yi, Chucai; Tian, Yingli

    2012-01-01

    In this paper, we propose a novel algorithm to detect text information from natural scene images. Scene text classification and detection are still open research topics. Our proposed algorithm is able to model both character appearance and structure to generate representative and discriminative text descriptors. The contributions of this paper include three aspects: 1) a new character appearance model by a structure correlation algorithm which extracts discriminative appearance features from detected interest points of character samples; 2) a new text descriptor based on structons and correlatons, which model character structure by structure differences among character samples and structure component co-occurrence; and 3) a new text region localization method by combining color decomposition, character contour refinement, and string line alignment to localize character candidates and refine detected text regions. We perform three groups of experiments to evaluate the effectiveness of our proposed algorithm, including text classification, text detection, and character identification. The evaluation results on benchmark datasets demonstrate that our algorithm achieves the state-of-the-art performance on scene text classification and detection, and significantly outperforms the existing algorithms for character identification. PMID:23316111

  20. Improved protein-protein interactions prediction via weighted sparse representation model combining continuous wavelet descriptor and PseAA composition.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying

    2016-12-23

    Protein-protein interactions (PPIs) are essential to most biological processes. Since bioscience has entered the era of the genome and proteome, there is a growing demand for knowledge about PPI networks. High-throughput biological technologies can be used to identify new PPIs, but they are expensive, time-consuming, and tedious. Therefore, computational methods for predicting PPIs have an important role. In recent years, an increasing number of computational methods, such as protein structure-based approaches, have been proposed for predicting PPIs. The major limitation of these methods is that they require prior information about the protein to infer PPIs. Therefore, it is of much significance to develop computational methods which use only the information in the protein amino acid sequence. Here, we report a highly efficient approach for predicting PPIs. The main improvements come from the use of a novel protein sequence representation combining a continuous wavelet descriptor and Chou's pseudo amino acid composition (PseAAC), and from adopting a weighted sparse representation based classifier (WSRC). This method, cross-validated on the PPI datasets of Saccharomyces cerevisiae, Human and H. pylori, achieves excellent results with accuracies as high as 92.50%, 95.54% and 84.28% respectively, significantly better than previously proposed methods. Extensive experiments are performed to compare the proposed method with the state-of-the-art Support Vector Machine (SVM) classifier. The outstanding results yielded by our model indicate that the proposed feature extraction method, which combines two kinds of descriptors, has strong expressive ability and is expected to provide comprehensive and effective information for machine learning-based classification models. In addition, the prediction performance in the comparison experiments shows that the combined feature and WSRC work well together. Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.

  1. Genetic programming based quantitative structure-retention relationships for the prediction of Kovats retention indices.

    PubMed

    Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S

    2015-11-13

    The development of quantitative structure-retention relationships (QSRR) aims at constructing an appropriate linear/nonlinear model for the prediction of the retention behavior (such as Kovats retention index) of a solute on a chromatographic column. Commonly, multi-linear regression and artificial neural networks are used in the QSRR development in the gas chromatography (GC). In this study, an artificial intelligence based data-driven modeling formalism, namely genetic programming (GP), has been introduced for the development of quantitative structure based models predicting Kovats retention indices (KRI). The novelty of the GP formalism is that given an example dataset, it searches and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, it is not necessary to pre-specify the form of the data-fitting model in the GP-based modeling. These models are also less complex, simple to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study-I) and adamantane derivatives (case study-II). In each case study, two-, three- and four-descriptor models have been developed using the KRI data available in the literature. The results of these studies clearly indicate that the GP-based models possess an excellent KRI prediction accuracy and generalization capability. Specifically, the best performing four-descriptor models in both the case studies have yielded high (>0.9) values of the coefficient of determination (R(2)) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can be gainfully utilized for developing other types of data-driven models in chromatography science. Copyright © 2015 Elsevier B.V. All rights reserved.
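
    Illustrative sketch (not part of the record): the three fit statistics named in the abstract above, R(2), RMSE and MAPE, can be computed as follows. The retention-index values are hypothetical, and R(2) is taken here as 1 - SS_res/SS_tot, which is an assumption; the record does not state which definition the authors used.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percent error (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical Kovats retention indices, invented for this example.
kri_observed = [500.0, 600.0, 700.0, 800.0]
kri_predicted = [510.0, 590.0, 705.0, 795.0]
print(r_squared(kri_observed, kri_predicted))
print(rmse(kri_observed, kri_predicted))
print(mape(kri_observed, kri_predicted))
```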

  2. QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

    PubMed

    Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander

    2006-11-30

    Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data was partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute error ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.

  3. Prediction of drug transport processes using simple parameters and PLS statistics. The use of ACD/logP and ACD/ChemSketch descriptors.

    PubMed

    Osterberg, T; Norinder, U

    2001-01-01

    A method of modelling and predicting biopharmaceutical properties using simple theoretically computed molecular descriptors and multivariate statistics has been investigated for several data sets related to solubility, IAM chromatography, permeability across Caco-2 cell monolayers, human intestinal perfusion, brain-blood partitioning, and P-glycoprotein ATPase activity. The molecular descriptors (e.g. molar refractivity, molar volume, index of refraction, surface tension and density) and logP were computed with ACD/ChemSketch and ACD/logP, respectively. Good statistical models were derived that permit simple computational prediction of biopharmaceutical properties. All final models derived had R(2) values ranging from 0.73 to 0.95 and Q(2) values ranging from 0.69 to 0.86. The RMSEP values for the external test sets ranged from 0.24 to 0.85 (log scale).

  4. From basic physics to mechanisms of toxicity: the "liquid drop" approach applied to develop predictive classification models for toxicity of metal oxide nanoparticles.

    PubMed

    Sizochenko, Natalia; Rasulev, Bakhtiyor; Gajewicz, Agnieszka; Kuz'min, Victor; Puzyn, Tomasz; Leszczynski, Jerzy

    2014-11-21

    Many metal oxide nanoparticles are able to cause persistent stress to live organisms, including humans, when discharged to the environment. To understand the mechanism of metal oxide nanoparticles' toxicity and reduce the number of experiments, the development of predictive toxicity models is important. In this study, performed on a series of nanoparticles, the comparative quantitative-structure activity relationship (nano-QSAR) analyses of their toxicity towards E. coli and HaCaT cells were established. A new approach for representation of nanoparticles' structure is presented. For description of the supramolecular structure of nanoparticles the "liquid drop" model was applied. It is expected that a novel, proposed approach could be of general use for predictions related to nanomaterials. In addition, in our study fragmental simplex descriptors and several ligand-metal binding characteristics were calculated. The developed nano-QSAR models were validated and reliably predict the toxicity of all studied metal oxide nanoparticles. Based on the comparative analysis of contributed properties in both models the LDM-based descriptors were revealed to have an almost similar level of contribution to toxicity in both cases, while other parameters (van der Waals interactions, electronegativity and metal-ligand binding characteristics) have unequal contribution levels. In addition, the models developed here suggest different mechanisms of nanotoxicity for these two types of cells.

  5. Method of data communications with reduced latency

    DOEpatents

    Blocksome, Michael A; Parker, Jeffrey J

    2013-11-05

    Data communications with reduced latency, including: writing, by a producer, a descriptor and message data into at least two descriptor slots of a descriptor buffer, the descriptor buffer comprising allocated computer memory segmented into descriptor slots, each descriptor slot having a fixed size, the descriptor buffer having a header pointer that identifies a next descriptor slot to be processed by a DMA controller, the descriptor buffer having a tail pointer that identifies a descriptor slot for entry of a next descriptor in the descriptor buffer; recording, by the producer, in the descriptor a value signifying that message data has been written into descriptor slots; and setting, by the producer, in dependence upon the recorded value, a tail pointer to point to a next open descriptor slot.
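
    Illustrative sketch (not part of the record): a toy model of the descriptor buffer described above, with fixed-size slots, a head pointer for the consumer and a tail pointer for the producer. The slot size, class names and the stand-in consumer are all assumptions introduced for illustration; this is not the patented implementation and does not model a real DMA controller.

```python
from dataclasses import dataclass

SLOT_SIZE = 32  # hypothetical fixed slot size in bytes


@dataclass
class Descriptor:
    length: int         # message length in bytes
    payload_slots: int  # recorded value: how many following slots hold message data


class DescriptorBuffer:
    """Ring of fixed-size slots; a message occupies one descriptor slot plus data slots."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.head = 0  # next slot the consumer (stand-in for the DMA controller) processes
        self.tail = 0  # next open slot for the producer
        self.used = 0  # number of occupied slots

    def produce(self, message):
        """Write descriptor and message data into adjacent slots, then advance the tail."""
        if not message:
            raise ValueError("message must be non-empty")
        chunks = [message[i:i + SLOT_SIZE] for i in range(0, len(message), SLOT_SIZE)]
        needed = 1 + len(chunks)  # always at least two slots
        if needed > len(self.slots) - self.used:
            return False  # not enough free slots
        n = len(self.slots)
        for k, chunk in enumerate(chunks, start=1):
            self.slots[(self.tail + k) % n] = chunk
        # Record in the descriptor that message data sits in the following slots.
        self.slots[self.tail] = Descriptor(len(message), len(chunks))
        # The tail moves past the descriptor and its data slots in a single step.
        self.tail = (self.tail + needed) % n
        self.used += needed
        return True

    def consume(self):
        """Read the descriptor at the head together with its in-buffer message data."""
        if self.used == 0:
            return None
        n = len(self.slots)
        desc = self.slots[self.head]
        data = b"".join(self.slots[(self.head + k) % n]
                        for k in range(1, desc.payload_slots + 1))
        taken = 1 + desc.payload_slots
        self.head = (self.head + taken) % n
        self.used -= taken
        return data


buf = DescriptorBuffer(8)
buf.produce(b"x" * 40)  # one descriptor slot plus two data slots
print(buf.tail, buf.consume() == b"x" * 40, buf.head)
```

    Keeping the message data in the slots next to its descriptor lets the consumer find both in one pass over the buffer, which is one plausible reading of how the scheme reduces latency.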

  6. Multi-sensor image registration based on algebraic projective invariants.

    PubMed

    Li, Bin; Wang, Wei; Ye, Hao

    2013-04-22

    A new automatic feature-based registration algorithm is presented for multi-sensor images with projective deformation. Contours are firstly extracted from both reference and sensed images as basic features in the proposed method. Since it is difficult to design a projective-invariant descriptor from the contour information directly, a new feature named Five Sequential Corners (FSC) is constructed based on the corners detected from the extracted contours. By introducing algebraic projective invariants, we design a descriptor for each FSC that is ensured to be robust against projective deformation. Further, no gray scale related information is required in calculating the descriptor, thus it is also robust against the gray scale discrepancy between the multi-sensor image pairs. Experimental results utilizing real image pairs are presented to show the merits of the proposed registration method.
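
    Illustrative sketch (not part of the record): the abstract does not give the exact descriptor built on each Five Sequential Corners (FSC) feature. The code below uses the two classical algebraic projective invariants of five coplanar points (ratios of products of 3x3 determinants of homogeneous coordinates), which is an assumption about the kind of invariant involved. The point coordinates and the homography are hypothetical.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are homogeneous points a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def five_point_invariants(points):
    """Two projective invariants of five coplanar points (no three collinear)."""
    h = [(x, y, 1.0) for x, y in points]

    def d(i, j, k):
        return det3(h[i - 1], h[j - 1], h[k - 1])

    # Each point index appears equally often in numerator and denominator,
    # so per-point scale factors and det(H) cancel under a homography.
    i1 = d(4, 3, 1) * d(5, 2, 1) / (d(4, 2, 1) * d(5, 3, 1))
    i2 = d(4, 2, 1) * d(5, 3, 2) / (d(4, 3, 2) * d(5, 2, 1))
    return i1, i2


def apply_homography(hmat, point):
    """Map a 2D point through a 3x3 projective transformation."""
    x, y = point
    u = hmat[0][0] * x + hmat[0][1] * y + hmat[0][2]
    v = hmat[1][0] * x + hmat[1][1] * y + hmat[1][2]
    w = hmat[2][0] * x + hmat[2][1] * y + hmat[2][2]
    return (u / w, v / w)


# Hypothetical corner coordinates and homography, invented for this example.
corners = [(0.0, 0.0), (2.0, 0.0), (3.0, 2.0), (1.0, 3.0), (-1.0, 1.0)]
H = [[1.0, 0.2, 3.0], [0.1, 0.9, -2.0], [0.001, 0.002, 1.0]]
warped = [apply_homography(H, p) for p in corners]
print(five_point_invariants(corners))
print(five_point_invariants(warped))  # agrees with the line above up to rounding
```

    Because the invariants depend only on corner positions, no gray-level information enters the computation, which matches the robustness to intensity differences claimed in the abstract.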

  7. Clinical descriptors for the recognition of central sensitization pain in patients with knee osteoarthritis.

    PubMed

    Lluch, Enrique; Nijs, Jo; Courtney, Carol A; Rebbeck, Trudy; Wylde, Vikki; Baert, Isabel; Wideman, Timothy H; Howells, Nick; Skou, Søren T

    2017-08-02

    Despite growing awareness of the contribution of central pain mechanisms to knee osteoarthritis pain in a subgroup of patients, routine evaluation of central sensitization is yet to be incorporated into clinical practice. The objective of this perspective is to design a set of clinical descriptors for the recognition of central sensitization in patients with knee osteoarthritis that can be implemented in clinical practice. A narrative review of original research papers was conducted by nine clinicians and researchers from seven different countries to reach agreement on clinically relevant descriptors. It is proposed that identification of a dominance of central sensitization pain is based on descriptors derived from the subjective assessment and the physical examination. In the former, clinicians are recommended to inquire about intensity and duration of pain and its association with structural joint changes, pain distribution, behavior of knee pain, presence of neuropathic-like or centrally mediated symptoms and responsiveness to previous treatment. The latter includes assessment of response to clinical tests, mechanical hyperalgesia and allodynia, thermal hyperalgesia, hypoesthesia and reduced vibration sense. This article describes a set of clinically relevant descriptors that might indicate the presence of central sensitization in patients with knee osteoarthritis in clinical practice. Although based on research data, the descriptors proposed in this review require experimental testing in future studies. Implications for Rehabilitation: Laboratory evaluation of central sensitization for people with knee osteoarthritis is yet to be incorporated into clinical practice. A set of clinical indicators for the recognition of central sensitization in patients with knee osteoarthritis is proposed. Although based on research data, the clinical indicators proposed require further experimental testing of psychometric properties.

  8. Cheminformatics-aided pharmacovigilance: application to Stevens-Johnson Syndrome

    PubMed Central

    Low, Yen S; Caster, Ola; Bergvall, Tomas; Fourches, Denis; Zang, Xiaoling; Norén, G Niklas; Rusyn, Ivan; Edwards, Ralph

    2016-01-01

    Objective: Quantitative Structure-Activity Relationship (QSAR) models can predict adverse drug reactions (ADRs), and thus provide early warnings of potential hazards. Timely identification of potential safety concerns could protect patients and aid early diagnosis of ADRs among the exposed. Our objective was to determine whether global spontaneous reporting patterns might allow chemical substructures associated with Stevens-Johnson Syndrome (SJS) to be identified and utilized for ADR prediction by QSAR models. Materials and Methods: Using a reference set of 364 drugs having positive or negative reporting correlations with SJS in the VigiBase global repository of individual case safety reports (Uppsala Monitoring Center, Uppsala, Sweden), chemical descriptors were computed from drug molecular structures. Random Forest and Support Vector Machines methods were used to develop QSAR models, which were validated by external 5-fold cross validation. Models were employed for virtual screening of DrugBank to predict SJS actives and inactives, which were corroborated using knowledge bases like VigiBase, ChemoText, and MicroMedex (Truven Health Analytics Inc, Ann Arbor, Michigan). Results: We developed QSAR models that could accurately predict if drugs were associated with SJS (area under the curve of 75%–81%). Our 10 most active and inactive predictions were substantiated by SJS reports (or lack thereof) in the literature. Discussion: Interpretation of QSAR models in terms of significant chemical descriptors suggested novel SJS structural alerts. Conclusions: We have demonstrated that QSAR models can accurately identify SJS active and inactive drugs. Requiring chemical structures only, QSAR models provide effective computational means to flag potentially harmful drugs for subsequent targeted surveillance and pharmacoepidemiologic investigations. PMID:26499102

  9. The effects of variant descriptors on the potential effectiveness of plain packaging.

    PubMed

    Borland, Ron; Savvas, Steven

    2014-01-01

    To examine the effects that variant descriptor labels on cigarette packs have on smokers' perceptions of those packs and the cigarettes contained within. As part of two larger web-based studies (each involved 160 young adult ever-smokers 18-29 years old), respondents were shown a computer image of a plain cigarette pack and sets of related variant descriptors. The sets included terms that varied in terms of descriptors of colours as names, flavour strength, degrees of filter venting, filter types, quality, type of cigarette and numbers. For each set, respondents rated the highest and lowest of two or three of the following four characteristics: quality, strongest or weakest in taste, delivers most or least tar/nicotine, and most or least level of harm. There were significant differences on all four ratings. Quality ratings were the least differentiated. Except for colour descriptors, where 'Gold' rated high in quality but medium in other ratings, ratings of quality, harm, strength and delivery were all positively associated when rated on the same descriptors. Descriptor labels on cigarette packs can affect smokers' perceptions of the characteristics of the cigarettes contained within. Therefore, they are a potential means by which product differentiation can occur. In particular, having variants that differ in perceived strength while not differing in deliveries of harmful ingredients is problematic. Any packaging policy should take into account the possibility that variant descriptors can mislead smokers into making inappropriate product attributions.

  10. Quantitative structure-retention relationship models for the prediction of the reversed-phase HPLC gradient retention based on the heuristic method and support vector machine.

    PubMed

    Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide

    2009-01-01

    The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models from a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many structurally diverse analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models was better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.

  11. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    PubMed

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptors selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cells libraries highlighting interesting dependencies of PV properties on MO compositions.
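
    Illustrative sketch (not part of the record): the core RANSAC loop applied to a one-descriptor linear model, showing only the outlier-removal step mentioned above. It is not the authors' workflow, which also covers descriptor selection and applicability domain. The data, tolerance and iteration count are hypothetical.

```python
import random


def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring points far from the consensus line."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample for a line
        if x1 == x2:
            continue  # vertical pair cannot define y = f(x)
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    # Refit by ordinary least squares on the consensus set only.
    n = len(best_inliers)
    mean_x = sum(x for x, _ in best_inliers) / n
    mean_y = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in best_inliers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in best_inliers)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, best_inliers


# Hypothetical data: ten samples on y = 2x + 1 plus three gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
data += [(2.0, 30.0), (5.0, -20.0), (8.0, 40.0)]
slope, intercept, inliers = ransac_line(data)
print(slope, intercept, len(inliers))
```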

  12. Quantitative structure-activity relationship (QSAR) for insecticides: development of predictive in vivo insecticide activity models.

    PubMed

    Naik, P K; Singh, T; Singh, H

    2009-07-01

    Quantitative structure-activity relationship (QSAR) analyses were performed independently on data sets belonging to two groups of insecticides, namely the organophosphates and carbamates. Several types of descriptors including topological, spatial, thermodynamic, information content, lead likeness and E-state indices were used to derive quantitative relationships between insecticide activities and structural properties of chemicals. A systematic search approach based on missing value, zero value, simple correlation and multi-collinearity tests as well as the use of a genetic algorithm allowed the optimal selection of the descriptors used to generate the models. The QSAR models developed for both organophosphate and carbamate groups revealed good predictability with r(2) values of 0.949 and 0.838 as well as [image omitted] values of 0.890 and 0.765, respectively. In addition, a linear correlation was observed between the predicted and experimental LD(50) values for the test set data with r(2) of 0.871 and 0.788 for both the organophosphate and carbamate groups, indicating that the prediction accuracy of the QSAR models was acceptable. The models were also tested successfully from external validation criteria. QSAR models developed in this study should help further design of novel potent insecticides.

  13. Motion control of planar parallel robot using the fuzzy descriptor system approach.

    PubMed

    Vermeiren, Laurent; Dequidt, Antoine; Afroun, Mohamed; Guerra, Thierry-Marie

    2012-09-01

    This work presents the control of a two-degree of freedom parallel robot manipulator. A quasi-LPV approach, through the so-called TS fuzzy model and LMI constraints problems is used. Moreover, in this context a way to derive interesting control laws is to keep the descriptor form of the mechanical system. Therefore, new LMI problems have to be defined that help to reduce the conservatism of the usual results. Some relaxations are also proposed to leave the pure quadratic stability/stabilization framework. A comparison study between the classical control strategies from robotics and the control design using TS fuzzy descriptor models is carried out to show the interest of the proposed approach. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  14. A preliminary evaluation of the relationship of cannabinoid blood concentrations with the analgesic response to vaporized cannabis

    PubMed Central

    Wilsey, Barth L; Deutsch, Reena; Samara, Emil; Marcotte, Thomas D; Barnes, Allan J; Huestis, Marilyn A; Le, Danny

    2016-01-01

    A randomized, placebo-controlled crossover trial utilizing vaporized cannabis containing placebo and 6.7% and 2.9% delta-9-tetrahydrocannabinol (THC) was performed in 42 subjects with central neuropathic pain related to spinal cord injury and disease. Subjects received two administrations of the study medication in a 4-hour interval. Blood samples for pharmacokinetic evaluation were collected, and pain assessment tests were performed immediately after the second administration and 3 hours later. Pharmacokinetic data, although limited, were consistent with literature reports, namely dose-dependent increase in systemic exposure followed by rapid disappearance of THC. Dose-dependent improvement in pain score was evident across all pain scale elements. Using mixed model regression, an evaluation of the relationship between plasma concentrations of selected cannabinoids and percent change in items from the Neuropathic Pain Scale was conducted. Changes in the concentration of THC and its nonpsychotropic metabolite, 11-nor-9-carboxy-THC, were related to percent change from baseline of several descriptors (eg, itching, burning, and deep pain). However, given the large number of multiple comparisons, false-discovery-rate-adjusted P-values were not significant. Plans for future work are outlined to explore the relationship of plasma concentrations with the analgesic response to different cannabinoids. Such an appraisal of descriptors might contribute to the identification of distinct pathophysiologic mechanisms and, ultimately, the development of mechanism-based treatment approaches for neuropathic pain, a condition that remains difficult to treat. PMID:27621666
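
    Illustrative sketch (not part of the record): the abstract reports false-discovery-rate-adjusted P-values but does not name the procedure. The code below shows the widely used Benjamini-Hochberg adjustment as an assumed example. The raw P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four pain descriptors, invented for this example.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

    With these made-up inputs, three raw values fall below 0.05 but only one adjusted value does, which is the kind of loss of significance under multiple comparisons that the abstract describes.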

  15. A preliminary evaluation of the relationship of cannabinoid blood concentrations with the analgesic response to vaporized cannabis.

    PubMed

    Wilsey, Barth L; Deutsch, Reena; Samara, Emil; Marcotte, Thomas D; Barnes, Allan J; Huestis, Marilyn A; Le, Danny

    2016-01-01

    A randomized, placebo-controlled crossover trial utilizing vaporized cannabis containing placebo and 6.7% and 2.9% delta-9-tetrahydrocannabinol (THC) was performed in 42 subjects with central neuropathic pain related to spinal cord injury and disease. Subjects received two administrations of the study medication in a 4-hour interval. Blood samples for pharmacokinetic evaluation were collected, and pain assessment tests were performed immediately after the second administration and 3 hours later. Pharmacokinetic data, although limited, were consistent with literature reports, namely dose-dependent increase in systemic exposure followed by rapid disappearance of THC. Dose-dependent improvement in pain score was evident across all pain scale elements. Using mixed model regression, an evaluation of the relationship between plasma concentrations of selected cannabinoids and percent change in items from the Neuropathic Pain Scale was conducted. Changes in the concentration of THC and its nonpsychotropic metabolite, 11-nor-9-carboxy-THC, were related to percent change from baseline of several descriptors (eg, itching, burning, and deep pain). However, given the large number of multiple comparisons, false-discovery-rate-adjusted P-values were not significant. Plans for future work are outlined to explore the relationship of plasma concentrations with the analgesic response to different cannabinoids. Such an appraisal of descriptors might contribute to the identification of distinct pathophysiologic mechanisms and, ultimately, the development of mechanism-based treatment approaches for neuropathic pain, a condition that remains difficult to treat.

  16. Multispectral imaging system based on light-emitting diodes for the detection of melanomas and basal cell carcinomas: a pilot study

    NASA Astrophysics Data System (ADS)

    Delpueyo, Xana; Vilaseca, Meritxell; Royo, Santiago; Ares, Miguel; Rey-Barroso, Laura; Sanabria, Ferran; Puig, Susana; Pellacani, Giovanni; Noguero, Fernando; Solomita, Giuseppe; Bosch, Thierry

    2017-06-01

    This article proposes a multispectral system that uses the analysis of the spatial distribution of color and spectral features to improve the detection of skin cancer lesions, specifically melanomas and basal cell carcinomas. The system consists of a digital camera and light-emitting diodes of eight different wavelengths (414 to 995 nm). The parameters based on spectral features of the lesions such as reflectance and color, as well as others empirically computed using reflectance values, were calculated pixel-by-pixel from the images obtained. Statistical descriptors were calculated for every segmented lesion [mean (x̄), standard deviation (σ), minimum, and maximum]; descriptors based on the first-order statistics of the histogram [entropy (Ep), energy (En), and third central moment (μ3)] were also obtained. The study analyzed 429 pigmented and nonpigmented lesions: 290 nevi and 139 malignant (95 melanomas and 44 basal cell carcinomas), which were split into training and validation sets. Fifteen parameters were found to provide the best sensitivity (87.2% melanomas and 100% basal cell carcinomas) and specificity (54.5%). The results suggest that the extraction of textural information can contribute to the diagnosis of melanomas and basal cell carcinomas as a supporting tool to dermoscopy and confocal microscopy.
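
    Illustrative sketch (not part of the record): the statistical and first-order histogram descriptors listed above, computed for the pixels of one segmented lesion. Quantized integer pixel levels, the population form of the standard deviation and base-2 entropy are assumptions; the record does not specify binning or normalization. The pixel values are hypothetical.

```python
import math
from collections import Counter


def lesion_descriptors(pixels):
    """Mean, std, min, max plus histogram entropy, energy and third central moment."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / n)  # population form
    # Normalized histogram: probability of each quantized level.
    probs = {level: count / n for level, count in Counter(pixels).items()}
    entropy = -sum(p * math.log2(p) for p in probs.values())
    energy = sum(p * p for p in probs.values())
    mu3 = sum(((level - mean) ** 3) * p for level, p in probs.items())
    return {
        "mean": mean, "std": std, "min": min(pixels), "max": max(pixels),
        "entropy": entropy, "energy": energy, "mu3": mu3,
    }


# Hypothetical quantized parameter values inside one lesion mask.
lesion_pixels = [0, 0, 1, 1, 2, 2, 3, 3]
print(lesion_descriptors(lesion_pixels))
```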

  17. Elaborate ligand-based modeling reveal new submicromolar Rho kinase inhibitors

    NASA Astrophysics Data System (ADS)

    Shahin, Rand; AlQtaishat, Saja; Taha, Mutasem O.

    2012-02-01

    Rho Kinase (ROCKII) has been recently implicated in several cardiovascular diseases prompting several attempts to discover and optimize new ROCKII inhibitors. Towards this end we explored the pharmacophoric space of 138 ROCKII inhibitors to identify high quality pharmacophores. The pharmacophoric models were subsequently allowed to compete within quantitative structure-activity relationship (QSAR) context. Genetic algorithm and multiple linear regression analysis were employed to select an optimal combination of pharmacophoric models and 2D physicochemical descriptors capable of accessing self-consistent QSAR of optimal predictive potential ($r_{77} = 0.84$, $F = 18.18$, $r^2_{\mathrm{LOO}} = 0.639$, $r^2_{\mathrm{PRESS}}$ against 19 external test inhibitors $= 0.494$). Two orthogonal pharmacophores emerged in the QSAR equation suggesting the existence of at least two binding modes accessible to ligands within ROCKII binding pocket. Receiver operating characteristic (ROC) curve analyses established the validity of QSAR-selected pharmacophores. Moreover, the successful pharmacophore models were found to be comparable with crystallographically resolved ROCKII binding pocket. We employed the pharmacophoric models and associated QSAR equation to screen the National Cancer Institute (NCI) list of compounds. Eight submicromolar ROCKII inhibitors were identified. The most potent gave IC50 values of 0.7 and 1.0 μM.

  18. Discrete RNA libraries from pseudo-torsional space

    PubMed Central

    Humphris-Narayanan, Elisabeth

    2012-01-01

    The discovery that RNA molecules can fold into complex structures and carry out diverse cellular roles has led to interest in developing tools for modeling RNA tertiary structure. While significant progress has been made in establishing that the RNA backbone is rotameric, few libraries of discrete conformations specifically for use in RNA modeling have been validated. Here, we present six libraries of discrete RNA conformations based on a simplified pseudo-torsional notation of the RNA backbone, comparable to phi and psi in the protein backbone. We evaluate the ability of each library to represent single nucleotide backbone conformations and we show how individual library fragments can be assembled into dinucleotides that are consistent with established RNA backbone descriptors spanning from sugar to sugar. We then use each library to build all-atom models of 20 test folds and we show how the composition of a fragment library can limit model quality. Despite the limitations inherent in using discretized libraries, we find that several hundred discrete fragments can rebuild RNA folds up to 174 nucleotides in length with atomic-level accuracy (<1.5Å RMSD). We anticipate the libraries presented here could easily be incorporated into RNA structural modeling, analysis, or refinement tools. PMID:22425640
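
    A pseudo-torsion is an ordinary dihedral angle computed over four backbone atoms, so the notation can be sketched with a generic dihedral routine. The atom choices below (eta over C4'(i-1), P(i), C4'(i), P(i+1) and theta over P(i), C4'(i), P(i+1), C4'(i+1)) follow the commonly used eta/theta convention and are an assumption about this paper; the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    # atan2 form: numerically stable over the full (-180, 180] range.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


def pseudo_torsions(c4_prev, p_i, c4_i, p_next, c4_next):
    """Return (eta, theta) for nucleotide i from 3D atom coordinates."""
    eta = dihedral(c4_prev, p_i, c4_i, p_next)
    theta = dihedral(p_i, c4_i, p_next, c4_next)
    return eta, theta
```

    Binning the resulting (eta, theta) pairs is one way a discrete conformational library could be indexed, since two angles per nucleotide stand in for the full set of backbone torsions.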

  19. Structure-Activity Relationships for Rates of Aromatic Amine Oxidation by Manganese Dioxide.

    PubMed

    Salter-Blanc, Alexandra J; Bylaska, Eric J; Lyon, Molly A; Ness, Stuart C; Tratnyek, Paul G

    2016-05-17

    New energetic compounds are designed to minimize their potential environmental impacts, which includes their transformation and the fate and effects of their transformation products. The nitro groups of energetic compounds are readily reduced to amines, and the resulting aromatic amines are subject to oxidation and coupling reactions. Manganese dioxide (MnO2) is a common environmental oxidant and model system for kinetic studies of aromatic amine oxidation. In this study, a training set of new and previously reported kinetic data for the oxidation of model and energetic-derived aromatic amines was assembled and subjected to correlation analysis against descriptor variables that ranged from general purpose [Hammett σ constants (σ(-)), pKas of the amines, and energies of the highest occupied molecular orbital (EHOMO)] to specific for the likely rate-limiting step [one-electron oxidation potentials (Eox)]. The selection of calculated descriptors (pKa, EHOMO, and Eox) was based on validation with experimental data. All of the correlations gave satisfactory quantitative structure-activity relationships (QSARs), but they improved with the specificity of the descriptor. The scope of correlation analysis was extended beyond MnO2 to include literature data on aromatic amine oxidation by other environmentally relevant oxidants (ozone, chlorine dioxide, and phosphate and carbonate radicals) by correlating relative rate constants (normalized to 4-chloroaniline) to EHOMO (calculated with a modest level of theory).

  20. QSAR modeling of cumulative environmental end-points for the prioritization of hazardous chemicals.

    PubMed

    Gramatica, Paola; Papa, Ester; Sangion, Alessandro

    2018-01-24

    The hazard of chemicals in the environment is inherently related to the molecular structure and derives simultaneously from various chemical properties/activities/reactivities. Models based on Quantitative Structure Activity Relationships (QSARs) are useful to screen, rank and prioritize chemicals that may have an adverse impact on humans and the environment. This paper reviews a selection of QSAR models (based on theoretical molecular descriptors) developed for cumulative multivariate endpoints, which were derived by mathematical combination of multiple effects and properties. The cumulative end-points provide an integrated holistic point of view to address environmentally relevant properties of chemicals.

  1. Quad-phased data mining modeling for dementia diagnosis.

    PubMed

    Bang, Sunjoo; Son, Sangjoon; Roh, Hyunwoong; Lee, Jihye; Bae, Sungyun; Lee, Kyungwon; Hong, Changhyung; Shin, Hyunjung

    2017-05-18

    The number of people with dementia is increasing along with the worldwide trend of population ageing. Consequently, there has been considerable research on improving the dementia diagnosis process in the field of computer-aided diagnosis (CAD) technology. The most significant issue is that the evaluation process by physicians, which is based on patients' medical information and questionnaires from their guardians, is time consuming, subjective and prone to error. This problem can be addressed by an overall data mining modeling approach that supports the intuitive decisions of clinicians. Therefore, in this paper we propose a quad-phased data mining modeling consisting of 4 modules. In the Proposer Module, significant diagnostic criteria that are effective for diagnostics are selected. Then, in the Predictor Module, a model is constructed to predict and diagnose dementia based on a machine learning algorithm. To help clinical physicians better understand the results of the predictive model, in the Descriptor Module we interpret causes of diagnostics by profiling patient groups. Lastly, in the Visualization Module, we provide visualization to effectively explore characteristics of patient groups. The proposed model is applied to the CREDOS study, which contains clinical data collected from 37 university-affiliated hospitals in the Republic of Korea from 2005 to 2013. This research is an intelligent system enabling intuitive collaboration between the CAD system and physicians. In addition, the improved evaluation process can effectively reduce the time and cost for clinicians and patients.

  2. Validating Performance Level Descriptors (PLDs) for the AP® Environmental Science Exam

    ERIC Educational Resources Information Center

    Reshetar, Rosemary; Kaliski, Pamela; Chajewski, Michael; Lionberger, Karen

    2012-01-01

    This presentation summarizes a pilot study conducted after the May 2011 administration of the AP Environmental Science Exam. The study used analytical methods based on scaled anchoring as input to a Performance Level Descriptor validation process that solicited systematic input from subject matter experts.

  3. Exploring simple, transparent, interpretable and predictive QSAR models for classification and quantitative prediction of rat toxicity of ionic liquids using OECD recommended guidelines.

    PubMed

    Das, Rudra Narayan; Roy, Kunal; Popelier, Paul L A

    2015-11-01

    The present study explores the chemical attributes of diverse ionic liquids responsible for their cytotoxicity in a rat leukemia cell line (IPC-81) by developing predictive classification as well as regression-based mathematical models. Simple and interpretable descriptors derived from a two-dimensional representation of the chemical structures along with quantum topological molecular similarity indices have been used for model development, employing unambiguous modeling strategies that strictly obey the guidelines of the Organization for Economic Co-operation and Development (OECD) for quantitative structure-activity relationship (QSAR) analysis. The structure-toxicity relationships that emerged from both classification and regression-based models were in accordance with the findings of some previous studies. The models suggested that the cytotoxicity of ionic liquids is dependent on the cationic surfactant action, long alkyl side chains, cationic lipophilicity as well as aromaticity, the presence of a dialkylamino substituent at the 4-position of the pyridinium nucleus and a bulky anionic moiety. The models have been transparently presented in the form of equations, thus allowing their easy transferability in accordance with the OECD guidelines. The models have also been subjected to rigorous validation tests proving their predictive potential and can hence be used for designing novel and "greener" ionic liquids. The major strength of the present study lies in the use of a diverse and large dataset, use of simple reproducible descriptors and compliance with the OECD norms. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Semi-automated detection of anterior cruciate ligament injury from MRI.

    PubMed

    Štajduhar, Ivan; Mamula, Mihaela; Miletić, Damir; Ünal, Gözde

    2017-03-01

    A radiologist's work in detecting various injuries or pathologies from radiological scans can be tiresome, time consuming and prone to errors. The field of computer-aided diagnosis aims to reduce these factors by introducing a level of automation in the process. In this paper, we deal with the problem of detecting the presence of anterior cruciate ligament (ACL) injury in a human knee. We examine the possibility of aiding the diagnosis process by building a decision-support model for detecting the presence of milder ACL injuries (not requiring operative treatment) and complete ACL ruptures (requiring operative treatment) from sagittal plane magnetic resonance (MR) volumes of human knees. Histogram of oriented gradient (HOG) descriptors and gist descriptors are extracted from manually selected rectangular regions of interest enveloping the wider cruciate ligament area. Performance of two machine-learning models is explored, coupled with both feature extraction methods: support vector machine (SVM) and random forests model. Model generalisation properties were determined by performing multiple iterations of stratified 10-fold cross validation whilst observing the area under the curve (AUC) score. Sagittal plane knee joint MR data was retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia, from 2007 until 2014. Type of ACL injury was established in a double-blind fashion by comparing the retrospectively set diagnosis against the prospective opinion of another radiologist. After clean-up, the resulting dataset consisted of 917 usable labelled exam sequences of left or right knees. Experimental results suggest that a linear-kernel SVM learned from HOG descriptors has the best generalisation properties among the experimental models compared, having an area under the curve of 0.894 for the injury-detection problem and 0.943 for the complete-rupture-detection problem. Although the problem of performing semi-automated ACL-injury diagnosis by observing knee-joint MR volumes alone is a difficult one, experimental results suggest potential clinical application of computer-aided decision making, both for detecting milder injuries and detecting complete ruptures. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
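
    The evaluation protocol described in this abstract (stratified k-fold cross validation scored by AUC) can be sketched in plain Python. This is an illustrative sketch only: the fold assignment is deterministic (no shuffling or repeated iterations), the AUC is computed by pairwise comparison, and the function names are hypothetical rather than taken from the study.

```python
def stratified_folds(labels, k):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Deal each class out round-robin so every fold gets its share.
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive).

    Equals the probability that a random positive is scored above a
    random negative, counting ties as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    In a full run, each fold would serve once as the held-out set while a classifier is trained on the remaining folds, and the per-fold AUC values would be averaged.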

  5. Evaluation of vegetation post-fire resilience in the Alpine region using descriptors derived from MODIS spectral index time series

    NASA Astrophysics Data System (ADS)

    Di Mauro, Biagio; Fava, Francesco; Busetto, Lorenzo; Crosta, Giovanni Franco; Colombo, Roberto

    2013-04-01

    In this study a method based on the analysis of MODerate-resolution Imaging Spectroradiometer (MODIS) time series is proposed to estimate the post-fire resilience of mountain vegetation (broadleaf forest and prairies) in the Italian Alps. Resilience is defined herewith as the ability of a dynamical system to counteract disturbances. It can be quantified by the amount of time the disturbed system takes to resume, in statistical terms, an ecological functionality comparable with its undisturbed behavior. Satellite images of the Normalized Difference Vegetation Index (NDVI) and of the Enhanced Vegetation Index (EVI) with spatial resolution of 250 m and temporal resolution of 16 days in the 2000-2012 time period were used. Wildfire affected areas in the Lombardy region between the years 2000 and 2010 were analysed. Only large fires (affected area >40 ha) were selected. For each burned area, an undisturbed adjacent control site was located. Data pre-processing consisted of smoothing the MODIS time series for noise removal, after which a double logistic function was fitted. Land surface phenology descriptors (proxies for growing season start/end/length and green biomass) were extracted in order to characterize the time evolution of the vegetation. Descriptors from a burned area were compared to those extracted from the respective control site by means of the one-way analysis of variance. According to the number of subsequent years which exhibit statistically meaningful difference between burned and control site, five classes of resilience were identified and a set of thematic maps was created for each descriptor. The same method was applied to all 84 aggregated events and to events aggregated by main land cover. The EVI proved more sensitive to fire impact than the NDVI. Analysis shows that fire causes both a reduction of the biomass and a variation in the phenology of the Alpine vegetation. Results suggest an average ecosystem resilience of 6-7 years. Moreover, broadleaf forest and prairies show different post-fire behavior in terms of land surface phenology descriptors. In addition to the above analysis, another method is proposed, which derives from the qualitative theory of dynamical systems. The (time dependent) spectral index of a burned area over the period of one year was plotted against its counterpart from the control site. Yearly plots (or scattergrams) before and after the fire were obtained. Each plot is a sequence of points on the plane, which are the vertices of a generally self-intersecting polygonal chain. Some geometrical descriptors were obtained from the yearly chains of each fire. Principal Components Analysis (PCA) of geometrical descriptors was applied to a set of case studies and the obtained results provide a system dynamics interpretation of the natural process.
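
    The two numerical ingredients named in this abstract, a vegetation index and a double logistic seasonal curve, can be sketched as below. The NDVI formula is standard; the particular double logistic parameterisation, the parameter names, and the reading of the two midpoints as season start and end are assumptions for illustration and may differ from the form the authors fitted.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def double_logistic(t, base, amplitude, rise_mid, rise_rate, fall_mid, fall_rate):
    """One common double logistic model of a yearly vegetation-index curve.

    t          day of year
    base       winter (background) index value
    amplitude  seasonal increase above the background
    rise_mid   day of steepest green-up (proxy for season start)
    fall_mid   day of steepest senescence (proxy for season end)
    """
    rise = 1.0 / (1.0 + math.exp(-rise_rate * (t - rise_mid)))
    fall = 1.0 / (1.0 + math.exp(-fall_rate * (t - fall_mid)))
    return base + amplitude * (rise - fall)


def season_length(rise_mid, fall_mid):
    """Growing season length in days, taken between the two midpoints."""
    return fall_mid - rise_mid
```

    Fitting these parameters separately to a burned site and to its control site, year by year, would give the kind of paired phenology descriptors that the abstract compares by analysis of variance.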

  6. A Concise Guide to Feature Histograms with Applications to LIDAR-Based Spacecraft Relative Navigation

    NASA Astrophysics Data System (ADS)

    Rhodes, Andrew P.; Christian, John A.; Evans, Thomas

    2017-12-01

    With the availability and popularity of 3D sensors, it is advantageous to re-examine the use of point cloud descriptors for the purpose of pose estimation and spacecraft relative navigation. One popular descriptor is the oriented unique repeatable clustered viewpoint feature histogram (OUR-CVFH), which is most often utilized in personal and industrial robotics to simultaneously recognize and navigate relative to an object. Recent research into using the OUR-CVFH descriptor for spacecraft navigation has produced favorable results. Since OUR-CVFH is the most recent innovation in a large family of feature histogram point cloud descriptors, discussions of parameter settings and insights into its functionality are spread among various publications and online resources. This paper organizes the history of feature histogram point cloud descriptors for a straightforward explanation of their evolution. This article compiles all the requisite information needed to implement OUR-CVFH into one location, as well as providing useful suggestions on how to tune the generation parameters. This work is beneficial for anyone interested in using this histogram descriptor for object recognition or navigation - may it be personal robotics or spacecraft navigation.

  7. ANN expert system screening for illicit amphetamines using molecular descriptors

    NASA Astrophysics Data System (ADS)

    Gosav, S.; Praisler, M.; Dorohoi, D. O.

    2007-05-01

    The goal of this study was to develop an artificial neural network (ANN) based on computed descriptors, which would be able to classify the molecular structures of potential illicit amphetamines and to derive their biological activity according to the similarity of their molecular structure with amphetamines of known toxicity. The system is necessary for testing new molecular structures for epidemiological, clinical, and forensic purposes. It was built using a database formed by 146 compounds representing drugs of abuse (mainly central stimulants, hallucinogens, sympathomimetic amines, narcotics and other potent analgesics), precursors, or derivatized counterparts. Their molecular structures were characterized by computing three types of descriptors: 38 constitutional descriptors (CDs), 69 topological descriptors (TDs) and 160 3D-MoRSE descriptors (3DDs). An ANN system was built for each category of variables. All three networks (CD-NN, TD-NN and 3DD-NN) were trained to distinguish between stimulant amphetamines, hallucinogenic amphetamines, and nonamphetamines. A selection of variables was performed when necessary. The efficiency with which each network identifies the class identity of an unknown sample was evaluated by calculating several figures of merit. The results of the comparative analysis are presented.

  8. Ligand Electron Density Shape Recognition Using 3D Zernike Descriptors

    NASA Astrophysics Data System (ADS)

    Gunasekaran, Prasad; Grandison, Scott; Cowtan, Kevin; Mak, Lora; Lawson, David M.; Morris, Richard J.

    We present a novel approach to crystallographic ligand density interpretation based on Zernike shape descriptors. Electron density for a bound ligand is expanded in an orthogonal polynomial series (3D Zernike polynomials) and the coefficients from this expansion are employed to construct rotation-invariant descriptors. These descriptors can be compared highly efficiently against large databases of descriptors computed from other molecules. In this manuscript we describe this process and show initial results from an electron density interpretation study on a dataset containing over a hundred OMIT maps. We could identify the correct ligand as the first hit in about 30 % of the cases, within the top five in a further 30 % of the cases, and giving rise to an 80 % probability of getting the correct ligand within the top ten matches. In all but a few examples, the top hit was highly similar to the correct ligand in both shape and chemistry. Further extensions and intrinsic limitations of the method are discussed.

  9. Using DFT methodology for more reliable predictive models: Design of inhibitors of Golgi α-Mannosidase II.

    PubMed

    Bobovská, Adela; Tvaroška, Igor; Kóňa, Juraj

    2016-05-01

    Human Golgi α-mannosidase II (GMII), a zinc ion co-factor dependent glycoside hydrolase (E.C.3.2.1.114), is a pharmaceutical target for the design of inhibitors with anti-cancer activity. The discovery of an effective inhibitor is complicated by the fact that all known potent inhibitors of GMII are involved in unwanted co-inhibition with lysosomal α-mannosidase (LMan, E.C.3.2.1.24), a relative of GMII. Routine empirical QSAR models for both GMII and LMan did not achieve the required accuracy. Therefore, we have developed a fast computational protocol to build predictive models combining interaction energy descriptors from an empirical docking scoring function (Glide-Schrödinger), Linear Interaction Energy (LIE) method, and quantum mechanical density functional theory (QM-DFT) calculations. The QSAR models were built and validated with a library of structurally diverse GMII and LMan inhibitors and non-active compounds. A critical role of QM-DFT descriptors for the more accurate prediction abilities of the models is demonstrated. The predictive ability of the models was significantly improved when going from the empirical docking scoring function to mixed empirical-QM-DFT QSAR models (Q² = 0.78-0.86 when cross-validation procedures were carried out; and R² = 0.81-0.83 for a testing set). The average error for the predicted ΔGbind decreased to 0.8-1.1 kcal mol⁻¹. Also, 76-80% of non-active compounds were successfully filtered out from GMII and LMan inhibitors. The QSAR models with the fragmented QM-DFT descriptors may find a useful application in structure-based drug design where pure empirical and force field methods reached their limits and where quantum mechanics effects are critical for ligand-receptor interactions. The optimized models will apply in lead optimization processes for GMII drug developments. Copyright © 2016 Elsevier Inc. All rights reserved.

  10. Machine learning study for the prediction of transdermal peptide

    NASA Astrophysics Data System (ADS)

    Jung, Eunkyoung; Choi, Seung-Hoon; Lee, Nam Kyung; Kang, Sang-Kee; Choi, Yun-Jaie; Shin, Jae-Min; Choi, Kihang; Jung, Dong Hyun

    2011-04-01

    In order to develop a computational method to rapidly evaluate transdermal peptides, we report approaches for predicting the transdermal activity of peptides on the basis of peptide sequence information using Artificial Neural Network (ANN), Partial Least Squares (PLS) and Support Vector Machine (SVM). We identified 269 transdermal peptides by the phage display technique and used them as the positive controls to develop and test machine learning models. Combinations of three descriptors with neural network architectures, the number of latent variables and the kernel functions are tried in training to make appropriate predictions. The capacity of models is evaluated by means of statistical indicators including sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC score). In the ROC score-based comparison, three methods proved capable of providing a reasonable prediction of transdermal peptides. The best result is obtained by the SVM model with a radial basis function and VHSE descriptors. The results indicate that it is possible to discriminate between transdermal peptides and random sequences using our models. We anticipate that our models will be applicable to the prediction of transdermal peptides in large peptide databases, facilitating efficient transdermal drug delivery through intact skin.

  11. Developing descriptors to predict mechanical properties of nanotubes.

    PubMed

    Borders, Tammie L; Fonseca, Alexandre F; Zhang, Hengji; Cho, Kyeongjae; Rusinko, Andrew

    2013-04-22

    Descriptors and quantitative structure property relationships (QSPR) were investigated for mechanical property prediction of carbon nanotubes (CNTs). 78 molecular dynamics (MD) simulations were carried out, and 20 descriptors were calculated to build quantitative structure property relationships (QSPRs) for Young's modulus and Poisson's ratio in two separate analyses: vacancy only and vacancy plus methyl functionalization. In the first analysis, C(N2)/C(T) (number of non-sp2 hybridized carbons per the total carbons) and chiral angle were identified as critical descriptors for both Young's modulus and Poisson's ratio. Further analysis and literature findings indicate the effect of chiral angle is negligible at larger CNT radii for both properties. Raman spectroscopy can be used to measure C(N2)/C(T), providing a direct link between experimental and computational results. Poisson's ratio approaches two different limiting values as CNT radii increase: 0.23-0.25 for chiral and armchair CNTs and 0.10 for zigzag CNTs (surface defects <3%). In the second analysis, the critical descriptors were C(N2)/C(T), chiral angle, and M(N)/C(T) (number of methyl groups per total carbons). These results imply new types of defects can be represented as a new descriptor in QSPR models. Finally, results are qualified and quantified against experimental data.

  12. Genetic variability in Brazilian Capsicum baccatum germplasm collection assessed by morphological fruit traits and AFLP markers

    PubMed Central

    Giacomin, Renata M.; Ruas, Paulo M.; Ruas, Eduardo A.; Barbieri, Rosa L.; Rodrigues, Rosana

    2018-01-01

    Capsicum baccatum is one of the main pepper species grown and consumed in South America. In Brazil, it is commonly cultivated by family farmers, using mostly the bishop's hat (locally cambuci) and red chili pepper (dedo-de-moça) genotypes. This study had the objective of characterizing 116 C. baccatum accessions from different regions of Brazil, based on morphological fruit descriptors and AFLP (Amplified Fragment Length Polymorphisms) markers. Broad phenotypic variability among the C. baccatum accessions was detected when using morphological fruit descriptors. The Ward modified location model (Ward-MLM) discriminated five groups, based mainly on fruit shape. Six combinations of AFLP primers detected polymorphism in 97.93% of the 2466 identified bands, indicating the high genetic variability in the accessions. The UPGMA coincided with the Bayesian clustering analysis and three large groups were formed, separating the wild variety C. baccatum var. praetermissum from the other accessions. There was no relation between genetic distance and geographical origin of the accessions, probably due to the intense exchange of fruits and seeds between farmers. Morphological descriptors used together with AFLP markers proved efficient in detecting the levels of genetic variability among the accessions maintained in the germplasm collections. These results can be used as an additional source of helpful information to be exploited in C. baccatum breeding programs. PMID:29758023

  13. Learning moment-based fast local binary descriptor

    NASA Astrophysics Data System (ADS)

    Bellarbi, Abdelkader; Zenati, Nadia; Otmane, Samir; Belghit, Hayet

    2017-03-01

    Recently, binary descriptors have attracted significant attention due to their speed and low memory consumption; however, using intensity differences to calculate the binary descriptive vector is not efficient enough. We propose an approach to binary description called POLAR_MOBIL, in which we perform binary tests between geometrical and statistical information using moments in the patch instead of the classical intensity binary test. In addition, we introduce a learning technique used to select an optimized set of binary tests with low correlation and high variance. This approach offers high distinctiveness against affine transformations and appearance changes. An extensive evaluation on well-known benchmark datasets reveals the robustness and the effectiveness of the proposed descriptor, as well as its good performance in terms of low computation complexity when compared with state-of-the-art real-time local descriptors.

  14. A dynamic appearance descriptor approach to facial actions temporal modeling.

    PubMed

    Jiang, Bihan; Valstar, Michel; Martinez, Brais; Pantic, Maja

    2014-02-01

    Both the configuration and the dynamics of facial expressions are crucial for the interpretation of human facial behavior. Yet to date, the vast majority of reported efforts in the field either do not take the dynamics of facial expressions into account, or focus only on prototypic facial expressions of six basic emotions. Facial dynamics can be explicitly analyzed by detecting the constituent temporal segments in Facial Action Coding System (FACS) Action Units (AUs): onset, apex, and offset. In this paper, we present a novel approach to explicit analysis of temporal dynamics of facial actions using the dynamic appearance descriptor Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP). Temporal segments are detected by combining a discriminative classifier for detecting the temporal segments on a frame-by-frame basis with Markov Models that enforce temporal consistency over the whole episode. The system is evaluated in detail over the MMI facial expression database, the UNBC-McMaster pain database, the SAL database, and the GEMEP-FERA dataset in database-dependent experiments, and in cross-database experiments using the Cohn-Kanade and SEMAINE databases. The comparison with other state-of-the-art methods shows that the proposed LPQ-TOP method outperforms the other approaches for the problem of AU temporal segment detection, and that overall AU activation detection benefits from dynamic appearance information.

  15. Encompassing receptor flexibility in virtual screening using ensemble docking-based hybrid QSAR: discovery of novel phytochemicals for BACE1 inhibition.

    PubMed

    Chakraborty, Sandipan; Ramachandran, Balaji; Basu, Soumalee

    2014-10-01

    Mimicking receptor flexibility during receptor-ligand binding is a challenging task in computational drug design since it is associated with a large increase in the conformational search space. In the present study, we have devised an in silico design strategy incorporating receptor flexibility in virtual screening to identify potential lead compounds as inhibitors for flexible proteins. We have considered BACE1 (β-secretase), a key target protease from a therapeutic perspective for Alzheimer's disease, as the highly flexible receptor. The protein undergoes significant conformational transitions from open to closed form upon ligand binding, which makes it a difficult target for inhibitor design. We have designed a hybrid structure-activity model containing both ligand-based descriptors and energetic descriptors obtained from molecular docking based on a dataset of structurally diverse BACE1 inhibitors. An ensemble of receptor conformations has been used in the docking study, further improving the prediction ability of the model. The designed model, which shows significant prediction ability as judged by several statistical parameters, has been used to screen an in-house developed 3-D structural library of 731 phytochemicals. 24 highly potent, novel BACE1 inhibitors with predicted activity (Ki) ≤ 50 nM have been identified. Detailed analysis reveals pharmacophoric features of these novel inhibitors required to inhibit BACE1.

  16. New antitrichomonal drug-like chemicals selected by bond (edge)-based TOMOCOMD-CARDD descriptors.

    PubMed

    Meneses-Marcel, Alfredo; Rivera-Borroto, Oscar M; Marrero-Ponce, Yovani; Montero, Alina; Machado Tugores, Yanetsy; Escario, José Antonio; Gómez Barrio, Alicia; Montero Pereira, David; Nogal, Juan José; Kouznetsov, Vladimir V; Ochoa Puentes, Cristian; Bohórquez, Arnold R; Grau, Ricardo; Torrens, Francisco; Ibarra-Velarde, Froylán; Arán, Vicente J

    2008-09-01

    Bond-based quadratic indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis (LDA) were used to discover novel lead trichomonacidals. The obtained LDA-based quantitative structure-activity relationships (QSAR) models, using nonstochastic and stochastic indices, were able to classify correctly 87.91% (87.50%) and 89.01% (84.38%) of the chemicals in training (test) sets, respectively. They showed large Matthews correlation coefficients of 0.75 (0.71) and 0.78 (0.65) for the training (test) sets, correspondingly. Later, both models were applied to the virtual screening of 21 chemicals to find new lead antitrichomonal agents. Predictions agreed with experimental results to a great extent because a correct classification for both models of 95.24% (20 of 21) of the chemicals was obtained. Of the 21 compounds that were screened and synthesized, 2 molecules (chemicals G-1, UC-245) showed high to moderate cytocidal activity at the concentration of 10 μg/ml, another 2 compounds (G-0 and CRIS-148) showed high cytocidal activity only at the concentration of 100 μg/ml, and the remaining chemicals (from CRIS-105 to CRIS-153, except CRIS-148) were inactive at these assayed concentrations. Finally, the best candidate, G-1 (cytocidal activity of 100% at 10 μg/ml) was in vivo assayed in ovariectomized Wistar rats achieving promising results as a trichomonacidal drug-like compound.
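
    The two classification figures of merit quoted in this abstract, percentage of correct classification and the Matthews correlation coefficient (MCC), can both be computed from a confusion matrix. The sketch below uses the standard definitions; the function names are hypothetical, and the abstract does not report the underlying confusion matrices.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of chemicals classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to
    +1 (perfect prediction). Returns 0.0 when a row or column of the
    confusion matrix is empty, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    For instance, 20 correct predictions out of 21 gives an accuracy of 20/21, or 95.24%, whichever way the 20 are split between actives and inactives; the MCC, by contrast, depends on that split, which is why it is the more informative statistic for unbalanced sets.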

  17. Introducing a new methodology for the calculation of local philicity and multiphilic descriptor: an alternative to the finite difference approximation

    NASA Astrophysics Data System (ADS)

    Sánchez-Márquez, Jesús; Zorrilla, David; García, Víctor; Fernández, Manuel

    2018-07-01

    This work presents a new development based on the condensation scheme proposed by Chamorro and Pérez, in which new terms to correct the frozen molecular orbital approximation have been introduced (improved frontier molecular orbital approximation). The changes performed on the original development allow taking into account the orbital relaxation effects, providing equivalent results to those achieved by the finite difference approximation and leading also to a methodology with great advantages. Local reactivity indices based on this new development have been obtained for a sample set of molecules and they have been compared with those indices based on the frontier molecular orbital and finite difference approximations. A new definition based on the improved frontier molecular orbital methodology for the dual descriptor index is also shown. In addition, taking advantage of the characteristics of the definitions obtained with the new condensation scheme, the descriptor local philicity is analysed by separating the components corresponding to the frontier molecular orbital approximation and orbital relaxation effects, analysing also the local parameter multiphilic descriptor in the same way. Finally, the effect of the choice of basis set is studied and calculations using DFT, CI and Møller-Plesset methodologies are performed to analyse the consequence of different electronic-correlation levels.

  18. Computational analysis of structure-based interactions and ligand properties can predict efflux effects on antibiotics.

    PubMed

    Sarkar, Aurijit; Anderson, Kelcey C; Kellogg, Glen E

    2012-06-01

    AcrA-AcrB-TolC efflux pumps extrude drugs of multiple classes from bacterial cells and are a leading cause for antimicrobial resistance. Thus, they are of paramount interest to those engaged in antibiotic discovery. Accurate prediction of antibiotic efflux has been elusive, despite several studies aimed at this purpose. Minimum inhibitory concentration (MIC) ratios of 32 β-lactam antibiotics were collected from literature. 3-Dimensional Quantitative Structure-Activity Relationship on the β-lactam antibiotic structures revealed seemingly predictive models (q(2)=0.53), but the lack of a general superposition rule does not allow its use on antibiotics that lack the β-lactam moiety. Since MIC ratios must depend on interactions of antibiotics with lipid membranes and transport proteins during influx, capture and extrusion of antibiotics from the bacterial cell, descriptors representing these factors were calculated and used in building mathematical models that quantitatively classify antibiotics as having high/low efflux (>93% accuracy). Our models provide preliminary evidence that it is possible to predict the effects of antibiotic efflux if the passage of antibiotics into, and out of, bacterial cells is taken into account--something descriptor and field-based QSAR models cannot do. While the paucity of data in the public domain remains the limiting factor in such studies, these models show significant improvements in predictions over simple LogP-based regression models and should pave the path toward further work in this field. This method should also be extensible to other pharmacologically and biologically relevant transport proteins. Copyright © 2012 Elsevier Masson SAS. All rights reserved.

  19. Ligand- and receptor-based docking with LiBELa

    NASA Astrophysics Data System (ADS)

    dos Santos Muniz, Heloisa; Nascimento, Alessandro S.

    2015-08-01

    Methodologies for molecular docking are constantly improving. The problem consists in finding an optimal balance between computational cost and a satisfactory physical description of the ligand-receptor interaction. In pursuit of an advance in current methods, we developed a mixed docking approach combining ligand- and receptor-based strategies in a docking engine, where three-dimensional descriptors for shape and charge distribution of a reference ligand guide the initial placement of the docking molecule and an interaction energy-based global minimization follows. This hybrid docking was evaluated with soft-core and force field potentials, taking into account ligand pose and scoring. Our approach was found to be competitive with a purely receptor-based dock, resulting in improved logAUC values when evaluated with DUD and DUD-E. Furthermore, the smoothed potential, as evaluated here, was not advantageous when ligand binding poses were compared to experimentally determined conformations. In conclusion, we show that a combination of ligand- and receptor-based docking strategies with a force field energy model results in good reproduction of binding poses and enrichment of active molecules against decoys. This strategy is implemented in our tool, LiBELa, available to the scientific community.

  20. An Integrated Ransac and Graph Based Mismatch Elimination Approach for Wide-Baseline Image Matching

    NASA Astrophysics Data System (ADS)

    Hasheminasab, M.; Ebadi, H.; Sedaghat, A.

    2015-12-01

    In this paper we propose an integrated approach to increase the precision of feature point matching. Many algorithms have been developed to optimize short-baseline image matching, whereas wide-baseline image matching remains difficult to handle because of illumination differences and viewpoint changes. Fortunately, recent developments in the automatic extraction of local invariant features make wide-baseline image matching possible. Matching algorithms based on the local feature similarity principle use feature descriptors to establish correspondence between feature point sets. To date, the most remarkable descriptor is the scale-invariant feature transform (SIFT) descriptor, which is invariant to image rotation and scale and remains robust across a substantial range of affine distortion, presence of noise, and changes in illumination. The epipolar constraint based on the RANSAC (random sample consensus) method is a conventional model for mismatch elimination, particularly in computer vision. Because only the distance from the epipolar line is considered, a few false matches remain in the matching results selected by epipolar geometry and RANSAC. Aguilariu et al. proposed the Graph Transformation Matching (GTM) algorithm to remove outliers, which has some difficulties when the mismatched points are surrounded by the same local neighbor structure. In this study, to overcome these limitations, a new three-step matching scheme is presented. In the first step, the SIFT algorithm is used to obtain initial corresponding point sets. In the second step, the RANSAC algorithm is applied to reduce the outliers. Finally, to remove the remaining mismatches, GTM is implemented based on the adjacent K-NN graph. Four different close-range image datasets with changes in viewpoint are used to evaluate the performance of the proposed method, and the experimental results indicate its robustness and capability.

  1. Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

    PubMed

    Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

    2014-03-25

    A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretations produced link closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

  2. Toxicity evaluation and prediction of toxic chemicals on activated sludge system.

    PubMed

    Cai, Bijing; Xie, Li; Yang, Dianhai; Arcangeli, Jean-Pierre

    2010-05-15

    A lack of data for evaluating the toxicity of new or overloaded organic chemicals on activated sludge systems creates a need for toxicity estimation methodology. In this study, 24 aromatic chemicals typically found in industrial wastewater were selected and classified into three groups: benzenes, phenols and anilines. Their toxicity to activated sludge was then investigated. Two indexes, IC(50-M) and IC(50-S), were determined from the respiration rates of activated sludge at different toxicant concentrations over mid-term (24h) and short-term (30min) time intervals, respectively. Experimental results showed that the group of benzenes was the most toxic, followed by the groups of phenols and anilines. The values of IC(50-M) of the tested chemicals were higher than those of IC(50-S). In addition, quantitative structure-activity relationship (QSAR) models developed from IC(50-M) were more stable and accurate than those developed from IC(50-S). The multiple linear models based on molecular descriptors and K(ow) were more reliable than single linear models based on K(ow). Among these molecular descriptors, E(lumo) was the most important factor for evaluating mid-term toxicity. Copyright (c) 2009 Elsevier B.V. All rights reserved.

  3. Skin image illumination modeling and chromophore identification for melanoma diagnosis

    NASA Astrophysics Data System (ADS)

    Liu, Zhao; Zerubia, Josiane

    2015-05-01

    The presence of illumination variation in dermatological images has a negative impact on the automatic detection and analysis of cutaneous lesions. This paper proposes a new illumination modeling and chromophore identification method to correct lighting variation in skin lesion images, as well as to extract melanin and hemoglobin concentrations of human skin, based on an adaptive bilateral decomposition and a weighted polynomial curve fitting, with the knowledge of a multi-layered skin model. Different from state-of-the-art approaches based on the Lambert law, the proposed method, considering both specular reflection and diffuse reflection of the skin, enables us to address highlight and strong shading effects usually existing in skin color images captured in an uncontrolled environment. The derived melanin and hemoglobin indices, directly relating to the pathological tissue conditions, tend to be less influenced by external imaging factors and are more efficient in describing pigmentation distributions. Experiments show that the proposed method gave better visual results and superior lesion segmentation, when compared to two other illumination correction algorithms, both designed specifically for dermatological images. For computer-aided diagnosis of melanoma, sensitivity achieves 85.52% when using our chromophore descriptors, which is 8~20% higher than those derived from other color descriptors. This demonstrates the benefit of the proposed method for automatic skin disease analysis.

  4. Discrimination Power of Polynomial-Based Descriptors for Graphs by Using Functional Matrices.

    PubMed

    Dehmer, Matthias; Emmert-Streib, Frank; Shi, Yongtang; Stefu, Monica; Tripathi, Shailesh

    2015-01-01

    In this paper, we study the discrimination power of graph measures that are based on graph-theoretical matrices. The paper generalizes the work of [M. Dehmer, M. Moosbrugger, Y. Shi, Encoding structural information uniquely with polynomial-based descriptors by employing the Randić matrix, Applied Mathematics and Computation, 268(2015), 164-168]. We demonstrate that by using the new functional matrix approach, exhaustively generated graphs can be discriminated more uniquely than shown in the mentioned previous work.

  5. Discrimination Power of Polynomial-Based Descriptors for Graphs by Using Functional Matrices

    PubMed Central

    Dehmer, Matthias; Emmert-Streib, Frank; Shi, Yongtang; Stefu, Monica; Tripathi, Shailesh

    2015-01-01

    In this paper, we study the discrimination power of graph measures that are based on graph-theoretical matrices. The paper generalizes the work of [M. Dehmer, M. Moosbrugger, Y. Shi, Encoding structural information uniquely with polynomial-based descriptors by employing the Randić matrix, Applied Mathematics and Computation, 268(2015), 164–168]. We demonstrate that by using the new functional matrix approach, exhaustively generated graphs can be discriminated more uniquely than shown in the mentioned previous work. PMID:26479495

  6. QSAR Modeling and Prediction of Drug-Drug Interactions.

    PubMed

    Zakharov, Alexey V; Varlamova, Ekaterina V; Lagunin, Alexey A; Dmitriev, Alexander V; Muratov, Eugene N; Fourches, Denis; Kuz'min, Victor E; Poroikov, Vladimir V; Tropsha, Alexander; Nicklaus, Marc C

    2016-02-01

    Severe adverse drug reactions (ADRs) are the fourth leading cause of fatality in the U.S. with more than 100,000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug-drug interactions (DDIs), typically mediated by cytochrome P450s, possibilities to predict DDIs from existing knowledge are important. We collected data from public sources on 1485, 2628, 4371, and 27,966 possible DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and 3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these data sets, we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: quantitative neighborhoods of atoms (QNA) and simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) were utilized to build QSAR models predicting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72-79% for the external test sets with a coverage of 81.36-100% when a conservative threshold for the model's applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database.

  7. HCS-Neurons: identifying phenotypic changes in multi-neuron images upon drug treatments of high-content screening.

    PubMed

    Charoenkwan, Phasit; Hwang, Eric; Cutler, Robert W; Lee, Hua-Chin; Ko, Li-Wei; Huang, Hui-Ling; Ho, Shinn-Ying

    2013-01-01

    High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (named multi-neuron image) of a high-content screen. Therefore, it is desirable to develop an automated image analysis method for analyzing multi-neuron images. We propose an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images, called HCS-neurons. To observe multiple phenotypic changes of neurons, we propose two kinds of descriptors which are neuron feature descriptor (NFD) of 13 neuromorphology features, e.g., neurite length, and generic feature descriptors (GFDs), e.g., Haralick texture. HCS-neurons can 1) automatically extract all quantitative phenotype features in both NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatments using ANOVA and regression analysis, and 3) generate an accurate classifier to group neurons treated by different drug concentrations using support vector machine and an intelligent feature selection method. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule depolymerizing drug which has been shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all the 13 features of NFD have statistically significant difference with respect to changes in various levels of nocodazole drug concentrations (NDC) and the phenotypic changes of neurites were consistent to the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area were able to achieve an independent test accuracy of 90.28% for the six-dosage classification problem. 
    This NFD module and neuron image datasets are provided as a freely downloadable MATLAB project at http://iclab.life.nctu.edu.tw/HCS-Neurons. Few automatic methods focus on analyzing multi-neuron images collected from HCS used in drug discovery. We provided an automatic HCS-based method for generating accurate classifiers to classify neurons based on their phenotypic changes upon drug treatments. The proposed HCS-neurons method is helpful in identifying and classifying chemical or biological molecules that alter the morphology of a group of neurons in HCS.

  8. Predicting in vivo effect levels for repeat-dose systemic toxicity using chemical, biological, kinetic and study covariates.

    PubMed

    Truong, Lisa; Ouedraogo, Gladys; Pham, LyLy; Clouzeau, Jacques; Loisel-Joubert, Sophie; Blanchet, Delphine; Noçairi, Hicham; Setzer, Woodrow; Judson, Richard; Grulke, Chris; Mansouri, Kamel; Martin, Matthew

    2018-02-01

    In an effort to address a major challenge in chemical safety assessment, alternative approaches for characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB and COSMOS-DB from numerous study types totaling 4379 in vivo studies for 1247 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complex problem, systemic effect levels were modeled at the study-level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using random forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 15% of the variance, reducing the root mean squared error (RMSE) from 0.96 log10 to 0.85 log10 mg/kg/day, providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates, chemical, biological, and kinetic descriptors explained a total of 43% of the variance with an RMSE of 0.69 log10 mg/kg/day. A benchmark model (upper expectation of model performance) was also developed with an RMSE of 0.5 log10 mg/kg/day by incorporating study-level covariates and the mean effect level per chemical. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect level per chemical were compared, reducing the RMSE from 1.0 to 0.73 log10 mg/kg/day, equivalent to 87% of predictions falling within an order-of-magnitude of the observed value.
    Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for over 30,000 chemicals.
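
    The two performance figures used in this abstract, the RMSE in log10 units and the share of predictions within one order of magnitude of the observed value, can be sketched in Python as follows. The function names and the default tolerance argument are illustrative assumptions; inputs are assumed to be effect levels already expressed as log10 mg/kg/day.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def fraction_within(predicted, observed, tolerance=1.0):
    """Fraction of predictions within `tolerance` log10 units of the observation.

    With values in log10 units, tolerance=1.0 corresponds to one order
    of magnitude.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)
```

    Working in log10 units makes the two measures directly comparable: an absolute error of 1.0 is a tenfold difference in dose.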

  9. Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information

    NASA Astrophysics Data System (ADS)

    Sushko, Iurii; Novotarskyi, Sergii; Körner, Robert; Pandey, Anil Kumar; Rupp, Matthias; Teetz, Wolfram; Brandmaier, Stefan; Abdelaziz, Ahmed; Prokopenko, Volodymyr V.; Tanchuk, Vsevolod Y.; Todeschini, Roberto; Varnek, Alexandre; Marcou, Gilles; Ertl, Peter; Potemkin, Vladimir; Grishina, Maria; Gasteiger, Johann; Schwab, Christof; Baskin, Igor I.; Palyulin, Vladimir A.; Radchenko, Eugene V.; Welsh, William J.; Kholodovych, Vladyslav; Chekmarev, Dmitriy; Cherkasov, Artem; Aires-de-Sousa, Joao; Zhang, Qing-You; Bender, Andreas; Nigsch, Florian; Patiny, Luc; Williams, Antony; Tkachenko, Valery; Tetko, Igor V.

    2011-06-01

    The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.

  10. A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening

    PubMed Central

    2014-01-01

    Background: Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discovery. Because of our interest in electrostatics and high throughput ligand-based virtual screening, we sought to exploit the information contained in atomic coordinates and partial charges of a molecule. Results: A new molecular descriptor based on partial charges is proposed. It uses the autocorrelation function and linear binning to encode all atoms of a molecule into two rotation-translation invariant vectors. Combined with a scoring function, the descriptor allows a database of compounds to be rank-ordered against a query molecule. The proposed implementation is called ACPC (AutoCorrelation of Partial Charges) and is released as open source. Extensive retrospective ligand-based virtual screening experiments were performed, and comparisons with other methods were made, in order to validate the method and associated protocol. Conclusions: While it is a simple method, it performed remarkably well in experiments. At an average speed of 1649 molecules per second, it reached an average median area under the curve of 0.81 on 40 different targets, hence validating the proposed protocol and implementation. PMID:24887178
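
    The idea of an autocorrelation of partial charges with linear binning can be illustrated with a short Python sketch. This is not the ACPC implementation: the function name, the bin width, the number of bins, and the choice to store positive and negative charge products in two separate vectors are all assumptions made for illustration.

```python
import math
from itertools import combinations

def charge_autocorrelation(atoms, bin_width=1.0, n_bins=8):
    """Encode a molecule as two distance-binned vectors of charge products.

    atoms: list of (x, y, z, partial_charge) tuples.
    Returns (positive, negative): products q_i*q_j accumulated over all atom
    pairs, split by sign (an assumption of this sketch). Only inter-atomic
    distances are used, so the result does not change when the molecule is
    rotated or translated.
    """
    positive = [0.0] * n_bins
    negative = [0.0] * n_bins
    for (x1, y1, z1, q1), (x2, y2, z2, q2) in combinations(atoms, 2):
        distance = math.dist((x1, y1, z1), (x2, y2, z2))
        product = q1 * q2
        target = positive if product >= 0 else negative
        # Linear binning: share the value between the two nearest grid
        # points (k * bin_width) in proportion to proximity.
        position = distance / bin_width
        lower = int(math.floor(position))
        upper_share = position - lower
        if lower < n_bins:
            target[lower] += product * (1.0 - upper_share)
        if lower + 1 < n_bins:
            target[lower + 1] += product * upper_share
    return positive, negative
```

    Compared with hard histogram bins, linear binning keeps the descriptor continuous with respect to small changes in atomic coordinates, so two nearly identical conformers give nearly identical vectors. Pairs farther apart than the last grid point are dropped in this sketch.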

  11. Singular spectrum decomposition of Bouligand-Minkowski fractal descriptors: an application to the classification of texture Images

    NASA Astrophysics Data System (ADS)

    Florindo, João Batista

    2018-04-01

    This work proposes the use of Singular Spectrum Analysis (SSA) for the classification of texture images, more specifically, to enhance the performance of the Bouligand-Minkowski fractal descriptors in this task. Fractal descriptors are known to be a powerful approach to model and particularly identify complex patterns in natural images. Nevertheless, the multiscale analysis involved in those descriptors makes them highly correlated. Although other attempts to address this point have been proposed in the literature, none of them investigated the relation between the fractal correlation and the well-established analysis employed in time series, for which SSA is one of the most powerful techniques. The proposed method was employed for the classification of benchmark texture images and the results were compared with other state-of-the-art classifiers, confirming the potential of this analysis in image classification.

  12. Analogue based design of MMP-13 (Collagenase-3) inhibitors.

    PubMed

    Sarma, J A R P; Rambabu, G; Srikanth, K; Raveendra, D; Vithal, M

    2002-10-07

    3D-QSAR studies using MFA and RSA methods were performed on a series of 39 MMP-13 inhibitors. The model developed by the MFA method has an r(2)(cv) (cross-validated) of 0.616, while its r(2) (conventional) value is 0.822. For the RSA model, r(2)(cv) and r(2) are 0.681 and 0.847, respectively. Both models indicate good internal as well as external predictive abilities. These models provide crucial information about the field descriptors for the design of potential inhibitors of MMP-13.

  13. Application of ant colony optimization in development of models for prediction of anti-HIV-1 activity of HEPT derivatives.

    PubMed

    Zare-Shahabadi, Vali; Abbasitabar, Fatemeh

    2010-09-01

    Quantitative structure-activity relationship models were derived for 107 analogs of 1-[(2-hydroxyethoxy) methyl]-6-(phenylthio)thymine, a potent inhibitor of the HIV-1 reverse transcriptase. The activities of these compounds were investigated by means of the multiple linear regression (MLR) technique. An ant colony optimization algorithm, called Memorized_ACS, was applied for selecting relevant descriptors and detecting outliers. This algorithm uses an external memory based upon knowledge incorporation from previous iterations. At first, the memory is empty, and then it is filled by running several ACS algorithms. In this respect, after each ACS run, the elite ant is stored in the memory and the process is continued to fill the memory. Here, pheromone updating is performed by all elite ants collected in the memory; this results in improvements in both exploration and exploitation behaviors of the ACS algorithm. The memory is then emptied and is filled again by performing several ACS algorithms using updated pheromone trails. This process is repeated for several iterations. At the end, the memory contains several top solutions for the problem. The number of appearances of each descriptor in the external memory is a good criterion for its importance. Finally, prediction is performed by the elitist ant, and interpretation is carried out by considering the importance of each descriptor. The best MLR model has a training error of 0.47 log (1/EC(50)) units (R(2) = 0.90) and a prediction error of 0.76 log (1/EC(50)) units (R(2) = 0.88). Copyright 2010 Wiley Periodicals, Inc.

  14. A hierarchical clustering methodology for the estimation of toxicity.

    PubMed

    Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M

    2008-01-01

    A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.

  15. An Alternative Grading Tool for Enhancing Assessment Practice and Quality Assurance in Higher Education

    ERIC Educational Resources Information Center

    Grainger, Peter; Weir, Katie

    2016-01-01

    Assessing student learning in university courses is commonly done using a rubric that arranges the assessment criteria and standards descriptors in a matrix style or grid format. This paper introduces an alternative style of grading tool known as the continua model of a guide to making judgements, which arranges assessment criteria based on a…

  16. Development and Validation of a Computational Model Ensemble for the Early Detection of BCRP/ABCG2 Substrates during the Drug Design Stage.

    PubMed

    Gantner, Melisa E; Peroni, Roxana N; Morales, Juan F; Villalba, María L; Ruiz, María E; Talevi, Alan

    2017-08-28

    Breast Cancer Resistance Protein (BCRP) is an ATP-dependent efflux transporter linked to the multidrug resistance phenomenon in many diseases such as epilepsy and cancer and a potential source of drug interactions. For these reasons, the early identification of substrates and nonsubstrates of this transporter during the drug discovery stage is of great interest. We have developed a computational nonlinear model ensemble based on conformation-independent molecular descriptors using a combined strategy of genetic algorithms, J48 decision tree classifiers, and data fusion. The best model ensemble consists of averaging the ranking of the 12 decision trees that showed the best performance on the training set, which also demonstrated a good performance for the test set. It was experimentally validated using the ex vivo everted rat intestinal sac model. Five anticonvulsant drugs classified as nonsubstrates for BCRP by the model ensemble were experimentally evaluated, and none of them proved to be a BCRP substrate under the experimental conditions used, thus confirming the predictive ability of the model ensemble. The model ensemble reported here is a potentially valuable tool to be used as an in silico ADME filter in computer-aided drug discovery campaigns intended to overcome BCRP-mediated multidrug resistance issues and to prevent drug-drug interactions.
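
    Combining classifiers by averaging rankings, as described above for the 12 decision trees, can be sketched in Python as follows. The function names, the use of per-model scores as input, and the tie handling are illustrative assumptions; the abstract does not specify these details.

```python
def ranks_from_scores(scores):
    """Rank compounds by score, with rank 1 for the highest score.

    Ties are broken by input order (a simplification of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def average_rank(score_lists):
    """Average each compound's rank across several models.

    score_lists: one list of scores per model, all covering the same
    compounds in the same order. A lower average rank means the ensemble
    places the compound nearer the top.
    """
    per_model = [ranks_from_scores(scores) for scores in score_lists]
    n_compounds = len(score_lists[0])
    return [
        sum(model[i] for model in per_model) / len(per_model)
        for i in range(n_compounds)
    ]
```

    Rank averaging is a simple form of data fusion: it needs no calibration between models because only the ordering produced by each model is used, not the raw score scale.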

  17. Deconstructing field-induced ketene isomerization through Lagrangian descriptors.

    PubMed

    Craven, Galen T; Hernandez, Rigoberto

    2016-02-07

    The time-dependent geometrical separatrices governing state transitions in field-induced ketene isomerization are constructed using the method of Lagrangian descriptors. We obtain the stable and unstable manifolds of time-varying transition states as dynamic phase space objects governing configurational changes when the ketene molecule is subjected to an oscillating electric field. The dynamics of the isomerization reaction are modeled through classical trajectory studies on the Gezelter-Miller potential energy surface and an approximate dipole moment model which is coupled to a time-dependent electric field. We obtain a representation of the reaction geometry, over varying field strengths and oscillation frequencies, by partitioning an initial phase space into basins labeled according to which product state is reached at a given time. The borders between these basins are in agreement with those obtained using Lagrangian descriptors, even in regimes exhibiting chaotic dynamics. Major outcomes of this work are: validation and extension of a transition state theory framework built from Lagrangian descriptors, elaboration of the applicability for this theory to periodically- and aperiodically-driven molecular systems, and prediction of regimes in which isomerization of ketene and its derivatives may be controlled using an external field.

  18. One Shot Detection with Laplacian Object and Fast Matrix Cosine Similarity.

    PubMed

    Biswas, Sujoy Kumar; Milanfar, Peyman

    2016-03-01

    One shot, generic object detection involves searching for a single query object in a larger target image. Relevant approaches have benefited from features that typically model the local similarity patterns. In this paper, we combine local similarity (encoded by local descriptors) with a global context (i.e., a graph structure) of pairwise affinities among the local descriptors, embedding the query descriptors into a low dimensional but discriminatory subspace. Unlike principal components, which preserve the global structure of the feature space, we seek a linear approximation to the Laplacian eigenmap that gives us a locality-preserving embedding of high dimensional region descriptors. Our second contribution is an accelerated but exact computation of matrix cosine similarity as the decision rule for detection, obviating the computationally expensive sliding window search. We leverage the power of the Fourier transform combined with the integral image to achieve superior runtime efficiency that allows us to test multiple hypotheses (for pose estimation) within a reasonably short time. Our approach to one shot detection is training-free, and experiments on the standard data sets confirm the efficacy of our model. In addition, the low computation cost of the proposed (codebook-free) object detector facilitates rather straightforward query detection in large data sets including movie videos.
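
    For reference, matrix cosine similarity between two equally sized feature matrices is the Frobenius inner product divided by the product of the Frobenius norms. The sketch below is the direct (naive) evaluation for a single pair of matrices; it is not the accelerated Fourier and integral-image computation the abstract refers to, and the function name is hypothetical.

```python
import math


def matrix_cosine_similarity(a, b):
    """Frobenius inner product of a and b divided by the product of their Frobenius norms."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    inner = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for ra in a for x in ra))
    norm_b = math.sqrt(sum(y * y for rb in b for y in rb))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("similarity is undefined for an all-zero matrix")
    return inner / (norm_a * norm_b)


# Hypothetical 2x2 feature matrices (illustration only).
query = [[1.0, 2.0], [3.0, 4.0]]
patch = [[2.0, 4.0], [6.0, 8.0]]  # same pattern, twice the magnitude
similarity = matrix_cosine_similarity(query, patch)
```

    Because the measure is normalized, a candidate region that matches the query up to a positive scale factor scores the same as an exact match.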

  19. Combined computational-experimental approach to predict blood-brain barrier (BBB) permeation based on "green" salting-out thin layer chromatography supported by simple molecular descriptors.

    PubMed

    Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna

    2017-09-05

    The objective of this paper is to build QSRR/QSAR models for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods, SOTLC was chosen because its mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact than those of classical reversed-phase liquid chromatography (RPLC). During the study three stationary phases (silica gel, cellulose plates, and neutral aluminum oxide) were examined. The model set of solutes presents a wide range of log BB values, containing compounds which cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, a comparison of three regression models was performed: multiple linear regression (MLR), partial least squares (PLS), and orthogonal partial least squares (OPLS). The designed QSRR/QSAR models could be useful for predicting the BBB permeability of newly synthesized compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that different regression techniques can yield significantly different model performance, as measured by R² and Q²; hence it is strongly suggested to evaluate all available options, such as MLR, PLS, and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
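
    The two performance measures named in this abstract can be sketched for the simplest case. The example assumes a single-descriptor least-squares line and defines Q² from leave-one-out predictions as 1 - PRESS/TSS, with TSS taken about the mean of all observations; that convention is an assumption, since the abstract does not give the formula. The data values are made up for illustration and are not from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def total_sum_of_squares(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)


def r_squared(xs, ys):
    """Goodness of fit on the data used to build the model."""
    slope, intercept = fit_line(xs, ys)
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - rss / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / TSS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Made-up chromatographic constants (x) and log BB values (y), illustration only.
x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
y = [-0.8, -0.5, -0.1, 0.0, 0.4, 0.5]
r2 = r_squared(x, y)
q2 = q_squared_loo(x, y)
```

    With these definitions Q² can never exceed R² for a least-squares fit, so a large gap between the two is a common warning sign of overfitting.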

  20. Experimentally determined soil organic matter-water sorption coefficients for different classes of natural toxins and comparison with estimated numbers.

    PubMed

    Schenzel, Judith; Goss, Kai-Uwe; Schwarzenbach, René P; Bucheli, Thomas D; Droge, Steven T J

    2012-06-05

    Although natural toxins, such as mycotoxins or phytoestrogens, are widely studied and were recently identified as micropollutants in the environment, many of their environmentally relevant physicochemical properties have not yet been determined. Here, the sorption affinity to Pahokee peat, a model sorbent for soil organic matter, was investigated for 29 mycotoxins and two phytoestrogens. Sorption coefficients (K_oc) were determined with a dynamic HPLC-based column method using a fully aqueous mobile phase with 5 mM CaCl2 at pH 4.5. Sorption coefficients varied from less than 10^0.7 L/kg_oc (e.g., all type B trichothecenes) to 10^4.0 L/kg_oc (positively charged ergot alkaloids). For the neutral compounds, the experimental sorption data set was compared with sorption coefficients predicted by various models, based on molecular fragment approaches (EPISuite's KOCWIN or SPARC), poly-parameter linear free energy relationships (pp-LFER) in combination with predicted descriptors, and quantum-chemical based software (COSMOtherm). None of the available models was able to adequately predict absolute K_oc numbers and relative differences in sorption affinity for the whole set of neutral toxins, largely because mycotoxins exhibit highly complex structures. Hence, at present, fast and consistent experimental techniques for determining sorption coefficients, such as the one used in this study, are required for such compounds.
