A method of extracting impervious surface based on rule algorithm
NASA Astrophysics Data System (ADS)
Peng, Shuangyun; Hong, Liang; Xu, Quanli
2018-02-01
The impervious surface has become an important index for evaluating urban environmental quality and measuring the level of urbanization. At present, remote sensing has become the main way to extract impervious surface. In this paper, a method to extract impervious surface based on a rule algorithm is proposed. The main idea of the method is to use a rule-based algorithm to extract the impervious surface based on the characteristics of, and differences between, the impervious surface and the other three types of objects (water, soil and vegetation) in the seven original bands, NDWI and NDVI. The method consists of three steps: 1) First, vegetation is extracted according to the principle that vegetation reflects more strongly in the near-infrared band than in the other bands; 2) Then, water is extracted according to the characteristic that water has the highest NDWI and the lowest NDVI; 3) Finally, the impervious surface is extracted based on the fact that it has a higher NDWI value and a lower NDVI value than soil. In order to test the accuracy of the rule algorithm, this paper applies the linear spectral mixture decomposition algorithm, the CART algorithm, and the NDII index algorithm to extract the impervious surface from six remote sensing images of the Dianchi Lake Basin from 1999 to 2014. The accuracy of these three methods is then compared with that of the rule algorithm using the overall classification accuracy. The accuracy of the extraction method based on the rule algorithm is found to be obviously higher than that of the other three methods.
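A minimal numpy sketch of the three-step rule chain described above; the NDVI/NDWI band math is standard, but the thresholds and the step-3 soil rule are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

def classify(nir, red, green, t_veg=0.3, t_water=0.2):
    """Three-step rule classification; t_veg/t_water are illustrative thresholds."""
    ndvi = (nir - red) / (nir + red + 1e-9)
    ndwi = (green - nir) / (green + nir + 1e-9)
    label = np.full(nir.shape, "impervious", dtype=object)
    label[ndvi > t_veg] = "vegetation"                    # 1) vegetation: strong NIR/NDVI
    label[(ndwi > t_water) & (ndvi <= t_veg)] = "water"   # 2) water: high NDWI, low NDVI
    # 3) split the remainder: soil shows lower NDWI and higher NDVI than impervious cover
    label[(label == "impervious") & (ndwi < 0.0) & (ndvi > 0.1)] = "soil"
    return label

bands = np.random.rand(3, 4, 4)                           # toy NIR, red, green reflectances
print(classify(*bands))
```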
Induction of belief decision trees from data
NASA Astrophysics Data System (ADS)
AbuDahab, Khalil; Xu, Dong-ling; Keane, John
2012-09-01
In this paper, a method for acquiring belief rule-bases by inductive inference from data is described and evaluated. Existing methods inductively extract traditional rules from data, with consequents that are believed to be either 100% true or 100% false. Belief rules can capture uncertain or incomplete knowledge using uncertain belief degrees in consequents. Instead of using single-valued consequents, each belief rule deals with a set of collectively exhaustive and mutually exclusive consequents. The proposed method extracts belief rules from data which contain uncertain or incomplete knowledge.
Rule Extracting based on MCG with its Application in Helicopter Power Train Fault Diagnosis
NASA Astrophysics Data System (ADS)
Wang, M.; Hu, N. Q.; Qin, G. J.
2011-07-01
In order to extract decision rules for fault diagnosis from incomplete historical test records, for knowledge-based damage assessment of helicopter power train structures, a method that can directly extract the optimal generalized decision rules from incomplete information based on granular computing (GrC) was proposed. Based on semantic analysis of unknown attribute values, the granule was extended to handle incomplete information. The maximum characteristic granule (MCG) was defined based on the characteristic relation, and the MCG was used to construct the resolution function matrix. The optimal generalized decision rule was introduced; with the basic equivalent forms of propositional logic, the rules were extracted and reduced from the incomplete information table. Combined with a fault diagnosis example for a power train, the application of the method was presented, and the validity of the method in knowledge acquisition was demonstrated.
Techniques of Acceleration for Association Rule Induction with Pseudo Artificial Life Algorithm
NASA Astrophysics Data System (ADS)
Kanakubo, Masaaki; Hagiwara, Masafumi
Frequent pattern mining is one of the important problems in data mining. Generally, the number of potential rules grows rapidly as the size of the database increases, so it is hard for a user to extract the association rules. To avoid this difficulty, we propose a new method for association rule induction using a pseudo artificial life approach. The proposed method decides whether there exists an item set containing N or more items shared by two transactions. If one exists, the item sets contained in that part of the transactions are recorded. Iterating this step yields the association rules, without calculating the huge number of candidate rules. In the evaluation test, we compared the association rules extracted by our method with the rules produced by other algorithms such as the Apriori algorithm. In an evaluation on a large retail market basket dataset, our method was approximately 10 to 20 times faster than the Apriori algorithm and many of its variants.
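The core step described here, checking whether two transactions share an itemset of at least N items and recording it, can be sketched as a randomized loop (toy basket data; the full pseudo artificial life mechanics are not reproduced):

```python
import random

def sample_candidates(transactions, n_min=2, iters=5000, seed=0):
    """Record itemsets shared by two random transactions when they have
    n_min or more items in common (a sketch of the sampling idea only)."""
    rng = random.Random(seed)
    found = set()
    for _ in range(iters):
        a, b = rng.sample(transactions, 2)   # pick two transactions at random
        common = frozenset(a & b)            # items they share
        if len(common) >= n_min:
            found.add(common)                # candidate frequent itemset
    return found

baskets = [{"milk", "bread", "eggs"}, {"milk", "bread"},
           {"beer", "chips"}, {"milk", "bread", "chips"}]
print(sample_candidates(baskets))            # -> {frozenset({'bread', 'milk'})}
```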
LIDAR Point Cloud Data Extraction and Establishment of 3D Modeling of Buildings
NASA Astrophysics Data System (ADS)
Zhang, Yujuan; Li, Xiuhai; Wang, Qiang; Liu, Jiang; Liang, Xin; Li, Dan; Ni, Chundi; Liu, Yan
2018-01-01
This paper uses Shepard's method to process the original LIDAR point cloud data and generate a regular grid DSM, separates ground points from non-ground points using the double least squares method, and obtains a regularized DSM. A region growing method is then used to segment the regularized DSM and remove non-building points, yielding the building point cloud information. The Canny operator is used to extract the building edges from the segmented image, and Hough transform line detection is used to regularize the building edges so that they are smooth and uniform. Finally, the E3De3 software is used to establish the 3D model of the buildings.
A New Data Mining Scheme Using Artificial Neural Networks
Kamruzzaman, S. M.; Jehad Sarkar, A. M.
2011-01-01
Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are however often regarded as black boxes, i.e., their predictions cannot be explained. To enhance the explanation of ANNs, a novel algorithm to extract symbolic rules from ANNs has been proposed in this paper. ANN methods have not been effectively utilized for data mining tasks because how the classifications were made is not explicitly stated as symbolic rules that are suitable for verification or interpretation by human experts. With the proposed approach, concise symbolic rules with high accuracy, that are easily explainable, can be extracted from the trained ANNs. Extracted rules are comparable with other methods in terms of number of rules, average number of conditions for a rule, and the accuracy. The effectiveness of the proposed approach is clearly demonstrated by the experimental results on a set of benchmark data mining classification problems. PMID:22163866
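The paper's exact extraction algorithm is not reproduced here, but the general pedagogical idea, learning interpretable symbolic rules from a trained ANN's own predictions, can be sketched with a surrogate decision tree (scikit-learn, illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Pedagogical extraction: fit an interpretable surrogate to the ANN's own
# predictions, then read its paths off as symbolic if-then rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, ann.predict(X))
print(export_text(surrogate, feature_names=load_iris().feature_names))
print("fidelity to the ANN:", (surrogate.predict(X) == ann.predict(X)).mean())
```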
Neural network explanation using inversion.
Saad, Emad W; Wunsch, Donald C
2007-01-01
An important drawback of many artificial neural networks (ANN) is their lack of explanation capability [Andrews, R., Diederich, J., & Tickle, A. B. (1996). A survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8, 373-389]. This paper starts with a survey of algorithms which attempt to explain the ANN output. We then present HYPINV, a new explanation algorithm which relies on network inversion, i.e. calculating the ANN input which produces a desired output. HYPINV is a pedagogical algorithm that extracts rules in the form of hyperplanes. It is able to generate rules with arbitrarily desired fidelity, maintaining a fidelity-complexity tradeoff. To our knowledge, HYPINV is the only pedagogical rule extraction method that extracts hyperplane rules from continuous or binary attribute neural networks. Different network inversion techniques, involving gradient descent as well as an evolutionary algorithm, are presented. An information theoretic treatment of rule extraction is presented. HYPINV is applied to example synthetic problems, to a real aerospace problem, and compared with similar algorithms using benchmark problems.
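A toy numpy sketch of the inversion idea (gradient descent on the input with the weights frozen); the network and weights here are random stand-ins, not HYPINV itself:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "trained" weights for a 2-4-1 network; stand-ins for a real model.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def forward(x):
    h = np.tanh(W1 @ x + b1)
    return sigmoid(W2 @ h + b2), h

def invert(target, steps=2000, lr=0.1):
    """Gradient descent on the INPUT with the weights frozen (network inversion)."""
    x = rng.normal(size=2)                  # random starting input
    for _ in range(steps):
        y, h = forward(x)
        dy = (y - target) * y * (1 - y)     # back through the sigmoid output
        dh = (W2.T @ dy) * (1 - h ** 2)     # back through the tanh hidden layer
        x -= lr * (W1.T @ dh)               # update the input, not the weights
    return x

x_star = invert(target=0.9)
print(x_star, forward(x_star)[0])           # an input driving the output toward 0.9
```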
An adaptive singular spectrum analysis method for extracting brain rhythms of electroencephalography
Hu, Hai; Guo, Shengxin; Liu, Ran
2017-01-01
Artifacts removal and rhythms extraction from electroencephalography (EEG) signals are important for portable and wearable EEG recording devices. Incorporating a novel grouping rule, we proposed an adaptive singular spectrum analysis (SSA) method for artifacts removal and rhythms extraction. Based on the EEG signal amplitude, the grouping rule determines adaptively the first one or two SSA reconstructed components as artifacts and removes them. The remaining reconstructed components are then grouped based on their peak frequencies in the Fourier transform to extract the desired rhythms. The grouping rule thus enables SSA to be adaptive to EEG signals containing different levels of artifacts and rhythms. The simulated EEG data based on the Markov Process Amplitude (MPA) EEG model and the experimental EEG data in the eyes-open and eyes-closed states were used to verify the adaptive SSA method. Results showed a better performance in artifacts removal and rhythms extraction, compared with the wavelet decomposition (WDec) and another two recently reported SSA methods. Features of the extracted alpha rhythms using adaptive SSA were calculated to distinguish between the eyes-open and eyes-closed states. Results showed a higher accuracy (95.8%) than those of the WDec method (79.2%) and the infinite impulse response (IIR) filtering method (83.3%). PMID:28674650
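Basic SSA (embedding, SVD, diagonal averaging) can be sketched as follows; the paper's contribution, the adaptive amplitude- and peak-frequency-based grouping rule, would then operate on the component array this function returns:

```python
import numpy as np

def ssa_components(x, L):
    """Basic SSA: embed the series, decompose by SVD, and reconstruct one
    component per singular triple via diagonal (Hankel) averaging."""
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])              # rank-1 elementary matrix
        comps.append(np.array([Xi[::-1].diagonal(k).mean()
                               for k in range(-L + 1, K)]))
    return np.array(comps)

t = np.arange(512) / 128.0                  # 4 s of a synthetic 10 Hz "alpha" rhythm
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.default_rng(0).normal(size=512)
comps = ssa_components(eeg, L=64)           # adaptive grouping of comps would follow
```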
Li, Yang; Li, Guoqing; Wang, Zhenhao
2015-01-01
In order to overcome the problem of poor understandability of pattern recognition-based transient stability assessment (PRTSA) methods, a new rule extraction method based on an extreme learning machine (ELM) and an improved Ant-miner (IAM) algorithm is presented in this paper. First, the basic principles of ELM and the Ant-miner algorithm are introduced. Then, based on the selected optimal feature subset, an example sample set is generated by the trained ELM-based PRTSA model. Finally, a set of classification rules is obtained by the IAM algorithm to replace the original ELM network. The novelty of this proposal is that transient stability rules are extracted, using the IAM algorithm, from an example sample set generated by the trained ELM-based transient stability assessment model. The effectiveness of the proposed method is shown by application results on the New England 39-bus power system and a practical power system--the southern power system of Hebei province.
Extracting Date/Time Expressions in Super-Function Based Japanese-English Machine Translation
NASA Astrophysics Data System (ADS)
Sasayama, Manabu; Kuroiwa, Shingo; Ren, Fuji
Super-Function Based Machine Translation (SFBMT), which is a type of Example-Based Machine Translation, has a feature which makes it possible to expand the coverage of examples by changing nouns into variables; however, there were problems extracting entire date/time expressions containing parts-of-speech other than nouns, because only nouns/numbers were changed into variables. We describe a method for extracting date/time expressions for SFBMT. SFBMT uses noun determination rules to extract nouns and a bilingual dictionary to obtain the correspondence of the extracted nouns between the source and the target languages. In this method, we add a rule to extract date/time expressions and then extract date/time expressions from a Japanese-English bilingual corpus. The evaluation results show that the precision of this method for Japanese sentences is 96.7% with a recall of 98.2%, and the precision for English sentences is 94.7% with a recall of 92.7%.
Bao, X Y; Huang, W J; Zhang, K; Jin, M; Li, Y; Niu, C Z
2018-04-18
There is a huge amount of diagnostic and treatment information in electronic medical records (EMRs), which is a concrete manifestation of clinicians' actual diagnosis and treatment details. Many episodes in EMRs, such as chief complaints, present illness, past history, differential diagnosis, diagnostic imaging and surgical records, reflect details of diagnosis and treatment in the clinical process and are written as Chinese natural-language narrative. How to extract effective information from these Chinese narrative text data and organize it into tabular form for medical research, so that real-world clinical data can be put to practical use, is a difficult problem in Chinese medical data processing. Based on the narrative EMR text data of a tertiary hospital in China, a method of learning customized information extraction rules and performing rule-based information extraction is proposed. The overall method consists of three steps: (1) A random sample of 600 records (including history of present illness, past history, personal history, family history, etc.) was extracted from the EMR data as the raw corpus. Using our Chinese clinical narrative text annotation platform, trained clinicians and nurses marked the tokens and phrases to be extracted in the corpus (with a history of diabetes as the example). (2) Based on the annotated clinical text corpus, extraction templates were first summarized and induced. These templates were then rewritten as extraction rules using regular expressions in the Perl programming language. Using these extraction rules as the basic knowledge base, we developed Perl extraction packages for extracting data from the EMR text. Finally, the extracted data items were organized in tabular format for later use in clinical research or hospital surveillance. (3) As the final step, the proposed method was evaluated and validated on the National Clinical Service Data Integration Platform, checking the extraction results with a combination of manual and automated verification, which proved the effectiveness of the method. For all patients diagnosed with diabetes in the Department of Endocrinology of the hospital, of whom 1 436 were discharged in 2015, extraction of the diabetes history from the medical records achieved a recall of 87.6%, a precision of 99.5%, and an F-score of 0.93. For a 10% sample of patients with diabetes (1 223 patients in total) discharged from the same department by August 2017, the extracted diabetes history achieved a recall of 89.2%, a precision of 99.2%, and an F-score of 0.94. This study mainly adopts a combination of natural language processing and rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese EMR text data. It achieves better results than existing work.
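A hedged Python sketch of step (2)'s rule-based extraction (the paper used Perl; the patterns, field names, and negation rule here are hypothetical toy rules, not the study's knowledge base):

```python
import re

# Hypothetical rules in the spirit of the paper: match a "history of diabetes"
# mention unless it is preceded by a negation cue.
NEGATION = re.compile(r"(否认|无|deny|no history of)\s*糖尿病")
MENTION = re.compile(r"糖尿病(史)?|diabetes")

def has_diabetes_history(narrative: str) -> bool:
    """Toy rule, not the study's: assert diabetes history unless negated."""
    if NEGATION.search(narrative):
        return False
    return MENTION.search(narrative) is not None

rows = [{"patient_id": 1, "past_history": "患者有糖尿病史十年"},      # 10-year history
        {"patient_id": 2, "past_history": "否认糖尿病、高血压病史"}]   # denies diabetes
table = [{"patient_id": r["patient_id"],
          "diabetes_history": has_diabetes_history(r["past_history"])} for r in rows]
print(table)   # tabular output for downstream research use
```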
Automatic information extraction from unstructured mammography reports using distributed semantics.
Gupta, Anupama; Banerjee, Imon; Rubin, Daniel L
2018-02-01
To date, the methods developed for automated extraction of information from radiology reports are mainly rule-based or dictionary-based and, therefore, require substantial manual effort to build. Recent efforts to develop automated systems for entity detection have been undertaken, but little work has been done to automatically extract relations and their associated named entities in narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction that combines dependency-based parse trees with distributed semantics for generating structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also easily scale to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Min; Cui, Qi; Wang, Jie; Ming, Dongping; Lv, Guonian
2017-01-01
In this paper, we first propose several novel concepts for object-based image analysis, including line-based shape regularity, line density, and scale-based best feature value (SBV), based on the region-line primitive association framework (RLPAF). We then propose a raft cultivation area (RCA) extraction method for high spatial resolution (HSR) remote sensing imagery based on multi-scale feature fusion and spatial rule induction. The proposed method includes the following steps: (1) Multi-scale region primitives (segments) are obtained by the image segmentation method HBC-SEG, and line primitives (straight lines) are obtained by a phase-based line detection method. (2) Association relationships between regions and lines are built based on RLPAF, and then multi-scale RLPAF features are extracted and SBVs are selected. (3) Several spatial rules are designed to extract RCAs within sea waters after land-water separation. Experiments show that the proposed method can successfully extract RCAs of different shapes from HSR images with good performance.
2017-01-01
Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessed in order to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus, for example, on extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER that consists of two phases. The first involves the detection and determination of entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. Evaluation of the method showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
Fast Reduction Method in Dominance-Based Information Systems
NASA Astrophysics Data System (ADS)
Li, Yan; Zhou, Qinghua; Wen, Yongchuan
2018-01-01
In real world applications, there are often data with continuous values or preference-ordered values. Rough sets based on dominance relations can effectively deal with these kinds of data. Attribute reduction can be done in the framework of the dominance-relation based approach to better extract decision rules. However, the computational cost of the dominance classes greatly affects the efficiency of attribute reduction and rule extraction. This paper presents an efficient method of computing dominance classes, and further compares it with the traditional method as the numbers of attributes and samples increase. Experiments on UCI data sets show that the proposed algorithm clearly improves on the efficiency of the traditional method, especially for large-scale data.
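For reference, the naive (non-optimized) computation of dominance classes that such a method accelerates can be written directly from the definition; a sketch with illustrative data:

```python
import numpy as np

def dominating_sets(X):
    """D+(i): objects whose values are >= object i's on every criterion.
    This is the naive definition that a faster algorithm would improve on."""
    return [set(np.where((X >= X[i]).all(axis=1))[0]) for i in range(X.shape[0])]

# Toy preference-ordered decision table: rows are objects, columns are criteria.
X = np.array([[2, 3, 1],
              [3, 3, 2],
              [1, 2, 1]])
for i, d in enumerate(dominating_sets(X)):
    print(f"D+({i}) = {sorted(d)}")   # e.g. D+(0) = [0, 1]
```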
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-19
211 Extractors of crude petroleum and natural gas. 211112 Natural gas liquid extraction facilities. Greenhouse Gas Reporting Rule: Revision to Best Available Monitoring Method Request Submission Deadline for Petroleum and Natural Gas Systems Source Category. AGENCY: Environmental Protection Agency (EPA). ACTION...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-01
211112 Natural gas liquid extraction facilities. Table 1 of this preamble is not intended to be... Greenhouse Gas Reporting Rule: Revision to Best Available Monitoring Method Request Submission Deadline for Petroleum and Natural Gas Systems Source Category. AGENCY: Environmental Protection Agency (EPA). ACTION...
Summary of water body extraction methods based on ZY-3 satellite
NASA Astrophysics Data System (ADS)
Zhu, Yu; Sun, Li Jian; Zhang, Chuan Yin
2017-12-01
Extracting water bodies from remote sensing images is one of the main means of water information extraction. Owing to spectral characteristics, many methods cannot be applied to ZY-3 satellite imagery. To solve this problem, we summarize the extraction methods applicable to ZY-3 and analyze their extraction results. According to the characteristics of the results, the method of water index (WI) & single-band thresholding and the method of texture filtering based on probability statistics are explored. In addition, the advantages and disadvantages of all methods are compared, providing a reference for research on water extraction from images. The conclusions are as follows. 1) The NIR band is highly sensitive to water; consequently, when the surface reflectance in the study area is not similar to that of water, the single-band threshold method or multi-band operations can achieve the desired effect. 2) Compared with the water index and HIS optimal index methods, rule-based object extraction, which takes into account not only the spectral information of water but also spatial and textural constraints, can obtain a better extraction result, yet the image segmentation process is time-consuming and the definition of the rules requires certain knowledge. 3) The combination of spectral relationships and a water index can eliminate the interference of shadows to a certain extent. When there is little small water, or small water bodies are not considered in further study, texture filtering based on probability statistics can effectively reduce the noise in the result and avoid mixing shadows or paddy fields with water to a certain extent.
An integrated method for cancer classification and rule extraction from microarray data
Huang, Liang-Tsung
2009-01-01
Different microarray techniques recently have been successfully used to investigate useful information for cancer diagnosis at the gene expression level due to their ability to measure thousands of gene expression levels in a massively parallel way. One important issue is to improve classification performance of microarray data. However, it would be ideal that influential genes and even interpretable rules can be explored at the same time to offer biological insight. Introducing the concepts of system design in software engineering, this paper has presented an integrated and effective method (named X-AI) for accurate cancer classification and the acquisition of knowledge from DNA microarray data. This method included a feature selector to systematically extract the relative important genes so as to reduce the dimension and retain as much as possible of the class discriminatory information. Next, diagonal quadratic discriminant analysis (DQDA) was combined to classify tumors, and generalized rule induction (GRI) was integrated to establish association rules which can give an understanding of the relationships between cancer classes and related genes. Two non-redundant datasets of acute leukemia were used to validate the proposed X-AI, showing significantly high accuracy for discriminating different classes. On the other hand, I have presented the abilities of X-AI to extract relevant genes, as well as to develop interpretable rules. Further, a web server has been established for cancer classification and it is freely available at . PMID:19272192
Knowledge extraction from evolving spiking neural networks with rank order population coding.
Soltic, Snjezana; Kasabov, Nikola
2010-12-01
This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionally small amount of research is centered on the issue of knowledge extraction from spiking neural networks which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that a high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
Using association rule mining to identify risk factors for early childhood caries.
Ivančević, Vladimir; Tušek, Ivan; Tušek, Jasmina; Knežević, Marko; Elheshk, Salaheddin; Luković, Ivan
2015-11-01
Early childhood caries (ECC) is a potentially severe disease affecting children all over the world. The available findings are mostly based on a logistic regression model, but data mining, in particular association rule mining, could be used to extract more information from the same data set. ECC data was collected in a cross-sectional analytical study of the 10% sample of preschool children in the South Bačka area (Vojvodina, Serbia). Association rules were extracted from the data by association rule mining. Risk factors were extracted from the highly ranked association rules. Discovered dominant risk factors include male gender, frequent breastfeeding (with other risk factors), high birth order, language, and low body weight at birth. Low health awareness of parents was significantly associated to ECC only in male children. The discovered risk factors are mostly confirmed by the literature, which corroborates the value of the methods. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
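A sketch of this kind of analysis using the mlxtend implementation of Apriori (the data frame is a hypothetical stand-in for the survey variables, not the study's dataset):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot records standing in for the survey variables.
df = pd.DataFrame({
    "male":            [1, 1, 0, 1, 0, 1],
    "freq_breastfeed": [1, 1, 0, 1, 0, 0],
    "low_birthweight": [1, 0, 0, 1, 0, 0],
    "ECC":             [1, 1, 0, 1, 0, 0],
}).astype(bool)

freq = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.8)
# Keep only rules concluding ECC: candidate risk-factor combinations.
ecc_rules = rules[rules["consequents"].apply(lambda c: c == frozenset({"ECC"}))]
print(ecc_rules[["antecedents", "confidence", "lift"]])
```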
Measuring uncertainty by extracting fuzzy rules using rough sets
NASA Technical Reports Server (NTRS)
Worm, Jeffrey A.
1991-01-01
Despite the advancements in the computer industry in the past 30 years, there is still one major deficiency: computers are not designed to handle terms where uncertainty is present. To deal with uncertainty, techniques other than classical logic must be developed. The methods of statistical analysis, the Dempster-Shafer theory, rough set theory, and fuzzy set theory are examined to solve this problem. The fundamentals of these theories are combined to possibly provide the optimal solution. By incorporating principles from these theories, a decision making process may be simulated by extracting two sets of fuzzy rules: certain rules and possible rules. From these rules, a corresponding measure of how strongly the rules are believed is constructed. From this, the idea of how far a fuzzy diagnosis is definable in terms of a set of fuzzy attributes is studied.
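The certain/possible split comes straight from rough set lower and upper approximations; a minimal sketch with toy equivalence classes (the fuzzy-attribute layer of the paper is omitted):

```python
def approximations(equiv_classes, target):
    """Rough set lower/upper approximations of a target concept: certain rules
    come from the lower approximation, possible rules from the upper."""
    lower, upper = set(), set()
    for c in equiv_classes:
        if c <= target:          # class entirely inside the concept -> certain
            lower |= c
        if c & target:           # class overlapping the concept -> possible
            upper |= c
    return lower, upper

classes = [{1, 2}, {3}, {4, 5}]  # indiscernibility classes from condition attributes
target = {1, 2, 4}               # objects carrying the decision value of interest
lower, upper = approximations(classes, target)
print(lower, upper)              # {1, 2} and {1, 2, 4, 5}; the gap is the boundary
```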
Yap, Keem Siah; Lim, Chee Peng; Au, Mau Teng
2011-12-01
Generalized adaptive resonance theory (GART) is a neural network model that is capable of online learning and is effective in tackling pattern classification tasks. In this paper, we propose an improved GART model (IGART), and demonstrate its applicability to power systems. IGART enhances the dynamics of GART in several aspects, which include the use of the Laplacian likelihood function, a new vigilance function, a new match-tracking mechanism, an ordering algorithm for determining the sequence of training data, and a rule extraction capability to elicit if-then rules from the network. To assess the effectiveness of IGART and to compare its performances with those from other methods, three datasets that are related to power systems are employed. The experimental results demonstrate the usefulness of IGART with the rule extraction capability in undertaking classification problems in power systems engineering.
Intelligent Gearbox Diagnosis Methods Based on SVM, Wavelet Lifting and RBR
Gao, Lixin; Ren, Zhiqiang; Tang, Wenliang; Wang, Huaqing; Chen, Peng
2010-01-01
Given the problems in intelligent gearbox diagnosis methods, it is difficult to obtain the desired information and a large enough sample size to study; therefore, we propose the application of various methods for gearbox fault diagnosis, including wavelet lifting, a support vector machine (SVM) and rule-based reasoning (RBR). In a complex field environment, it is less likely for machines to have the same fault; moreover, the fault features can also vary. Therefore, a SVM could be used for the initial diagnosis. First, gearbox vibration signals were processed with wavelet packet decomposition, and the signal energy coefficients of each frequency band were extracted and used as input feature vectors in SVM for normal and faulty pattern recognition. Second, precision analysis using wavelet lifting could successfully filter out the noisy signals while maintaining the impulse characteristics of the fault; thus effectively extracting the fault frequency of the machine. Lastly, the knowledge base was built based on the field rules summarized by experts to identify the detailed fault type. Results have shown that SVM is a powerful tool to accomplish gearbox fault pattern recognition when the sample size is small, whereas the wavelet lifting scheme can effectively extract fault features, and rule-based reasoning can be used to identify the detailed fault type. Therefore, a method that combines SVM, wavelet lifting and rule-based reasoning ensures effective gearbox fault diagnosis. PMID:22399894
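A compact sketch of the first stage (wavelet packet band energies feeding an SVM), using PyWavelets and scikit-learn with synthetic stand-in signals:

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def band_energy_features(signal, wavelet="db4", level=3):
    """Wavelet packet decomposition -> normalized energy per frequency band,
    mirroring the feature vectors fed to the SVM in the first stage."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    energies = np.array([np.sum(node.data ** 2)
                         for node in wp.get_level(level, order="freq")])
    return energies / energies.sum()

# Synthetic stand-ins for vibration snippets (10 normal, 10 faulty).
rng = np.random.default_rng(1)
X = np.vstack([band_energy_features(rng.normal(size=1024)) for _ in range(20)])
y = np.array([0] * 10 + [1] * 10)
clf = SVC(kernel="rbf").fit(X, y)   # small-sample normal/fault pattern recognition
```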
Peng, Mingkai; Sundararajan, Vijaya; Williamson, Tyler; Minty, Evan P; Smith, Tony C; Doktorchik, Chelsea T A; Quan, Hude
2018-03-01
Data quality assessment is a challenging facet for research using coded administrative health data. Current assessment approaches are time and resource intensive. We explored whether association rule mining (ARM) can be used to develop rules for assessing data quality. We extracted 2013 and 2014 records from the hospital discharge abstract database (DAD) for patients between the ages of 55 and 65 from five acute care hospitals in Alberta, Canada. The ARM was conducted using the 2013 DAD to extract rules with support ≥0.0019 and confidence ≥0.5 using the bootstrap technique, and tested in the 2014 DAD. The rules were compared against the method of coding frequency and assessed for their ability to detect error introduced by two kinds of data manipulation: random permutation and random deletion. The association rules generally had clear clinical meanings. Comparing 2014 data to 2013 data (both original), there were 3 rules with a confidence difference >0.1, while coding frequency difference of codes in the right hand of rules was less than 0.004. After random permutation of 50% of codes in the 2014 data, average rule confidence dropped from 0.72 to 0.27 while coding frequency remained unchanged. Rule confidence decreased with the increase of coding deletion, as expected. Rule confidence was more sensitive to code deletion compared to coding frequency, with slope of change ranging from 1.7 to 184.9 with a median of 9.1. The ARM is a promising technique to assess data quality. It offers a systematic way to derive coding association rules hidden in data, and potentially provides a sensitive and efficient method of assessing data quality compared to standard methods. Copyright © 2018 Elsevier Inc. All rights reserved.
Mining HIV protease cleavage data using genetic programming with a sum-product function.
Yang, Zheng Rong; Dalby, Andrew R; Qiu, Jing
2004-12-12
In order to design effective HIV inhibitors, studying and understanding the mechanism of HIV protease cleavage specificity is critical. Various methods have been developed to explore the specificity of HIV protease cleavage activity. However, success in both extracting discriminant rules and maintaining high prediction accuracy is still challenging. An earlier study employed genetic programming with a min-max scoring function to extract discriminant rules with success. However, the decision ultimately degenerates to a single residue, making further improvement of the prediction accuracy difficult. The challenge of revising the min-max scoring function so as to improve the prediction accuracy motivated this study. This paper designs a new scoring function, called a sum-product function, for extracting HIV protease cleavage discriminant rules using genetic programming methods. The experiments show that the new scoring function is superior to the min-max scoring function. The software package can be obtained by request to Dr Zheng Rong Yang.
Applications of rule-induction in the derivation of quantitative structure-activity relationships.
A-Razzak, M; Glen, R C
1992-08-01
Recently, methods have been developed in the field of Artificial Intelligence (AI), specifically in the expert systems area using rule-induction, designed to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules which are predictive and reliable. The input to rule-induction consists of a number of examples with known outcomes (a training set) and the output is a tree-structured series of rules. Unlike most other analysis methods, the results of the analysis are in the form of simple statements which can be easily interpreted. These are readily applied to new data giving both a classification and a probability of correctness. Rule-induction has been applied to in-house generated and published QSAR datasets and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule-induction as a complementary technique in addition to conventional statistical and pattern-recognition methods.
Deep Logic Networks: Inserting and Extracting Knowledge From Deep Belief Networks.
Tran, Son N; d'Avila Garcez, Artur S
2018-02-01
Developments in deep learning have seen the use of layerwise unsupervised learning combined with supervised learning for fine-tuning. With this layerwise approach, a deep network can be seen as a more modular system that lends itself well to learning representations. In this paper, we investigate whether such modularity can be useful to the insertion of background knowledge into deep networks, whether it can improve learning performance when it is available, and to the extraction of knowledge from trained deep networks, and whether it can offer a better understanding of the representations learned by such networks. To this end, we use a simple symbolic language, a set of logical rules that we call confidence rules, and show that it is suitable for the representation of quantitative reasoning in deep networks. We show by knowledge extraction that confidence rules can offer a low-cost representation for layerwise networks (or restricted Boltzmann machines). We also show that layerwise extraction can produce an improvement in the accuracy of deep belief networks. Furthermore, the proposed symbolic characterization of deep networks provides a novel method for the insertion of prior knowledge and training of deep networks. With the use of this method, a deep neural-symbolic system is proposed and evaluated, with the experimental results indicating that modularity through the use of confidence rules and knowledge insertion can be beneficial to network performance.
NASA Astrophysics Data System (ADS)
Gong, Y.; Yang, Y.; Yang, X.
2018-04-01
To effectively extract the productions of specific branching plants and realize their 3D reconstruction, terrestrial LiDAR data were used as the source for production extraction, and a 3D reconstruction method based on terrestrial LiDAR combined with the L-system is proposed in this article. The topological structure of the plant architecture was extracted from the point cloud data of the target plant with a space-level segmentation mechanism. Subsequently, L-system productions were obtained, and the structural parameters and production rules of the branches fitting the given plant were generated. Finally, a three-dimensional simulation model of the target plant was established using a computer visualization algorithm. The results suggest that the method can effectively extract the topology of a given branching plant and describe its productions, realizing computer-based extraction of the topological structure and simplifying the extraction of branching plant productions, which would otherwise be complex and time-consuming with the L-system alone. It improves the degree of automation in extracting L-system productions of specific branching plants, providing a new way to extract branching plant production rules.
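The L-system core, iterative string rewriting with productions, is compact; a sketch with a hypothetical production of the bracketed kind such a method might derive:

```python
def lsystem(axiom, rules, depth):
    """Iteratively rewrite the axiom with production rules: the core of an L-system."""
    s = axiom
    for _ in range(depth):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# A hypothetical bracketed production, where F is a branch segment and
# [+F]/[-F] are side branches turned left/right.
rules = {"F": "F[+F]F[-F]"}
print(lsystem("F", rules, 2))
```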
Extracting decision rules from police accident reports through decision trees.
de Oña, Juan; López, Griselda; Abellán, Joaquín
2013-01-01
Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules. Copyright © 2012 Elsevier Ltd. All rights reserved.
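Decision rules can be read directly off a fitted tree; a toy scikit-learn sketch (the features and labels are illustrative, not the Granada data):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-ins for accident records:
# features = [bad_lighting, female], label 1 = severe accident.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 1], [1, 0], [0, 0]]
y = [0, 0, 0, 1, 1, 0, 0, 0]
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["bad_lighting", "female"]))
# Each root-to-leaf path reads directly as an if-then decision rule,
# e.g. "if bad_lighting and female then severe" on this toy data.
```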
Human Systems Integration (HSI) Associated Development Activities in Japan
2008-06-12
machine learning and data mining methods. The continuous effort (KAIZEN) to improve the analysis phases is illustrated in Figure 14. [Figure 14: plant operation KAIZEN loop - extraction of a process model, extraction of a workflow, extraction of a control rule, and variation analysis and improvement.]
NASA Astrophysics Data System (ADS)
Jawak, Shridhar D.; Jadhav, Ajay; Luis, Alvarinho J.
2016-05-01
Supraglacial debris was mapped in the Schirmacher Oasis, east Antarctica, using WorldView-2 (WV-2) high resolution optical remote sensing data consisting of 8-band calibrated, Gram Schmidt (GS)-sharpened and atmospherically corrected WV-2 imagery. This study is a preliminary attempt to develop an object-oriented rule set to extract supraglacial debris for the Antarctic region using 8-band imagery. Supraglacial debris was manually digitized from the satellite imagery to generate the ground reference data. Several trials were performed using a few existing traditional pixel-based classification techniques and color-texture-based object-oriented classification methods to extract supraglacial debris over a small domain of the study area. Multi-level segmentation and attributes such as scale, shape, size, and compactness, along with spectral information from the data, were used to develop the rule set. A quantitative error analysis was carried out against the manually digitized reference data to test the practicability of our approach against the traditional pixel-based methods. Our results indicate that the OBIA-based approach (overall accuracy: 93%) for extracting supraglacial debris performed better than all the traditional pixel-based methods (overall accuracy: 80-85%). The present attempt provides a comprehensively improved method for semiautomatic feature extraction in the supraglacial environment and a new direction in cryospheric research.
Influence of crisp values on the object-based data extraction procedure from LiDAR data
NASA Astrophysics Data System (ADS)
Tomljenovic, Ivan; Rousell, Adam
2014-05-01
Nowadays a plethora of approaches attempt to automate the process of object extraction from LiDAR data. However, the majority of these methods require the fusion of the LiDAR dataset with other information such as photogrammetric imagery. The approach that has been used as the basis for this paper is a novel method which makes use of human knowledge and the CNL modelling language to automatically extract buildings solely from LiDAR point cloud data in a transferable method. A number of rules are implemented to generate an artificial intelligence algorithm which is used for the object extraction. Although the single dataset method has been found to successfully extract building footprints from the point cloud dataset, at this initial stage it has one restriction that may limit its effectiveness - a number of the rules that are used are based on crisp boundary values. If, for example, the slope of the ground surface is used as a rule for determining objects then the slope value of a pixel would be assessed to determine if it is suitable for a building structure. This check would be performed by identifying whether the slope value is less than or greater than a threshold value. However, in reality such a crisp classification process is likely not to be a true reflection of real world scenarios. For example, using the crisp methods a difference of 1° in slope could result in one region in a dataset being deemed suitable and its neighboring region being seen as not suitable. It is likely however that there is in reality little difference in the actual suitability of these two neighboring regions. A more suitable classification process may be the use of fuzzy set theory whereby each region is seen as having degree of membership to a number of sets (or classifications). In the above example, the two regions would likely be seen as having very similar membership values to the different sets, although this is obviously dependent on factors such as the extent of each region. The purpose of this study is to identify to what extent the use of explicit boundary values has on the overall building footprint dataset extracted. By performing the analysis multiple times using differing threshold values for rules, it is possible to compare the resultant datasets and thus identify the impact of using such classification procedures. If a significant difference is found between the resultant datasets, this would highlight that the use of such crisp methods in the extraction processes may not be optimal and that a future enhancement to the method would be to consider the use of fuzzy classification methods.
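The crisp-versus-fuzzy contrast the authors describe can be made concrete with a one-variable slope rule; the thresholds here are illustrative assumptions:

```python
def crisp_suitable(slope_deg, threshold=10.0):
    return slope_deg < threshold             # 9.9 passes, 10.1 fails outright

def fuzzy_suitable(slope_deg, low=8.0, high=12.0):
    """Linear membership in the 'suitable for a building' set:
    1 below `low`, 0 above `high`, graded in between."""
    if slope_deg <= low:
        return 1.0
    if slope_deg >= high:
        return 0.0
    return (high - slope_deg) / (high - low)

for s in (9.9, 10.1):
    print(s, crisp_suitable(s), round(fuzzy_suitable(s), 2))
# Crisp: True vs False; fuzzy: 0.53 vs 0.48 -- near-identical memberships,
# matching the intuition that the two neighboring regions are almost equally suitable.
```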
Automatic Extraction of Urban Built-Up Area Based on Object-Oriented Method and Remote Sensing Data
NASA Astrophysics Data System (ADS)
Li, L.; Zhou, H.; Wen, Q.; Chen, T.; Guan, F.; Ren, B.; Yu, H.; Wang, Z.
2018-04-01
The built-up area marks the use of urban construction land in different periods of development, and its accurate extraction is the key to studying urban expansion. This paper studies the automatic extraction of urban built-up areas based on an object-oriented method and remote sensing data, and realizes automatic extraction of the main built-up area of a city, which greatly saves manpower. First, construction land is extracted with the object-oriented method; the main technical steps include: (1) multi-resolution segmentation; (2) feature construction and selection; (3) information extraction of construction land based on a rule set. The characteristic parameters used in the rule set mainly include the mean of the red band (Mean R), the Normalized Difference Vegetation Index (NDVI), the Ratio of Residential Index (RRI), and the mean of the blue band (Mean B); through combinations of these parameters, construction land information can be extracted. Based on the adaptability, distance and area of the object domain, the urban built-up area can then be quickly and accurately delineated from the construction land information, without depending on other data or expert knowledge, to achieve automatic extraction of the urban built-up area. Beijing was used as the experimental area for testing the method; the results show that the built-up area was extracted automatically with a boundary accuracy of 2359.65 m, which meets the requirements. The automatic extraction of urban built-up areas is highly practical and can be applied to monitoring changes in the main built-up area of a city.
Drug side effect extraction from clinical narratives of psychiatry and psychology patients
Kocher, Jean-Pierre A; Chute, Christopher G; Savova, Guergana K
2011-01-01
Objective To extract physician-asserted drug side effects from electronic medical record clinical narratives. Materials and methods Pattern matching rules were manually developed through examining keywords and expression patterns of side effects to discover an individual side effect and causative drug relationship. A combination of machine learning (C4.5) using side effect keyword features and pattern matching rules was used to extract sentences that contain side effect and causative drug pairs, enabling the system to discover most side effect occurrences. Our system was implemented as a module within the clinical Text Analysis and Knowledge Extraction System. Results The system was tested in the domain of psychiatry and psychology. The rule-based system extracting side effects and causative drugs produced an F score of 0.80 (0.55 excluding allergy section). The hybrid system identifying side effect sentences had an F score of 0.75 (0.56 excluding allergy section) but covered more side effect and causative drug pairs than individual side effect extraction. Discussion The rule-based system was able to identify most side effects expressed by clear indication words. More sophisticated semantic processing is required to handle complex side effect descriptions in the narrative. We demonstrated that our system can be trained to identify sentences with complex side effect descriptions that can be submitted to a human expert for further abstraction. Conclusion Our system was able to extract most physician-asserted drug side effects. It can be used in either an automated mode for side effect extraction or semi-automated mode to identify side effect sentences that can significantly simplify abstraction by a human expert. PMID:21946242
Extraction of decision rules via imprecise probabilities
NASA Astrophysics Data System (ADS)
Abellán, Joaquín; López, Griselda; Garach, Laura; Castellano, Javier G.
2017-05-01
Data analysis techniques can be applied to discover important relations among features. This is the main objective of the Information Root Node Variation (IRNV) technique, a new method to extract knowledge from data via decision trees. The decision trees used by the original method were built using classic split criteria. The performance of new split criteria based on imprecise probabilities and uncertainty measures, called credal split criteria, differs significantly from the performance obtained using the classic criteria. This paper extends the IRNV method using two credal split criteria: one based on a mathematical parametric model, and the other based on a non-parametric model. The performance of the method is analyzed using a case study of traffic accident data to identify patterns related to the severity of an accident. We found that a larger number of rules is generated, significantly supplementing the information obtained using the classic split criteria.
Using GO-WAR for mining cross-ontology weighted association rules.
Agapito, Giuseppe; Cannataro, Mario; Guzzi, Pietro Hiram; Milano, Marianna
2015-07-01
The Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated to one or more gene products. The process of association is referred to as annotation. The relevance and the specificity of both GO terms and annotations are evaluated by a measure defined as information content (IC). The analysis of annotated data is thus an important challenge for bioinformatics. There exist different approaches of analysis, among which the use of association rules (AR) may provide useful knowledge, and it has been used in some applications, e.g. improving the quality of annotations. Nevertheless, classical association rule algorithms take into account neither the source of annotation nor its importance, yielding candidate rules with low IC. This paper presents GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules. GO-WAR can extract association rules with a high level of IC without loss of support and confidence from a dataset of annotated data. A case study applying GO-WAR to publicly available GO annotation datasets demonstrates that our method outperforms current state-of-the-art approaches. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Chang, Yung-Chun; Dai, Hong-Jie; Wu, Johnny Chi-Yang; Chen, Jian-Ming; Tsai, Richard Tzong-Han; Hsu, Wen-Lian
2013-12-01
Patient discharge summaries provide detailed medical information about individuals who have been hospitalized. To make a precise and legitimate assessment of the abundant data, a proper time layout of the sequence of relevant events should be compiled and used to drive a patient-specific timeline, which could further assist medical personnel in making clinical decisions. The process of identifying the chronological order of entities is called temporal relation extraction. In this paper, we propose a hybrid method to identify appropriate temporal links between a pair of entities. The method combines two approaches: one is rule-based and the other is based on the maximum entropy model. We develop an integration algorithm to fuse the results of the two approaches. All rules and the integration algorithm are formally stated so that one can easily reproduce the system and results. To optimize the system's configuration, we used the 2012 i2b2 challenge TLINK track dataset and applied threefold cross validation to the training set. Then, we evaluated its performance on the training and test datasets. The experimental results show that the proposed TEMPTING (TEMPoral relaTion extractING) system (ranked seventh) achieved an F-score of 0.563, which was at least 30% better than that of the baseline system, which randomly selects TLINK candidates from all pairs and assigns the TLINK types. The TEMPTING system using the hybrid method also outperformed the stage-based TEMPTING system, with F-scores 3.51% and 0.97% better on the training set and test set, respectively. Copyright © 2013 Elsevier Inc. All rights reserved.
Efficient Variable Selection Method for Exposure Variables on Binary Data
NASA Astrophysics Data System (ADS)
Ohno, Manabu; Tarumi, Tomoyuki
In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as the property that the same variables are selected from both the original data and perturbed data. There are few studies on effective methods for this selection. The problem of selecting exposure variables is almost the same as the problem of extracting correlation rules without robustness. [Brin 97] suggested that correlation rules can be extracted efficiently using the chi-squared statistic of a contingency table, which has a monotone property on binary data. But the chi-squared value itself does not have the monotone property, so the method easily judges a variable set to be dependent as the dimension increases even though the set is completely independent, and it is therefore not usable for selecting robust exposure variables. We assume an anti-monotone property for independent variables in order to select robust independent variables, and use the apriori algorithm for this purpose. The apriori algorithm is one of the algorithms that find association rules from market basket data. The algorithm uses the anti-monotone property of support as defined for association rules. The independence property does not completely satisfy the anti-monotone property on the AIC of the independence probability model, but the tendency to satisfy it is strong. Therefore, variables selected under the anti-monotone property on the AIC are robust. Our method judges whether a certain variable is an exposure variable for the independent variables using comparisons of the AIC. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.
CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.
Cestarelli, Valerio; Fiscon, Giulia; Felici, Giovanni; Bertolazzi, Paola; Weitschek, Emanuel
2016-03-01
Nowadays, knowledge extraction methods for Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a greater amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class. We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and performs the classification procedure again until a stopping criterion is verified. CAMUR includes an ad-hoc knowledge repository (database) and a querying tool. We analyze three different types of RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA) and we also validate CAMUR and its models on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which the most frequent genes, their relationships, and their relation to a particular cancer are deduced. dmb.iasi.cnr.it/camur.php emanuel@iasi.cnr.it Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
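A minimal sketch of the iterative scheme described above, with a shallow decision tree standing in for the rule-based classifier and simple column masking standing in for the power-set elimination step; the `camur_like` name, the classifier choice, and the cross-validated accuracy threshold used as the stopping criterion are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def camur_like(X, y, feature_names, min_accuracy=0.8, max_iter=10):
    """Repeatedly learn a model, record the genes its rules use, mask them
    out of the data, and relearn, collecting equivalent models."""
    X = X.copy()
    models = []
    for _ in range(max_iter):
        clf = DecisionTreeClassifier(max_depth=3, random_state=0)
        acc = cross_val_score(clf, X, y, cv=5).mean()
        if acc < min_accuracy:                    # stopping criterion
            break
        clf.fit(X, y)
        used = [i for i in set(clf.tree_.feature) if i >= 0]
        models.append((acc, [feature_names[i] for i in used]))
        X[:, used] = 0                            # eliminate these genes
    return models

X = np.random.RandomState(1).rand(120, 30)
y = ((X[:, 0] > 0.5) | (X[:, 5] > 0.7)).astype(int)
for acc, genes in camur_like(X, y, [f"g{i}" for i in range(30)]):
    print(round(acc, 2), genes)
```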
Instantaneous Coastline Extraction from LIDAR Point Cloud and High Resolution Remote Sensing Imagery
NASA Astrophysics Data System (ADS)
Li, Y.; Zhoing, L.; Lai, Z.; Gan, Z.
2018-04-01
A new method for instantaneous waterline extraction is proposed in this paper, which combines point cloud geometry features and image spectral characteristics of the coastal zone. The proposed method consists of the following steps: the Mean Shift algorithm is used to segment the coastal zone of high resolution remote sensing images into small regions containing semantic information; region features are extracted by integrating the LiDAR data and the surface area of the image; initial waterlines are extracted by the α-shape algorithm; a region growing algorithm is applied for coastline refinement, with a growth rule integrating the intensity and topography of the LiDAR data; finally, the coastline is smoothed. Experiments are conducted to demonstrate the efficiency of the proposed method.
Effective Diagnosis of Alzheimer's Disease by Means of Association Rules
NASA Astrophysics Data System (ADS)
Chaves, R.; Ramírez, J.; Górriz, J. M.; López, M.; Salas-Gonzalez, D.; Illán, I.; Segovia, F.; Padilla, P.
In this paper we present a novel classification method of SPECT images for the early diagnosis of Alzheimer's disease (AD). The proposed method is based on Association Rules (ARs), aiming to discover interesting associations between attributes contained in the database. The system first uses voxel-as-features (VAF) and Activation Estimation (AE) to find three-dimensional activated brain regions of interest (ROIs) for each patient. These ROIs then act as inputs for mining ARs between activated blocks for controls, with a specified minimum support and minimum confidence. ARs are mined in supervised mode, using information previously extracted from the most discriminant rules to center interest on the relevant brain areas, reducing the computational requirements of the system. Finally, the classification process is performed depending on the number of previously mined rules verified by each subject, yielding up to 95.87% classification accuracy, thus outperforming recently developed methods for AD diagnosis.
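The final step lends itself to a compact sketch: a subject is assigned to the control class when the number of mined rules it verifies reaches a threshold. The rule encoding, ROI names, and threshold below are illustrative, not the rules mined in the paper.

```python
def verifies(rule, active_rois):
    """A rule (antecedent, consequent) is verified when both sides are
    contained in the subject's set of activated ROI blocks."""
    antecedent, consequent = rule
    return antecedent <= active_rois and consequent <= active_rois

def classify(active_rois, rules, threshold):
    verified = sum(verifies(r, active_rois) for r in rules)
    return "control" if verified >= threshold else "AD"

rules = [({"roi_3", "roi_7"}, {"roi_12"}),
         ({"roi_1"}, {"roi_4", "roi_5"})]
print(classify({"roi_1", "roi_4", "roi_5", "roi_9"}, rules, threshold=1))
```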
Exploiting graph kernels for high performance biomedical relation extraction.
Panyam, Nagesh C; Verspoor, Karin; Cohn, Trevor; Ramamohanarao, Kotagiri
2018-01-30
Relation extraction from biomedical publications is an important task in the area of semantic mining of text. Kernel methods for supervised relation extraction are often preferred over manual feature engineering methods when classifying highly ordered structures such as trees and graphs obtained from syntactic parsing of a sentence. Tree kernels such as the Subset Tree Kernel and Partial Tree Kernel have been shown to be effective for classifying constituency parse trees and basic dependency parse graphs of a sentence. Graph kernels such as the All Path Graph kernel (APG) and Approximate Subgraph Matching (ASM) kernel have been shown to be suitable for classifying general graphs with cycles, such as the enhanced dependency parse graph of a sentence. In this work, we present a high performance Chemical-Induced Disease (CID) relation extraction system. We present a comparative study of kernel methods for the CID task and also extend our study to the Protein-Protein Interaction (PPI) extraction task, an important biomedical relation extraction task. We discuss novel modifications to the ASM kernel to boost its performance and a method to apply graph kernels for extracting relations expressed in multiple sentences. Our system for CID relation extraction attains an F-score of 60%, without using external knowledge sources or task-specific heuristics or rules. In comparison, the state-of-the-art Chemical-Disease Relation Extraction system achieves an F-score of 56% using an ensemble of multiple machine learning methods, which is then boosted to 61% with a rule-based system employing task-specific post-processing rules. For the CID task, graph kernels outperform tree kernels substantially, and the best performance is obtained with the APG kernel, which attains an F-score of 60%, followed by the ASM kernel at 57%. The performance difference between the ASM and APG kernels for CID sentence-level relation extraction is not significant. In our evaluation of ASM for the PPI task, ASM performed better than the APG kernel for the BioInfer dataset in the Area Under Curve (AUC) measure (74% vs 69%). However, for all the other PPI datasets, namely AIMed, HPRD50, IEPA and LLL, ASM is substantially outperformed by the APG kernel in F-score and AUC measures. We demonstrate high performance Chemical-Induced Disease relation extraction without employing external knowledge sources or task-specific heuristics. Our work shows that graph kernels are effective in extracting relations that are expressed in multiple sentences. We also show that the graph kernels, namely the ASM and APG kernels, substantially outperform the tree kernels. Among the graph kernels, we showed that the ASM kernel is effective for biomedical relation extraction, with performance comparable to the APG kernel for datasets such as CID sentence-level relation extraction and BioInfer in PPI. Overall, the APG kernel is shown to be significantly more accurate than the ASM kernel, achieving better performance on most datasets.
Object-oriented classification of drumlins from digital elevation models
NASA Astrophysics Data System (ADS)
Saha, Kakoli
Drumlins are common elements of glaciated landscapes which are easily identified by their distinct morphometric characteristics, including shape, length/width ratio, elongation ratio, and uniform direction. To date, most researchers have mapped drumlins by tracing contours on maps, or through on-screen digitization directly on top of hillshaded digital elevation models (DEMs). This paper seeks to utilize the unique morphometric characteristics of drumlins and investigates automated extraction of the landforms as objects from DEMs by Definiens Developer software (V.7), using the 30 m United States Geological Survey National Elevation Dataset DEM as input. The Chautauqua drumlin field in Pennsylvania and upstate New York, USA was chosen as a study area. As the study area is huge (covering approximately 2500 sq. km), small test areas were selected for initial testing of the method. Individual polygons representing the drumlins were extracted from the elevation data set by automated recognition, using Definiens' Multiresolution Segmentation tool, followed by rule-based classification. Subsequently, parameters such as length, width, length-width ratio, perimeter and area were measured automatically. To test the accuracy of the method, a second base map was produced by manual on-screen digitization of drumlins from topographic maps, and the same morphometric parameters were extracted from the mapped landforms using Definiens Developer. Statistical comparison showed a high agreement between the two methods, confirming that object-oriented classification for extraction of drumlins can be used for mapping these landforms. The proposed method represents an attempt to solve the problem by providing a generalized rule-set for mass extraction of drumlins. To test its generality, the automated extraction process was next applied to a larger area. Results showed that the proposed method is as successful for the bigger area as it was for the smaller test areas.
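In the same spirit, the rule-based classification step can be pictured as a filter over segment attributes. The thresholds below are illustrative placeholders rather than the paper's rule-set.

```python
def is_drumlin(seg):
    """Keep a segment only when its morphometry matches a drumlin-like,
    elongated streamlined form (illustrative thresholds)."""
    elongation = seg["length_m"] / seg["width_m"]
    return (100 <= seg["length_m"] <= 2000
            and 1.5 <= elongation <= 8
            and seg["area_m2"] >= 5000)

segments = [{"length_m": 600, "width_m": 150, "area_m2": 70000},
            {"length_m": 90, "width_m": 80, "area_m2": 6000}]
print([is_drumlin(s) for s in segments])   # [True, False]
```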
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gubler, Philipp, E-mail: pgubler@riken.jp; RIKEN Nishina Center, Wako, Saitama 351-0198; Yamamoto, Naoki
2015-05-15
Making use of the operator product expansion, we derive a general class of sum rules for the imaginary part of the single-particle self-energy of the unitary Fermi gas. The sum rules are analyzed numerically with the help of the maximum entropy method, which allows us to extract the single-particle spectral density as a function of both energy and momentum. These spectral densities contain basic information on the properties of the unitary Fermi gas, such as the dispersion relation and the superfluid pairing gap, for which we obtain reasonable agreement with the available results based on quantum Monte-Carlo simulations.
Unconventional Oil and Gas Extraction Effluent Guidelines
Overview and documents for the Unconventional Oil and Gas Extraction Pretreatment Standards final rule (6/28/2016), direct final rule (Sept. 2016) and proposed rule (Sept. 2016). 40 CFR Part 435, Subpart C.
Combined rule extraction and feature elimination in supervised classification.
Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E
2012-09-01
There are a vast number of biology-related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm that simultaneously extracts rules and selects features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable to that of state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
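The following sketch shows one way to realize the idea under stated assumptions: conjunctive rules are taken as root-to-leaf paths of a small random forest, encoded as binary rule features, and a 1-norm (L1) regularized linear model keeps only a few of them. It is an illustration of the combination, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def forest_rule_features(forest, X):
    """Encode each sample by the leaf it reaches in each tree; every leaf
    corresponds to one conjunctive decision rule."""
    leaves = forest.apply(X)                      # shape (n_samples, n_trees)
    feats = [(leaves[:, t] == leaf).astype(float)
             for t in range(leaves.shape[1])
             for leaf in np.unique(leaves[:, t])]
    return np.column_stack(feats)

X = np.random.RandomState(0).rand(200, 10)
y = (X[:, 0] + X[:, 3] > 1).astype(int)
forest = RandomForestClassifier(n_estimators=5, max_depth=3,
                                random_state=0).fit(X, y)
R = forest_rule_features(forest, X)
selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(R, y)
print("rules kept:", int((selector.coef_ != 0).sum()), "of", R.shape[1])
```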
Research on complex 3D tree modeling based on L-system
NASA Astrophysics Data System (ADS)
Gang, Chen; Bin, Chen; Yuming, Liu; Hui, Li
2018-03-01
The L-system, as a fractal iterative system, can simulate complex geometric patterns. Based on field observation data of trees and the knowledge of forestry experts, this paper extracted modeling constraint rules and obtained an L-system rule set. Using self-developed L-system modeling software, the L-system rule set was parsed to generate complex 3D tree models. The results showed that the geometrical modeling method based on the L-system can be used to describe the morphological structure of complex trees and generate 3D tree models.
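As a concrete picture of the rewriting mechanism, here is a minimal L-system expansion in Python; the axiom and the single bracketed production are the classic tree-like example, standing in for the constraint rules extracted from the field data.

```python
def expand(axiom, rules, iterations):
    """Iteratively rewrite every symbol by its production (identity if none)."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

rules = {"F": "FF-[-F+F+F]+[+F-F-F]"}   # classic bracketed tree production
print(expand("F", rules, 2)[:60], "...")
```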
Rule-based Approach on Extraction of Malay Compound Nouns in Standard Malay Document
NASA Astrophysics Data System (ADS)
Abu Bakar, Zamri; Kamal Ismail, Normaly; Rawi, Mohd Izani Mohamed
2017-08-01
A Malay compound noun is defined as a form of words that exists when two or more words are combined into a single syntactic unit with a specific meaning. A compound noun acts as one unit and is spelled as separate words unless it is an established compound noun written as a single word. The basic characteristics of compound nouns can be seen in Malay sentences, such as the frequency of the word in the text itself. Thus, the extraction of compound nouns is significant for subsequent research such as text summarization, grammar checking, sentiment analysis, machine translation and word categorization. Many research efforts have been proposed for extracting Malay compound nouns using linguistic approaches. Most of the existing methods addressed the extraction of bi-gram noun+noun compounds. However, the results still present some problems, leaving room for improvement. This paper explores a linguistic method for extracting compound nouns from a standard Malay corpus. A standard dataset is used to provide a common platform for evaluating research on the recognition of compound nouns in Malay sentences. An improvement in the effectiveness of compound noun extraction is therefore needed because the current results can be compromised. Thus, this study proposes a modification of the linguistic approach in order to enhance the extraction of compound nouns. Several pre-processing steps are involved, including normalization, tokenization and tagging. The first step that uses the linguistic approach in this study is Part-of-Speech (POS) tagging. Finally, we describe several rules and modify them to capture the most relevant relation between the first word and the second word in order to assist in solving these problems. The effectiveness of the relations used in our study can be measured using recall, precision and F1-score techniques. Comparison against baseline values is essential because it indicates whether there has been an improvement in the results.
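A toy sketch of the rule-based core, under stated assumptions: after POS tagging, adjacent noun+noun pairs above a frequency threshold become compound-noun candidates, and candidates are scored against a gold list with precision, recall, and F1. The tag set, the threshold, and the two-sentence corpus are illustrative.

```python
from collections import Counter

def noun_noun_candidates(tagged_sentences, min_freq=2):
    """Collect adjacent NN+NN bigrams occurring at least `min_freq` times."""
    counts = Counter()
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 == "NN" and t2 == "NN":
                counts[(w1, w2)] += 1
    return {c for c, n in counts.items() if n >= min_freq}

def prf(predicted, gold):
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

sents = [[("pusat", "NN"), ("bandar", "NN")],
         [("pusat", "NN"), ("bandar", "NN")]]
cands = noun_noun_candidates(sents)
print(cands, prf(cands, {("pusat", "bandar")}))
```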
Carroll, John A; Smith, Helen E; Scott, Donia; Cassell, Jackie A
2016-01-01
Background Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall). PMID:26911811
Template-based procedures for neural network interpretation.
Alexander, J A.; Mozer, M C.
1999-04-01
Although neural networks often achieve impressive learning and generalization performance, their internal workings are typically all but impossible to decipher. This characteristic of the networks, their opacity, is one of the disadvantages of connectionism compared to more traditional, rule-oriented approaches to artificial intelligence. Without a thorough understanding of the network behavior, confidence in a system's results is lowered, and the transfer of learned knowledge to other processing systems - including humans - is precluded. Methods that address the opacity problem by casting network weights in symbolic terms are commonly referred to as rule extraction techniques. This work describes a principled approach to symbolic rule extraction from standard multilayer feedforward networks based on the notion of weight templates, parameterized regions of weight space corresponding to specific symbolic expressions. With an appropriate choice of representation, we show how template parameters may be efficiently identified and instantiated to yield the optimal match to the actual weights of a unit. Depending on the requirements of the application domain, the approach can accommodate n-ary disjunctions and conjunctions with O(k) complexity, simple n-of-m expressions with O(k^2) complexity, or more general classes of recursive n-of-m expressions with O(k^(L+2)) complexity, where k is the number of inputs to a unit and L the recursion level of the expression class. Compared to other approaches in the literature, our method of rule extraction offers benefits in simplicity, computational performance, and overall flexibility. Simulation results on a variety of problems demonstrate the application of our procedures as well as the strengths and the weaknesses of our general approach.
Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier
NASA Technical Reports Server (NTRS)
Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco
2000-01-01
A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT). The transform allows for extraction of structural information in the image as a function of scale. In order to classify the SAR image, a Decision Tree Classifier is used. The method of pruning is used to optimize classification rate versus tree size. The results give explicit insight into the type of information useful for a given class.
Dehghani Soufi, Mahsa; Samad-Soltani, Taha; Shams Vahdati, Samad; Rezaei-Hachesu, Peyman
2018-06-01
Fast and accurate patient triage for the response process is a critical first step in emergency situations. This process is often performed using a paper-based mode, which intensifies workload and difficulty, wastes time, and is at risk of human error. This study aims to design and evaluate a decision support system (DSS) to determine the triage level. A combination of the Rule-Based Reasoning (RBR) and Fuzzy Logic Classifier (FLC) approaches was used to predict the triage level of patients according to the triage specialists' opinions and Emergency Severity Index (ESI) guidelines. RBR was applied for modeling the first to fourth decision points of the ESI algorithm. The data relating to vital signs were used as input variables and modeled using fuzzy logic. Narrative knowledge was converted to If-Then rules using XML. The extracted rules were then used to create the rule-based engine and predict the triage levels. Fourteen RBR and 27 fuzzy rules were extracted and used in the rule-based engine. The performance of the system was evaluated using three methods with real triage data. The accuracy of the clinical decision support system (CDSS) on the test data was 99.44%. The evaluation of the error rate revealed that, when using the traditional method, 13.4% of the patients were mis-triaged, which is statistically significant. The completeness of the documentation also improved from 76.72% to 98.5%. The designed system was effective in determining the triage level of patients and proved helpful for nurses as they made decisions and generated nursing diagnoses based on triage guidelines. The hybrid approach can reduce triage misdiagnosis in a highly accurate manner and improve triage outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.
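A compact sketch of how crisp ESI-style decision points can be combined with a fuzzy vital-sign check, in the spirit of the RBR+FLC design described above; the membership breakpoints and the rules are illustrative, not the validated clinical thresholds.

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership function on (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def triage(lifesaving_needed, high_risk, heart_rate):
    if lifesaving_needed:                   # crisp rule: ESI decision point 1
        return 1
    if high_risk:                           # crisp rule: ESI decision point 2
        return 2
    tachy = tri(heart_rate, 90, 130, 180)   # fuzzy "dangerously fast" degree
    return 3 if tachy > 0.5 else 4          # defuzzified vital-sign check

print(triage(False, False, heart_rate=140))   # -> 3
```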
NASA Technical Reports Server (NTRS)
Worm, Jeffrey A.; Culas, Donald E.
1991-01-01
Computers are not designed to handle terms where uncertainty is present. To deal with uncertainty, techniques other than classical logic must be developed. This paper examines the concepts of statistical analysis, the Dempster-Shafer theory, rough set theory, and fuzzy set theory to solve this problem. The fundamentals of these theories are combined to provide a possible optimal solution. By incorporating principles from these theories, a decision-making process may be simulated by extracting two sets of fuzzy rules: certain rules and possible rules. From these rules, a corresponding measure of how much we believe each rule is constructed. From this, the idea of how much a fuzzy diagnosis is definable in terms of its fuzzy attributes is studied.
ITER structural design criteria and their extension to advanced reactor blankets
NASA Astrophysics Data System (ADS)
Majumdar, S.; Kalinin, G.
2000-12-01
Applications of the recent ITER structural design criteria (ISDC) are illustrated by two components. First, the low-temperature-design rules are applied to copper alloys that are particularly prone to irradiation embrittlement at relatively low fluences at certain temperatures. Allowable stresses are derived and the impact of the embrittlement on allowable surface heat flux of a simple first-wall/limiter design is demonstrated. Next, the high-temperature-design rules of ISDC are applied to evaporation of lithium and vapor extraction (EVOLVE), a blanket design concept currently being investigated under the US Advanced Power Extraction (APEX) program. A single tungsten first-wall tube is considered for thermal and stress analyses by finite-element method.
Jankovic, Marko; Ogawa, Hidemitsu
2003-08-01
This paper presents one possible implementation of a transformation that performs a linear mapping to a lower-dimensional subspace; the principal component subspace is the one analyzed. The idea implemented in this paper represents a generalization of the recently proposed infinity OH neural method for principal component extraction. The calculations in the newly proposed method are performed locally, a feature which is usually considered desirable from the biological point of view. Compared to some other well-known methods, the proposed synaptic efficacy learning rule requires less information about the values of the other efficacies to make a single efficacy modification. Synaptic efficacies are modified by implementation of a Modulated Hebb-type (MH) learning rule. A slightly modified MH algorithm, named the Modulated Hebb-Oja (MHO) algorithm, is also introduced. The structural similarity of the proposed network to part of the retinal circuit is presented as well.
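For orientation, the sketch below runs the classical Oja update, the local Hebbian-plus-decay rule that MH-type algorithms build on; the modulation term of the MHO variant itself is not reproduced here, so this is a stand-in under that stated simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])  # PC on axis 0

w = rng.normal(size=2)
eta = 1e-3
for x in X:
    y = w @ x                       # unit output
    w += eta * y * (x - y * w)      # local Hebbian term plus Oja decay

print(w / np.linalg.norm(w))        # close to the first principal axis
```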
A structured analysis of uncertainty surrounding modeled impacts of groundwater-extraction rules
NASA Astrophysics Data System (ADS)
Guillaume, Joseph H. A.; Qureshi, M. Ejaz; Jakeman, Anthony J.
2012-08-01
Integrating economic and groundwater models for groundwater management can help improve understanding of the trade-offs involved between conflicting socioeconomic and biophysical objectives. However, there is significant uncertainty in most strategic decision-making situations, including in the models constructed to represent them. If not addressed, this uncertainty may be used to challenge the legitimacy of the models and decisions made using them. In this context, a preliminary uncertainty analysis was conducted of a dynamic coupled economic-groundwater model aimed at assessing groundwater extraction rules. The analysis demonstrates how a variety of uncertainties in such a model can be addressed. A number of methods are used, including propagation of scenarios and bounds on parameters, multiple models, block bootstrap time-series sampling, and robust linear regression for model calibration. These methods are described within the context of a theoretical uncertainty management framework, using a set of fundamental uncertainty management tasks and an uncertainty typology.
Wang, Hui; Zhang, Weide; Zeng, Qiang; Li, Zuofeng; Feng, Kaiyan; Liu, Lei
2014-04-01
Extracting information from unstructured clinical narratives is valuable for many clinical applications. Although Natural Language Processing (NLP) methods have been extensively studied in electronic medical records (EMR), few studies have explored NLP in extracting information from Chinese clinical narratives. In this study, we report the development and evaluation of extracting tumor-related information from operation notes of hepatic carcinomas, which were written in Chinese. Using 86 operation notes manually annotated by physicians as the training set, we explored both rule-based and supervised machine-learning approaches. Evaluating on 29 unseen operation notes, our best approach yielded 69.6% in precision, 58.3% in recall and 63.5% in F-score. Copyright © 2014 Elsevier Inc. All rights reserved.
Field turbidity method for the determination of lead in acid extracts of dried paint.
Studabaker, William B; McCombs, Michelle; Sorrell, Kristen; Salmons, Cynthia; Brown, G Gordon; Binstock, David; Gutknecht, William F; Harper, Sharon L
2010-07-08
Lead, which can be found in old paint, soil, and dust, has been clearly shown to have adverse health effects on the neurological systems of both children and adults. As part of an ongoing effort to reduce childhood lead poisoning, the US Environmental Protection Agency promulgated the Lead Renovation, Repair, and Painting Program (RRP) rule requiring that paint in target housing built prior to 1978 be tested for lead before any renovation, repair, or painting activities are initiated. This rule has led to a need for a rapid, relatively easy, and inexpensive method for measuring lead in paint. This paper presents a new method for measuring lead extracted from paint that is based on turbidimetry. This method is applicable to paint that has been collected from a surface and extracted into 25% (v/v) nitric acid. An aliquot of the filtered extract is mixed with an aliquot of solid potassium molybdate in 1 M ammonium acetate to form a turbid suspension of lead molybdate. The lead concentration is determined using a portable turbidity meter. This turbidimetric method has a response of approximately 0.9 NTU per µg lead per mL of extract, with a range of 1-1000 Nephelometric Turbidity Units (NTUs). Precision at a concentration corresponding to the EPA-mandated decision point of 1 mg of lead per cm² is <2%. This method is insensitive to the presence of other metals common to paint, including Ba²⁺, Ca²⁺, Mg²⁺, Fe³⁺, Co²⁺, Cu²⁺, and Cd²⁺, at concentrations of 10 mg mL⁻¹, or to Zn²⁺ at 50 mg mL⁻¹. Analysis of 14 samples from six reference materials with lead concentrations near 1 mg cm⁻² yielded a correlation to inductively coupled plasma-atomic emission spectroscopy (ICP-AES) analysis of 0.97, with an average bias of 2.8%. Twenty-four sets of either 6 or 10 paint samples each were collected from different locations in old houses, a hospital, a tobacco factory, and a power station. Half of each set was analyzed using rotor/stator-25% (v/v) nitric acid extraction with measurement using the new turbidimetric method, and the other half was analyzed using microwave extraction and measurement by ICP-AES. The average relative percent difference between the turbidimetric method and the ICP-AES method for the 24 sets, measured as milligrams of lead per cm², is -0.63 ± 32.5%; the mean difference is -2.1 ± 7.0 mg lead per cm². Non-parametric and parametric statistical tests on these data showed no difference in the results for the two procedures. At the federally regulated level of 1 mg of lead per cm² of paint, this turbidimetric method meets the performance requirements of EPA's National Lead Laboratory Accreditation Program (NLLAP) of accuracy within ±20% and has the potential to meet the performance specifications of EPA's RRP rule.
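As a small worked example of the stated response factor (about 0.9 NTU per µg of lead per mL of extract), the snippet below converts a turbidity reading into a lead loading; the extract volume and sampled paint area are hypothetical parameters, not values from the method.

```python
ntu = 45.0                          # turbidity reading
conc_ug_per_ml = ntu / 0.9          # stated response factor
volume_ml, area_cm2 = 25.0, 1.0     # hypothetical sampling parameters
mg_per_cm2 = conc_ug_per_ml * volume_ml / 1000 / area_cm2
print(mg_per_cm2)                   # 1.25, vs the 1 mg/cm^2 decision point
```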
ExaCT: automatic extraction of clinical trial characteristics from journal publications
2010-01-01
Background Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). Methods ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. Results We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. Conclusions Our experiments confirmed the applicability and efficacy of ExaCT. Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols). PMID:20920176
Charmonium ground and excited states at finite temperature from complex Borel sum rules
NASA Astrophysics Data System (ADS)
Araki, Ken-Ji; Suzuki, Kei; Gubler, Philipp; Oka, Makoto
2018-05-01
Charmonium spectral functions in the vector and pseudoscalar channels at finite temperature are investigated through complex Borel sum rules and the maximum entropy method. Our approach enables us to extract the peaks corresponding to the excited charmonia, ψ′ and ηc′, as well as those of the ground states, J/ψ and ηc, which has never been achieved in usual QCD sum rule analyses. We show the spectral functions in vacuum and their thermal modification around the critical temperature, which leads to the almost simultaneous melting (or peak disappearance) of the ground and excited states.
Predicting missing values in a home care database using an adaptive uncertainty rule method.
Konias, S; Gogou, G; Bamidis, P D; Vlahavas, I; Maglaveras, N
2005-01-01
Contemporary literature illustrates an abundance of adaptive algorithms for mining association rules. However, most literature is unable to deal with the peculiarities, such as missing values and dynamic data creation, that are frequently encountered in fields like medicine. This paper proposes an uncertainty rule method that uses an adaptive threshold for filling missing values in newly added records. A new approach for mining uncertainty rules and filling missing values is proposed, which is in turn particularly suitable for dynamic databases, like the ones used in home care systems. In this study, a new data mining method named FiMV (Filling Missing Values) is illustrated based on the mined uncertainty rules. Uncertainty rules have quite a similar structure to association rules and are extracted by an algorithm proposed in previous work, namely AURG (Adaptive Uncertainty Rule Generation). The main target was to implement an appropriate method for recovering missing values in a dynamic database, where new records are continuously added, without needing to specify any kind of thresholds beforehand. The method was applied to a home care monitoring system database. Randomly, multiple missing values for each record's attributes (rate 5-20% by 5% increments) were introduced in the initial dataset. FiMV demonstrated 100% completion rates with over 90% success in each case, while usual approaches, where all records with missing values are ignored or thresholds are required, experienced significantly reduced completion and success rates. It is concluded that the proposed method is appropriate for the data-cleaning step of the Knowledge Discovery process in databases. The latter, containing much significance for the output efficiency of any data mining technique, can improve the quality of the mined information.
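A hedged sketch of rule-based imputation in the spirit of FiMV: each uncertainty rule maps matching antecedent attributes to a value for a target attribute with a confidence, and a missing value is filled by the best-matching rule. The rule structure, field names, and data are illustrative, not the AURG representation.

```python
def fill_missing(record, rules):
    """Fill each None field using the matching rule of highest confidence."""
    filled = dict(record)
    for target, value in list(filled.items()):
        if value is not None:
            continue
        best = None
        for rule in rules:
            if rule["target"] != target:
                continue
            if all(filled.get(k) == v for k, v in rule["if"].items()):
                if best is None or rule["confidence"] > best["confidence"]:
                    best = rule
        if best:
            filled[target] = best["then"]
    return filled

rules = [{"if": {"activity": "rest"}, "target": "heart_rate_band",
          "then": "low", "confidence": 0.9}]
print(fill_missing({"activity": "rest", "heart_rate_band": None}, rules))
```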
A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records
Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela
2016-01-01
Objective The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. Materials and Methods We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising of 5728 GenBank records for the influenza A virus. Results We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. PMID:26911818
A New Automatic Method of Urban Areas Mapping in East Asia from LANDSAT Data
NASA Astrophysics Data System (ADS)
XU, R.; Jia, G.
2012-12-01
Cities, as places where human activities are concentrated, account for a small percentage of global land cover but are frequently cited as chief drivers of, and potential solutions to, climate, biogeochemical, and hydrological processes at local, regional, and global scales. Accompanying uncontrolled economic growth, urban sprawl has been attributed to the accelerating integration of East Asia into the world economy and has involved dramatic changes in urban form and land use. To understand the impact of urban extent on biogeophysical processes, reliable mapping of built-up areas is particularly essential in eastern cities, which are characterized by smaller patches, greater fragility, and a lower fraction of natural cover within the urban landscape than in the West. Segmentation of urban land from other land-cover types using remote sensing imagery can be done by standard classification processes as well as by a logic rule calculation based on spectral indices and their derivations. Efforts to establish such a logic rule with no threshold for automatic mapping are highly worthwhile. Existing automatic methods are reviewed, and then a proposed approach is introduced, including the calculation of the new index and the improved logic rule. Following this, existing automatic methods as well as the proposed approach are compared in a common context. Afterwards, the proposed approach is tested separately in large, medium, and small cities in East Asia selected from different LANDSAT images. The results are promising, as the approach can efficiently segment urban areas, even in the more complex eastern cities. Key words: Urban extraction; Automatic Method; Logic Rule; LANDSAT images; East Asia.
Figure: The proposed approach of extraction of urban built-up areas in Guangzhou, China.
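The flavor of a threshold-free logic rule can be shown in a few lines of numpy: a pixel is labelled built-up when a built-up index exceeds a vegetation index, so no tunable cut-off is needed. NDBI and NDVI stand in here for the new index proposed in the abstract, which is not reproduced.

```python
import numpy as np

def built_up_mask(nir, red, swir):
    """Label a pixel urban when its built-up index beats its vegetation index."""
    ndvi = (nir - red) / (nir + red + 1e-9)
    ndbi = (swir - nir) / (swir + nir + 1e-9)
    return ndbi > ndvi              # logic rule with no tunable threshold

nir = np.array([[0.4, 0.1]])
red = np.array([[0.1, 0.08]])
swir = np.array([[0.2, 0.3]])
print(built_up_mask(nir, red, swir))   # [[False  True]]
```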
OPE, charm-quark mass, and decay constants of D and Ds mesons from QCD sum rules
Lucha, Wolfgang; Melikhov, Dmitri; Simula, Silvano
2011-01-01
We present a sum-rule extraction of the decay constants of the charmed mesons D and Ds from the two-point correlator of pseudoscalar currents. First, we compare the perturbative expansion for the correlator and the decay constant performed in terms of the pole and the running MS-bar masses of the charm quark. The perturbative expansion in terms of the pole mass shows no signs of convergence, whereas reorganizing this very expansion in terms of the MS-bar mass leads to a distinct hierarchy of the perturbative expansion. Furthermore, the decay constants extracted from the pole-mass correlator turn out to be considerably smaller than those obtained by means of the MS-bar-mass correlator. Second, making use of the OPE in terms of the MS-bar mass, we determine the decay constants of both D and Ds mesons with an emphasis on the uncertainties in these quantities related both to the input QCD parameters and to the limited accuracy of the method of sum rules. PMID:21949465
QCD sum rules study of meson-baryon sigma terms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erkol, Gueray; Oka, Makoto; Turan, Guersevil
2008-11-01
The pion-baryon sigma terms and the strange-quark condensates of the octet and the decuplet baryons are calculated by employing the method of QCD sum rules. We evaluate the vacuum-to-vacuum transition matrix elements of two baryon interpolating fields in an external isoscalar-scalar field and use a Monte Carlo-based approach to systematically analyze the sum rules and the uncertainties in the results. We extract the ratios of the sigma terms, which have rather high accuracy and minimal dependence on QCD parameters. We discuss the sources of uncertainties and comment on possible strangeness content of the nucleon and the Delta.
Improving KPCA Online Extraction by Orthonormalization in the Feature Space.
Souza Filho, Joao B O; Diniz, Paulo S R
2018-04-01
Recently, some online kernel principal component analysis (KPCA) techniques based on the generalized Hebbian algorithm (GHA) were proposed for use in large data sets, defining kernel components using concise dictionaries automatically extracted from data. This brief proposes two new online KPCA extraction algorithms, exploiting orthogonalized versions of the GHA rule. In both cases, the orthogonalization of kernel components is achieved by including some low-complexity additional steps in the kernel Hebbian algorithm, thus not substantially affecting the computational cost of the algorithm. Results show improved convergence speed and accuracy of the components extracted by the proposed methods, as compared with the state-of-the-art online KPCA extraction algorithms.
Extracting rate changes in transcriptional regulation from MEDLINE abstracts.
Liu, Wenting; Miao, Kui; Li, Guangxia; Chang, Kuiyu; Zheng, Jie; Rajapakse, Jagath C
2014-01-01
Time delays are important factors that are often neglected in gene regulatory network (GRN) inference models. Validating time delays from knowledge bases is a challenge since the vast majority of biological databases do not record temporal information of gene regulations. Biological knowledge and facts on gene regulations are typically extracted from bio-literature with specialized methods that depend on the regulation task. In this paper, we mine evidences for time delays related to the transcriptional regulation of yeast from the PubMed abstracts. Since the vast majority of abstracts lack quantitative time information, we can only collect qualitative evidences of time delays. Specifically, the speed-up or delay in transcriptional regulation rate can provide evidences for time delays (shorter or longer) in GRN. Thus, we focus on deriving events related to rate changes in transcriptional regulation. A corpus of yeast regulation related abstracts was manually labeled with such events. In order to capture these events automatically, we create an ontology of sub-processes that are likely to result in transcription rate changes by combining textual patterns and biological knowledge. We also propose effective feature extraction methods based on the created ontology to identify the direct evidences with specific details of these events. Our ontologies outperform existing state-of-the-art gene regulation ontologies in the automatic rule learning method applied to our corpus. The proposed deterministic ontology rule-based method can achieve comparable performance to the automatic rule learning method based on decision trees. This demonstrates the effectiveness of our ontology in identifying rate-changing events. We also tested the effectiveness of the proposed feature mining methods on detecting direct evidence of events. Experimental results show that the machine learning method on these features achieves an F1-score of 71.43%. The manually labeled corpus of events relating to rate changes in transcriptional regulation for yeast is available in https://sites.google.com/site/wentingntu/data. The created ontologies summarized both biological causes of rate changes in transcriptional regulation and corresponding positive and negative textual patterns from the corpus. They are demonstrated to be effective in identifying rate-changing events, which shows the benefits of combining textual patterns and biological knowledge on extracting complex biological events.
Liu, Yang; Yin, Xiu-Wen; Wang, Zi-Yu; Li, Xue-Lian; Pan, Meng; Li, Yan-Ping; Dong, Ling
2017-11-01
One of the advantages of the biopharmaceutics classification system of Chinese materia medica (CMMBCS) is that it expands the level of classification research from single ingredients to the multiple components of a Chinese herb, and from multi-component research to holistic research on the Chinese materia medica. In the present paper, the alkaloids of Huanglian extract were chosen as the main research object to explore the rules governing changes in solubility and intestinal permeability at the single-component and multi-component levels, and to determine the biopharmaceutical classification of Huanglian extract at the holistic level. The typical shake-flask method and HPLC were used to measure the solubility of single alkaloid ingredients from Huanglian extract. The quantitative research on intestinal absorption of the alkaloids was conducted in a single-pass intestinal perfusion experiment, while the permeability coefficient of Huanglian extract was calculated by a self-defined weight-coefficient method. Copyright © by the Chinese Pharmaceutical Association.
Artificial Intelligence Methods Applied to Parameter Detection of Atrial Fibrillation
NASA Astrophysics Data System (ADS)
Arotaritei, D.; Rotariu, C.
2015-09-01
In this paper we present a novel method to detect atrial fibrillation (AF) based on statistical descriptors and a hybrid neuro-fuzzy and crisp system. The inference system produces if-then-else rules that are extracted to construct a binary decision system: normal or atrial fibrillation. We use TPR (Turning Point Ratio), SE (Shannon Entropy) and RMSSD (Root Mean Square of Successive Differences) along with a new descriptor, Teager-Kaiser energy, in order to improve the accuracy of detection. The descriptors are calculated over a sliding window that produces a very large number of vectors (a massive dataset) used by the classifier. The window length is a crisp descriptor, while the rest of the descriptors are interval-valued. The parameters of the hybrid system are adapted using a Genetic Algorithm (GA) with a single-objective fitness target: the highest values for sensitivity and specificity. The rules are extracted and form part of the decision system. The proposed method was tested using the Physionet MIT-BIH Atrial Fibrillation Database and the experimental results revealed a good accuracy of AF detection in terms of sensitivity and specificity (above 90%).
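The descriptors named above are standard and easy to state precisely; the numpy sketch below computes RMSSD, Shannon entropy, and the turning point ratio over one window of RR intervals. The window length and the histogram binning for the entropy are illustrative choices.

```python
import numpy as np

def rmssd(rr):
    d = np.diff(rr)
    return np.sqrt(np.mean(d ** 2))    # root mean square of successive differences

def shannon_entropy(rr, bins=16):
    counts, _ = np.histogram(rr, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def turning_point_ratio(rr):
    x = np.asarray(rr)
    mid = x[1:-1]
    turning = ((mid > x[:-2]) & (mid > x[2:])) | ((mid < x[:-2]) & (mid < x[2:]))
    return turning.sum() / (len(x) - 2)  # fraction of local extrema

rr = np.random.default_rng(1).normal(0.8, 0.05, size=128)   # one sliding window
print(rmssd(rr), shannon_entropy(rr), turning_point_ratio(rr))
```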
Learning temporal rules to forecast instability in continuously monitored patients
Dubrawski, Artur; Wang, Donghan; Hravnak, Marilyn; Clermont, Gilles; Pinsky, Michael R
2017-01-01
Inductive machine learning, and in particular extraction of association rules from data, has been successfully used in multiple application domains, such as market basket analysis, disease prognosis, fraud detection, and protein sequencing. The appeal of rule extraction techniques stems from their ability to handle intricate problems yet produce models based on rules that can be comprehended by humans, and are therefore more transparent. Human comprehension is a factor that may improve adoption and use of data-driven decision support systems clinically via face validity. In this work, we explore whether we can reliably and informatively forecast cardiorespiratory instability (CRI) in step-down unit (SDU) patients utilizing data from continuous monitoring of physiologic vital sign (VS) measurements. We use a temporal association rule extraction technique in conjunction with a rule fusion protocol to learn how to forecast CRI in continuously monitored patients. We detail our approach and present and discuss encouraging empirical results obtained using continuous multivariate VS data from the bedside monitors of 297 SDU patients spanning 29 346 hours (3.35 patient-years) of observation. We present example rules that have been learned from data to illustrate potential benefits of comprehensibility of the extracted models, and we analyze the empirical utility of each VS as a potential leading indicator of an impending CRI event. PMID:27274020
Validity of association rules extracted by healthcare-data-mining.
Takeuchi, Hiroshi; Kodama, Naoki
2014-01-01
A personal healthcare system used with cloud computing has been developed. It enables a daily time-series of personal health and lifestyle data to be stored in the cloud through mobile devices. The cloud automatically extracts personally useful information, such as rules and patterns concerning the user's lifestyle and health condition embedded in their personal big data, by using healthcare-data-mining. This study verified that the rules extracted on the basis of daily time-series data stored over half a year by volunteer users of this system are valid.
Automatic sentence extraction for the detection of scientific paper relations
NASA Astrophysics Data System (ADS)
Sibaroni, Y.; Prasetiyowati, S. S.; Miftachudin, M.
2018-03-01
The relations between scientific papers are very useful for researchers to see the interconnections between scientific papers quickly. By observing inter-article relationships, researchers can identify, among other things, the weaknesses of existing research, performance improvements achieved to date, and the tools or data typically used in research in specific fields. So far, the methods developed to detect paper relations include machine learning and rule-based methods. However, a problem still arises in the process of sentence extraction from scientific paper documents, which is still done manually. This manual process makes the detection of scientific paper relations slow and inefficient. To overcome this problem, this study performs automatic sentence extraction, while the paper relations are identified based on the citation sentences. The performance of the built system is then compared with that of the manual extraction system. The analysis results suggest that automatic sentence extraction achieves a very high level of performance in the detection of paper relations, close to that of manual sentence extraction.
Creating an ontology driven rules base for an expert system for medical diagnosis.
Bertaud Gounot, Valérie; Donfack, Valéry; Lasbleiz, Jérémy; Bourde, Annabel; Duvauferrier, Régis
2011-01-01
Expert systems of the 1980s failed because of the difficulty of maintaining large rule bases. The current work proposes a method to build and maintain rule bases grounded on ontologies (like NCIT). The process described here for an expert system on plasma cell disorders encompasses extraction of a sub-ontology and automatic and comprehensive generation of production rules. The creation of rules is not based directly on classes, but on individuals (instances). Instances can be considered as prototypes of diseases formally defined by "restrictions" in the ontology. Thus, it is possible to use this process to make diagnoses of diseases. The perspectives of this work are considered: the process described with an ontology formalized in OWL1 can be extended by using an ontology in OWL2, allowing reasoning about numerical data in addition to symbolic data.
Hierarchical extraction of urban objects from mobile laser scanning data
NASA Astrophysics Data System (ADS)
Yang, Bisheng; Dong, Zhen; Zhao, Gang; Dai, Wenxia
2015-01-01
Point clouds collected in urban scenes contain a huge number of points (e.g., billions), numerous objects with significant size variability, complex and incomplete structures, and variable point densities, raising great challenges for the automated extraction of urban objects in the field of photogrammetry, computer vision, and robotics. This paper addresses these challenges by proposing an automated method to extract urban objects robustly and efficiently. The proposed method generates multi-scale supervoxels from 3D point clouds using the point attributes (e.g., colors, intensities) and spatial distances between points, and then segments the supervoxels rather than individual points by combining graph based segmentation with multiple cues (e.g., principal direction, colors) of the supervoxels. The proposed method defines a set of rules for merging segments into meaningful units according to types of urban objects and forms the semantic knowledge of urban objects for the classification of objects. Finally, the proposed method extracts and classifies urban objects in a hierarchical order ranked by the saliency of the segments. Experiments show that the proposed method is efficient and robust for extracting buildings, streetlamps, trees, telegraph poles, traffic signs, cars, and enclosures from mobile laser scanning (MLS) point clouds, with an overall accuracy of 92.3%.
Hervás, César; Silva, Manuel; Serrano, Juan Manuel; Orejuela, Eva
2004-01-01
The suitability of an approach for extracting heuristic rules from trained artificial neural networks (ANNs) pruned by a regularization method and with architectures designed by evolutionary computation for quantifying highly overlapping chromatographic peaks is demonstrated. The ANN input data are estimated by the Levenberg-Marquardt method in the form of a four-parameter Weibull curve associated with the profile of the chromatographic band. To test this approach, two N-methylcarbamate pesticides, carbofuran and propoxur, were quantified using a classic peroxyoxalate chemiluminescence reaction as a detection system for chromatographic analysis. Straightforward network topologies (one and two outputs models) allow the analytes to be quantified in concentration ratios ranging from 1:7 to 5:1 with an average standard error of prediction for the generalization test of 2.7 and 2.3% for carbofuran and propoxur, respectively. The reduced dimensions of the selected ANN architectures, especially those obtained after using heuristic rules, allowed simple quantification equations to be developed that transform the input variables into output variables. These equations can be easily interpreted from a chemical point of view to attain quantitative analytical information regarding the effect of both analytes on the characteristics of chromatographic bands, namely profile, dispersion, peak height, and residence time. Copyright 2004 American Chemical Society
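To make the input-estimation step concrete, here is a scipy sketch that fits a four-parameter Weibull-shaped band profile with the Levenberg-Marquardt method; the amplitude/location/scale/shape parameterization is an assumption, since the abstract does not give the exact functional form.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_peak(t, amp, loc, scale, shape):
    """Four-parameter Weibull-shaped chromatographic band profile."""
    z = np.clip((t - loc) / scale, 1e-12, None)
    return amp * (shape / scale) * z ** (shape - 1) * np.exp(-z ** shape)

t = np.linspace(0, 10, 200)
true = weibull_peak(t, 5.0, 1.0, 3.0, 2.0)
noisy = true + np.random.default_rng(0).normal(0, 0.01, t.size)
popt, _ = curve_fit(weibull_peak, t, noisy, p0=[4, 0.5, 2, 1.5], method="lm")
print(np.round(popt, 2))   # recovered amplitude, location, scale, shape
```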
Knowledge-based approach to video content classification
NASA Astrophysics Data System (ADS)
Chen, Yu; Wong, Edward K.
2001-01-01
A framework for video content classification using a knowledge-based approach is herein proposed. This approach is motivated by the fact that videos are rich in semantic contents, which can best be interpreted and analyzed by human experts. We demonstrate the concept by implementing a prototype video classification system using the rule-based programming language CLIPS 6.05. Knowledge for video classification is encoded as a set of rules in the rule base. The left-hand-sides of rules contain high level and low level features, while the right-hand-sides of rules contain intermediate results or conclusions. Our current implementation includes features computed from motion, color, and text extracted from video frames. Our current rule set allows us to classify input video into one of five classes: news, weather, reporting, commercial, basketball and football. We use MYCIN's inexact reasoning method for combining evidences, and to handle the uncertainties in the features and in the classification results. We obtained good results in a preliminary experiment, and it demonstrated the validity of the proposed approach.
Association Rule Based Feature Extraction for Character Recognition
NASA Astrophysics Data System (ADS)
Dua, Sumeet; Singh, Harpreet
Association rules that represent isomorphisms among data have gained importance in exploratory data analysis because they can find inherent, implicit, and interesting relationships among data. They are also commonly used in data mining to extract the conditions among attribute values that occur together frequently in a dataset [1]. These rules have a wide range of applications, notably in the financial and retail sectors, marketing, sales, and medicine.
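For reference, the two standard measures behind such rules are support and confidence, computed here on a toy transaction set.

```python
def support(itemset, transactions):
    """Fraction of transactions containing all items in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Conditional frequency of the consequent given the antecedent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

T = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
print(support({"a", "b"}, T), confidence({"a"}, {"b"}, T))  # 0.5 0.666...
```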
Sensory Intelligence for Extraction of an Abstract Auditory Rule: A Cross-Linguistic Study.
Guo, Xiao-Tao; Wang, Xiao-Dong; Liang, Xiu-Yuan; Wang, Ming; Chen, Lin
2018-02-21
In a complex linguistic environment, while speech sounds can greatly vary, some shared features are often invariant. These invariant features constitute so-called abstract auditory rules. Our previous study has shown that with auditory sensory intelligence, the human brain can automatically extract the abstract auditory rules in the speech sound stream, presumably serving as the neural basis for speech comprehension. However, whether the sensory intelligence for extraction of abstract auditory rules in speech is inherent or experience-dependent remains unclear. To address this issue, we constructed a complex speech sound stream using auditory materials in Mandarin Chinese, in which syllables had a flat lexical tone but differed in other acoustic features to form an abstract auditory rule. This rule was occasionally and randomly violated by the syllables with the rising, dipping or falling tone. We found that both Chinese and foreign speakers detected the violations of the abstract auditory rule in the speech sound stream at a pre-attentive stage, as revealed by the whole-head recordings of mismatch negativity (MMN) in a passive paradigm. However, MMNs peaked earlier in Chinese speakers than in foreign speakers. Furthermore, Chinese speakers showed different MMN peak latencies for the three deviant types, which paralleled recognition points. These findings indicate that the sensory intelligence for extraction of abstract auditory rules in speech sounds is innate but shaped by language experience. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.
Unsupervised Ontology Generation from Unstructured Text. CRESST Report 827
ERIC Educational Resources Information Center
Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.
2013-01-01
Ontologies are a vital component of most knowledge acquisition systems, and recently there has been a huge demand for generating ontologies automatically since manual or supervised techniques are not scalable. In this paper, we introduce "OntoMiner", a rule-based, iterative method to extract and populate ontologies from unstructured or…
[An object-based information extraction technology for dominant tree species group types].
Tian, Tian; Fan, Wen-yi; Lu, Wei; Xiao, Xiang
2015-06-01
Information extraction for dominant tree species group types is difficult in remote sensing image classification; however, object-oriented classification using high spatial resolution remote sensing data is a new way to achieve accurate extraction of type information. In this paper, taking the Jiangle Forest Farm in Fujian Province as the research area and based on Quickbird image data from 2013, the object-oriented method was adopted to identify farmland, shrub-herbaceous plant, young afforested land, Pinus massoniana, Cunninghamia lanceolata, and broad-leaved tree types. Three types of classification factors, including spectral, texture, and different vegetation indices, were used to establish a class hierarchy. According to the different levels, membership functions and decision tree classification rules were adopted. The results showed that the object-oriented method using texture, spectrum, and vegetation indices achieved a classification accuracy of 91.3%, which was 5.7% higher than that obtained using only texture and spectrum.
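A minimal sketch of the kind of threshold rules such a class hierarchy encodes (the features, thresholds, and class boundaries are invented for illustration and are not those of the study; membership functions would soften these crisp cutoffs):

```python
def classify_object(ndvi: float, brightness: float, glcm_homogeneity: float) -> str:
    """Toy rule hierarchy over per-object features: spectral indices
    separate vegetated from non-vegetated objects, texture separates
    forest types."""
    if ndvi < 0.2:
        return "farmland" if brightness > 0.5 else "other"
    if ndvi < 0.5:
        return "shrub-herbaceous plant"
    # High NDVI: separate forest types by texture homogeneity.
    return "broad-leaved tree" if glcm_homogeneity < 0.4 else "conifer"

print(classify_object(ndvi=0.65, brightness=0.3, glcm_homogeneity=0.3))
```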
Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method.
Liu, H; Lussier, Y A; Friedman, C
2001-08-01
With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is increasingly needed. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are that (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains composed of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
Clinic expert information extraction based on domain model and block importance model.
Zhang, Yuanpeng; Wang, Li; Qian, Danmin; Geng, Xingyun; Yao, Dengfu; Dong, Jiancheng
2015-11-01
To extract expert clinic information from the Deep Web, there are two challenges to face. The first is to make a judgment on forms. A novel method is proposed based on a domain model, which is a tree structure constructed from the attributes of query interfaces. With this model, query interfaces can be classified to a domain and filled in with domain keywords. The second challenge is to extract information from the response Web pages indexed by the query interfaces. To filter the noisy information on a Web page, a block importance model is proposed, in which both content and spatial features are taken into account. The experimental results indicate that the domain model yields a precision 4.89% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method. Copyright © 2015 Elsevier Ltd. All rights reserved.
Extracting Characteristics of the Study Subjects from Full-Text Articles
Demner-Fushman, Dina; Mork, James G
2015-01-01
Characteristics of the subjects of biomedical research are important in determining if a publication describing the research is relevant to a search. To facilitate finding relevant publications, MEDLINE citations provide Medical Subject Headings that describe the subjects’ characteristics, such as their species, gender, and age. We seek to improve the recommendation of these headings by the Medical Text Indexer (MTI) that supports manual indexing of MEDLINE. To that end, we explore the potential of the full text of the publications. Using simple recall-oriented rule-based methods we determined that adding sentences extracted from the methods sections and captions to the abstracts prior to MTI processing significantly improved recall and F1 score with only a slight drop in precision. Improvements were also achieved in directly assigning several headings extracted from the full text. These results indicate the need for further development of automated methods capable of leveraging the full text for indexing. PMID:26958181
Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Yu, Kai; Shortreed, Susan M.; Pronk, Anjoeka; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Silverman, Debra T.; Friesen, Melissa C.
2014-01-01
Objectives Evaluating occupational exposures in population-based case-control studies often requires exposure assessors to review each study participant's reported occupational information job-by-job to derive exposure estimates. Although such assessments likely have underlying decision rules, they usually lack transparency, are time-consuming, and have uncertain reliability and validity. We aimed to identify the underlying rules to enable documentation, review, and future use of these expert-based exposure decisions. Methods Classification and regression trees (CART, predictions from a single tree) and random forests (predictions from many trees) were used to identify the underlying rules from the questionnaire responses and an expert's exposure assignments for occupational diesel exhaust exposure for several metrics: binary exposure probability and ordinal exposure probability, intensity, and frequency. Data were split into training (n=10,488 jobs), testing (n=2,247), and validation (n=2,248) data sets. Results The CART and random forest models' predictions agreed with 92–94% of the expert's binary probability assignments. For ordinal probability, intensity, and frequency metrics, the two models extracted decision rules more successfully for unexposed and highly exposed jobs (86–90% and 57–85%, respectively) than for low or medium exposed jobs (7–71%). Conclusions CART and random forest models extracted decision rules and accurately predicted an expert's exposure decisions for the majority of jobs, and identified questionnaire response patterns that would require further expert review if the rules were applied to other jobs in the same or a different study. This approach makes the exposure assessment process in case-control studies more transparent and creates a mechanism to efficiently replicate exposure decisions in future studies. PMID:23155187
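A minimal sketch of the general approach, fitting a CART model to expert assignments with scikit-learn (the questionnaire features and labels here are hypothetical, not the study's data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical questionnaire responses coded as binary indicators and
# the expert's binary exposure assignment for each job.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])  # expert: exposed / unexposed

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
# The printed tree is the recovered "decision rule" made transparent.
print(export_text(tree, feature_names=["drives_diesel_truck", "works_indoors"]))
```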
NASA Astrophysics Data System (ADS)
Du, Y.; Fan, X.; He, Z.; Su, F.; Zhou, C.; Mao, H.; Wang, D.
2011-06-01
In this paper, rough set theory is introduced to represent spatial-temporal relationships and extract the corresponding rules from typical mesoscale-eddy states in the South China Sea (SCS). Three decision attributes are adopted, which make the approach flexible in retrieving spatial-temporal rules with different features. Spatial-temporal rules of typical states in the SCS are extracted for the three decision attributes and confirmed against previous works. The results demonstrate that this approach is effective in extracting spatial-temporal rules from typical mesoscale-eddy states, and it therefore provides a powerful tool for future forecasting. Spatial-temporal rules in the SCS indicate that warm eddies following the rules are generally found in the southeastern and central SCS around the 2000 m isobath in winter. Their intensity and vorticity are weaker than those of cold eddies, and they usually move a shorter distance. By contrast, cold eddies occur in regions deeper than 2000 m in the southwestern and northeastern SCS in spring and fall. Their intensity and vorticity are strong, and they usually move a long distance. In winter, a few rules are followed by cold eddies in the northern tip of the basin and southwest of Taiwan Island rather than by warm eddies, indicating that cold eddies may be well regulated in that region. Several warm-eddy rules are obtained west of Luzon Island, indicating that warm eddies may be well regulated there as well. In addition, warm and cold eddies are distributed not only in the jet flow off southern Vietnam induced by intraseasonal wind stress in summer and fall, but also in the northern shallow water, which should be a focus of future study.
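A minimal sketch of the rough-set machinery involved, computing lower and upper approximations of a decision class from an indiscernibility partition (the toy decision table is invented, not the paper's data):

```python
from collections import defaultdict

# Toy decision table: condition attributes describe each eddy state,
# the decision is the eddy type.
table = [
    ({"season": "winter", "depth": "2000m"}, "warm"),
    ({"season": "winter", "depth": "2000m"}, "warm"),
    ({"season": "spring", "depth": "deep"}, "cold"),
    ({"season": "winter", "depth": "deep"}, "cold"),
    ({"season": "spring", "depth": "deep"}, "warm"),
]

# Group objects that are indiscernible w.r.t. the condition attributes.
blocks = defaultdict(list)
for i, (conds, dec) in enumerate(table):
    blocks[tuple(sorted(conds.items()))].append(i)

target = {i for i, (_, d) in enumerate(table) if d == "warm"}
# Lower approximation: blocks entirely inside the class (certain rules);
# upper approximation: blocks overlapping the class (possible rules).
lower = {i for b in blocks.values() if set(b) <= target for i in b}
upper = {i for b in blocks.values() if set(b) & target for i in b}
print(lower, upper)  # {0, 1} vs. {0, 1, 2, 4}
```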
Recommendation System Based On Association Rules For Distributed E-Learning Management Systems
NASA Astrophysics Data System (ADS)
Mihai, Gabroveanu
2015-09-01
Traditional Learning Management Systems are installed on a single server where learning materials and user data are kept. To increase performance, a Learning Management System can be installed on multiple servers, with learning materials and user data distributed across these servers, obtaining a Distributed Learning Management System. In this paper, the prototype of a recommendation system based on association rules for a Distributed Learning Management System is proposed. Information from LMS databases is analyzed using distributed data mining algorithms in order to extract association rules. The extracted rules are then used as inference rules to provide personalized recommendations. The quality of the recommendations is improved because the rules used to make the inferences are more accurate, since they aggregate knowledge from all e-Learning systems included in the Distributed Learning Management System.
Learning temporal rules to forecast instability in continuously monitored patients.
Guillame-Bert, Mathieu; Dubrawski, Artur; Wang, Donghan; Hravnak, Marilyn; Clermont, Gilles; Pinsky, Michael R
2017-01-01
Inductive machine learning, and in particular extraction of association rules from data, has been successfully used in multiple application domains, such as market basket analysis, disease prognosis, fraud detection, and protein sequencing. The appeal of rule extraction techniques stems from their ability to handle intricate problems yet produce models based on rules that can be comprehended by humans, and are therefore more transparent. Human comprehension is a factor that may improve adoption and use of data-driven decision support systems clinically via face validity. In this work, we explore whether we can reliably and informatively forecast cardiorespiratory instability (CRI) in step-down unit (SDU) patients utilizing data from continuous monitoring of physiologic vital sign (VS) measurements. We use a temporal association rule extraction technique in conjunction with a rule fusion protocol to learn how to forecast CRI in continuously monitored patients. We detail our approach and present and discuss encouraging empirical results obtained using continuous multivariate VS data from the bedside monitors of 297 SDU patients spanning 29,346 hours (3.35 patient-years) of observation. We present example rules that have been learned from data to illustrate potential benefits of comprehensibility of the extracted models, and we analyze the empirical utility of each VS as a potential leading indicator of an impending CRI event. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Exploratory factor analysis in Rehabilitation Psychology: a content analysis.
Roberson, Richard B; Elliott, Timothy R; Chang, Jessica E; Hill, Jessica N
2014-11-01
Our objective was to examine the use and quality of exploratory factor analysis (EFA) in articles published in Rehabilitation Psychology. Trained raters examined 66 separate exploratory factor analyses in 47 articles published between 1999 and April 2014. The raters recorded the aim of the EFAs, the distributional statistics, sample size, factor retention method(s), extraction and rotation method(s), and whether the pattern coefficients, structure coefficients, and the matrix of association were reported. The primary use of the EFAs was scale development, but the most widely used extraction and rotation method was principal component analysis with varimax rotation. When determining how many factors to retain, multiple methods (e.g., scree plot, parallel analysis) were used most often. Many articles did not report enough information to allow for the duplication of their results. EFA relies on authors' choices (e.g., factor retention rules, extraction and rotation methods), and few articles adhered to all of the best practices. The current findings are compared to other empirical investigations into the use of EFA in published research. Recommendations for improving EFA reporting practices in rehabilitation psychology research are provided.
Brain Dynamics Sustaining Rapid Rule Extraction from Speech
ERIC Educational Resources Information Center
de Diego-Balaguer, Ruth; Fuentemilla, Lluis; Rodriguez-Fornells, Antoni
2011-01-01
Language acquisition is a complex process that requires the synergic involvement of different cognitive functions, which include extracting and storing the words of the language and their embedded rules for progressive acquisition of grammatical information. As has been shown in other fields that study learning processes, synchronization…
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-17
[Fragment of a NAICS source-category table from the rule text: 211112 Natural gas liquid extraction facilities; Petrochemical production (32511...); 325120 Industrial gas manufacturing (suppliers of industrial gas); referenced in EPA's procedures for handling data collected under the Mandatory Greenhouse Gas Reporting Rule.]
NASA Astrophysics Data System (ADS)
Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.
2017-11-01
Coastal monitoring plays a vital role in environmental planning and hazard management. Since shorelines are fundamental data for environmental management, disaster management, coastal erosion studies, modelling of sediment transport, and coastal morphodynamics, various techniques have been developed to extract them. Random Forest, a machine learning method based on decision trees, is the technique used in this study for shoreline extraction. Decision trees analyse classes of training data and create rules for classification. The Terkos region was chosen as the study area within the scope of the TUBITAK Project (No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". The Random Forest algorithm was implemented to extract the shoreline of the Black Sea near Terkos Lake from LANDSAT-8 and GOKTURK-2 satellite imagery taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method was applied to the NIR bands of the LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imagery. Each image was digitized manually to obtain reference shorelines for accuracy assessment. According to the accuracy assessment results, the Random Forest method is efficient for shoreline extraction from both medium and high resolution images.
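A minimal sketch of the band-classification idea with a scikit-learn Random Forest (the sample values are invented; the study itself used MATLAB):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical NIR reflectance samples: water absorbs strongly in NIR,
# so water pixels have low values and land pixels high values.
nir = np.array([[0.05], [0.08], [0.04], [0.45], [0.52], [0.60]])
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = water, 1 = land

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(nir, labels)

# Classify a full NIR band (flattened image) and reshape to a mask;
# the land/water boundary of the mask approximates the shoreline.
band = np.random.rand(4, 4)
mask = rf.predict(band.reshape(-1, 1)).reshape(band.shape)
print(mask)
```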
Hassanpour, Saeed; O'Connor, Martin J; Das, Amar K
2013-08-12
A variety of informatics approaches have been developed that use information retrieval, NLP, and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of the correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach ranked the correct definition higher (i.e., achieved a better average rank) than the term-based approach in each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at p < 0.05 using the Wilcoxon signed-rank test. Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
Kaga, Chiaki; Okochi, Mina; Tomita, Yasuyuki; Kato, Ryuji; Honda, Hiroyuki
2008-03-01
We developed a method of effective peptide screening that combines experiments and computational analysis. The method is based on the concept that screening efficiency can be enhanced from even limited data by use of a model derived from computational analysis, which serves as a guide to screening when combined with subsequent repeated experiments. Here we focus on cell-adhesion peptides as a model application of this peptide-screening strategy. Cell-adhesion peptides were screened by use of a cell-based assay of a peptide array. Starting with the screening data obtained from a limited, random 5-mer library (643 sequences), a rule regarding the structural characteristics of cell-adhesion peptides was extracted by fuzzy neural network (FNN) analysis. According to this rule, peptides with unfavored residues in certain positions that led to inefficient binding were eliminated from the random sequences. In the restricted second random library (273 sequences), the yield of cell-adhesion peptides having an adhesion rate more than 1.5-fold that of the basal array support was significantly higher (31%) than in the unrestricted random library (20%). In the restricted third library (50 sequences), the yield of cell-adhesion peptides increased to 84%. We conclude that a repeated cycle of experiments screening limited numbers of peptides can be assisted by the rule-extracting feature of FNN.
Ravikumar, Ke; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin
2012-10-05
We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. We finally present an approach to automated construction of relevant training and test data using the distant supervision model. The performance of the method was assessed by extracting protein-residue relations from a new automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on the automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
NASA Astrophysics Data System (ADS)
Seoud, Ahmed; Kim, Juhwan; Ma, Yuansheng; Jayaram, Srividya; Hong, Le; Chae, Gyu-Yeol; Lee, Jeong-Woo; Park, Dae-Jin; Yune, Hyoung-Soon; Oh, Se-Young; Park, Chan-Ha
2018-03-01
Sub-resolution assist feature (SRAF) insertion techniques have been effectively used for a long time now to increase process latitude in the lithography patterning process. Rule-based SRAF and model-based SRAF are complementary solutions, and each has its own benefits, depending on the objectives of applications and the criticality of the impact on manufacturing yield, efficiency, and productivity. Rule-based SRAF provides superior geometric output consistency and faster runtime performance, but the associated recipe development time can be of concern. Model-based SRAF provides better coverage for more complicated pattern structures in terms of shapes and sizes, with considerably less time required for recipe development, although consistency and performance may be impacted. In this paper, we introduce a new model-assisted template extraction (MATE) SRAF solution, which employs decision tree learning in a model-based solution to provide the benefits of both rule-based and model-based SRAF insertion approaches. The MATE solution is designed to automate the creation of rules/templates for SRAF insertion, and is based on the SRAF placement predicted by model-based solutions. The MATE SRAF recipe provides optimum lithographic quality in relation to various manufacturing aspects in a very short time, compared to traditional methods of rule optimization. Experiments were done using memory device pattern layouts to compare the MATE solution to existing model-based SRAF and pixelated SRAF approaches, based on lithographic process window quality, runtime performance, and geometric output consistency.
Automated rule-base creation via CLIPS-Induce
NASA Technical Reports Server (NTRS)
Murphy, Patrick M.
1994-01-01
Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a larger CLIPS system that performs tasks such as accessing a database or displaying information.
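A minimal sketch of the production-extraction step, traversing a fitted decision tree top-down and emitting one rule per leaf (scikit-learn stands in for ID3 here, and the printed IF/THEN strings stand in for CLIPS productions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
t = DecisionTreeClassifier().fit(X, y).tree_

def emit_rules(node=0, conds=()):
    """Top-down traversal: internal nodes contribute conditions,
    each leaf yields one classification rule."""
    if t.children_left[node] == -1:  # leaf
        label = int(np.argmax(t.value[node]))
        print(f"IF {' AND '.join(conds) or 'TRUE'} THEN class={label}")
        return
    f, thr = t.feature[node], t.threshold[node]
    emit_rules(t.children_left[node], conds + (f"x{f} <= {thr:.2f}",))
    emit_rules(t.children_right[node], conds + (f"x{f} > {thr:.2f}",))

emit_rules()
```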
Dahamna, Badisse; Guillemin-Lanne, Sylvie; Darmoni, Stefan J; Faviez, Carole; Huot, Charles; Katsahian, Sandrine; Leroux, Vincent; Pereira, Suzanne; Richard, Christophe; Schück, Stéphane; Souvignet, Julien; Lillo-Le Louët, Agnès; Texier, Nathalie
2017-01-01
Background Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. The classical pharmacovigilance process is limited by underreporting, which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture. Objective This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges. Methods In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality-driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern-based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system. Results Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee the data privacy of patients posting on the Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embedding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility for third-party applications through Web services. Conclusions We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such an approach to improve conventional pharmacovigilance processes based on spontaneous reporting. PMID:28935617
TEES 2.2: Biomedical Event Extraction for Diverse Corpora.
Björne, Jari; Salakoski, Tapio
2015-01-01
The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction are presented.
The ADE scorecards: a tool for adverse drug event detection in electronic health records.
Chazard, Emmanuel; Băceanu, Adrian; Ferret, Laurie; Ficheur, Grégoire
2011-01-01
Although several methods exist for detecting adverse drug events (ADEs) in past hospitalizations, a tool that displays those ADEs to physicians has been lacking. This article presents the ADE Scorecards, a Web tool that enables screening of past hospitalizations extracted from Electronic Health Records (EHR) using a set of ADE detection rules, at present rules discovered by data mining. The tool enables physicians to (1) get contextualized statistics about the ADEs that occur in their medical department, (2) see the rules that are useful in their department, i.e., the rules that could have enabled those ADEs to be prevented, and (3) review the ADE cases in detail through a comprehensive interface displaying the diagnoses, procedures, lab results, administered drugs and anonymized records. The article demonstrates the tool through a use case.
MedXN: an open source medication extraction and normalization tool for clinical text
Sohn, Sunghwan; Clark, Cheryl; Halgrim, Scott R; Murphy, Sean P; Chute, Christopher G; Liu, Hongfang
2014-01-01
Objective We developed the Medication Extraction and Normalization (MedXN) system to extract comprehensive medication information and normalize it to the most appropriate RxNorm concept unique identifier (RxCUI) as specifically as possible. Methods Medication descriptions in clinical notes were decomposed into medication name and attributes, which were separately extracted using RxNorm dictionary lookup and regular expression. Then, each medication name and its attributes were combined together according to RxNorm convention to find the most appropriate RxNorm representation. To do this, we employed serialized hierarchical steps implemented in Apache's Unstructured Information Management Architecture. We also performed synonym expansion, removed false medications, and employed inference rules to improve the medication extraction and normalization performance. Results An evaluation on test data of 397 medication mentions showed F-measures of 0.975 for medication name and over 0.90 for most attributes. The RxCUI assignment produced F-measures of 0.932 for medication name and 0.864 for full medication information. Most false negative RxCUI assignments in full medication information are due to human assumption of missing attributes and medication names in the gold standard. Conclusions The MedXN system (http://sourceforge.net/projects/ohnlp/files/MedXN/) was able to extract comprehensive medication information with high accuracy and demonstrated good normalization capability to RxCUI as long as explicit evidence existed. More sophisticated inference rules might result in further improvements to specific RxCUI assignments for incomplete medication descriptions. PMID:24637954
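A minimal sketch of dictionary lookup plus attribute regular expressions in the spirit of the decomposition described above (the patterns and drug list are illustrative, not MedXN's actual resources):

```python
import re

# Toy "dictionary lookup" for medication names plus regexes for attributes.
MED_NAMES = re.compile(r"\b(metformin|lisinopril|aspirin)\b", re.I)
STRENGTH = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mg|mcg|g)\b", re.I)
FREQUENCY = re.compile(r"\b(once|twice|three times)\s+(daily|a day)\b", re.I)

note = "Start metformin 500 mg twice daily with meals."
print(MED_NAMES.search(note).group(0))   # metformin
print(STRENGTH.search(note).groups())    # ('500', 'mg')
print(FREQUENCY.search(note).group(0))   # twice daily
```

The extracted name and attributes would then be recombined following the target vocabulary's naming convention before the concept lookup step.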
DecisionMaker software and extracting fuzzy rules under uncertainty
NASA Technical Reports Server (NTRS)
Walker, Kevin B.
1992-01-01
Knowledge acquisition under uncertainty is examined. Theories proposed in deKorvin's paper 'Extracting Fuzzy Rules Under Uncertainty and Measuring Definability Using Rough Sets' are discussed as they relate to rule calculation algorithms. A data structure for holding an arbitrary number of data fields is described. Limitations of Pascal for-loops in the generation of combinations are also discussed. Finally, recursive algorithms are presented for generating all possible combinations of attributes and for calculating the intersection of an arbitrary number of fuzzy sets.
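A minimal sketch of the two recursive algorithms, in Python rather than Pascal (the representation is an assumption: fuzzy sets as dicts mapping elements to membership degrees):

```python
def attribute_combinations(attrs, k=0, prefix=()):
    """Recursively generate all non-empty ordered combinations of
    attributes, avoiding fixed-depth nested for-loops."""
    for i in range(k, len(attrs)):
        combo = prefix + (attrs[i],)
        yield combo
        yield from attribute_combinations(attrs, i + 1, combo)

def fuzzy_intersection(*sets):
    """Pointwise minimum over an arbitrary number of fuzzy sets."""
    common = set.intersection(*(set(s) for s in sets))
    return {x: min(s[x] for s in sets) for x in common}

print(list(attribute_combinations(["color", "size", "shape"])))
print(fuzzy_intersection({"a": 0.9, "b": 0.4}, {"a": 0.7, "b": 0.8}))
# {'a': 0.7, 'b': 0.4}
```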
The Role of Salience in the Extraction of Algebraic Rules
ERIC Educational Resources Information Center
Endress, Ansgar D.; Scholl, Brian J.; Mehler, Jacques
2005-01-01
Recent research suggests that humans and other animals have sophisticated abilities to extract both statistical dependencies and rule-based regularities from sequences. Most of this research stresses the flexibility and generality of such processes. Here the authors take up an equally important project, namely, to explore the limits of such…
Sleep Promotes the Extraction of Grammatical Rules
Nieuwenhuis, Ingrid L. C.; Folia, Vasiliki; Forkstam, Christian; Jensen, Ole; Petersson, Karl Magnus
2013-01-01
Grammar acquisition is a high level cognitive function that requires the extraction of complex rules. While it has been proposed that offline time might benefit this type of rule extraction, this remains to be tested. Here, we addressed this question using an artificial grammar learning paradigm. During a short-term memory cover task, eighty-one human participants were exposed to letter sequences generated according to an unknown artificial grammar. Following a time delay of 15 min, 12 h (wake or sleep) or 24 h, participants classified novel test sequences as Grammatical or Non-Grammatical. Previous behavioral and functional neuroimaging work has shown that classification can be guided by two distinct underlying processes: (1) the holistic abstraction of the underlying grammar rules and (2) the detection of sequence chunks that appear at varying frequencies during exposure. Here, we show that classification performance improved after sleep. Moreover, this improvement was due to an enhancement of rule abstraction, while the effect of chunk frequency was unaltered by sleep. These findings suggest that sleep plays a critical role in extracting complex structure from separate but related items during integrative memory processing. Our findings stress the importance of alternating periods of learning with sleep in settings in which complex information must be acquired. PMID:23755173
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sang, Yuanrui; Karayaka, H. Bora; Yan, Yanjun
The slider crank is a proven mechanical linkage system with a long history of successful applications, and the slider-crank ocean wave energy converter (WEC) is a type of WEC that converts linear motion into rotation. This paper presents a control algorithm for a slider-crank WEC. In this study, a time-domain hydrodynamic analysis is adopted, and an AC synchronous machine is used in the power take-off system to achieve relatively high system performance. Also, a rule-based phase control strategy is applied to maximize energy extraction, making the system suitable not only for regular sinusoidal waves but also for irregular waves. Simulations are carried out under regular sinusoidal wave and synthetically produced irregular wave conditions; performance validations are also presented with high-precision, real ocean wave surface elevation data. The influences of significant wave height and peak period upon energy extraction of the system are studied. Energy extraction results using the proposed method are compared to those of the passive loading and complex conjugate control strategies; results show that the level of energy extraction is between those of the passive loading and complex conjugate control strategies, and the suboptimal nature of this control strategy is verified.
Teaching artificial neural systems to drive: Manual training techniques for autonomous systems
NASA Technical Reports Server (NTRS)
Shepanski, J. F.; Macy, S. A.
1987-01-01
A methodology was developed for manually training autonomous control systems based on artificial neural systems (ANS). In applications where the rule set governing an expert's decisions is difficult to formulate, ANS can be used to extract rules by associating the information an expert receives with the actions taken. Properly constructed networks imitate rules of behavior, which permits them to function autonomously when they are trained on the spanning set of possible situations. This training can be provided manually, either under the direct supervision of a system trainer, or indirectly using a background mode in which the network assimilates training data as the expert performs its day-to-day tasks. To demonstrate these methods, an ANS network was trained to drive a vehicle through simulated freeway traffic.
NASA Astrophysics Data System (ADS)
Wang, Min; Cui, Qi; Sun, Yujie; Wang, Qiao
2018-07-01
In object-based image analysis (OBIA), object classification performance is jointly determined by image segmentation, sample or rule setting, and classifiers. Typically, as a crucial step to obtain object primitives, image segmentation quality significantly influences subsequent feature extraction and analyses. By contrast, template matching extracts specific objects from images and prevents shape defects caused by image segmentation. However, creating or editing templates is tedious and sometimes results in incomplete or inaccurate templates. In this study, we combine OBIA and template matching techniques to address these problems and aim for accurate photovoltaic panel (PVP) extraction from very high-resolution (VHR) aerial imagery. The proposed method is based on the previously proposed region-line primitive association framework, in which complementary information between region (segment) and line (straight line) primitives is utilized to achieve a more powerful performance than routine OBIA. Several novel concepts, including the mutual fitting ratio and best-fitting template based on region-line primitive association analyses, are proposed. Automatic template generation and matching method for PVP extraction from VHR imagery are designed for concept and model validation. Results show that the proposed method can successfully extract PVPs without any user-specified matching template or training sample. High user independency and accuracy are the main characteristics of the proposed method in comparison with routine OBIA and template matching techniques.
Chen, Zhenyu; Li, Jianping; Wei, Liwei
2007-10-01
Recently, gene expression profiling using microarray techniques has been shown to be a promising tool to improve the diagnosis and treatment of cancer. Gene expression data contain a high level of noise, and the number of genes overwhelms the number of available samples, which poses a great challenge for machine learning and statistical techniques. The support vector machine (SVM) has been successfully used to classify gene expression data of cancer tissue. In the medical field, it is crucial to deliver the user a transparent decision process, and how to explain the computed solutions and present the extracted knowledge is a main obstacle for SVM. A multiple kernel support vector machine (MK-SVM) scheme, consisting of feature selection, rule extraction, and prediction modeling, is proposed to improve the explanation capacity of SVM. In this scheme, we show that the feature selection problem can be translated into an ordinary multiple parameter learning problem, and a shrinkage approach, 1-norm based linear programming, is proposed to obtain the sparse parameters and the corresponding selected features. We also propose a novel rule extraction approach that uses the information provided by the separating hyperplane and support vectors to improve the generalization capacity and comprehensibility of the rules and to reduce the computational complexity. Two public gene expression datasets, the leukemia dataset and the colon tumor dataset, are used to demonstrate the performance of this approach. Using the small number of selected genes, MK-SVM achieves encouraging classification accuracy: more than 90% for both datasets. Moreover, very simple rules with linguistic labels are extracted.
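The paper's 1-norm linear program is specific to MK-SVM; as a hedged stand-in, an L1-penalized linear SVM in scikit-learn shows how a 1-norm penalty drives most feature weights to exactly zero (the data here are synthetic):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))            # 60 samples, 200 "genes"
y = (X[:, 3] + X[:, 17] > 0).astype(int)  # only two genes are informative

# The 1-norm penalty yields a sparse weight vector; nonzero weights
# identify the selected features (cf. the paper's LP-based selection).
svm = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
selected = np.flatnonzero(svm.coef_[0])
print(selected)  # typically a handful of indices, including 3 and 17
```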
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most relevant information from these sources. Thus it would be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal the nonlinear relationships in data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
2015-01-01
Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify a special type of rule and potential biomarkers using integrated statistical and binary inclusion-maximal biclustering techniques on biological datasets. At first, a novel statistical strategy is utilized to eliminate insignificant, low-significance, or redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets, which saves elapsed time and allows it to work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to determine how accurately the evolved rules describe the remaining (unknown) test data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level. PMID:25830807
Bimodal Emotion Congruency Is Critical to Preverbal Infants' Abstract Rule Learning
ERIC Educational Resources Information Center
Tsui, Angeline Sin Mei; Ma, Yuen Ki; Ho, Anna; Chow, Hiu Mei; Tseng, Chia-huei
2016-01-01
Extracting general rules from specific examples is important, as we must face the same challenge displayed in various formats. Previous studies have found that bimodal presentation of grammar-like rules (e.g. ABA) enhanced 5-month-olds' capacity to acquire a rule that infants failed to learn when the rule was presented with visual presentation of…
Object-Driven and Temporal Action Rules Mining
ERIC Educational Resources Information Center
Hajja, Ayman
2013-01-01
In this thesis, I present my complete research work in the field of action rules, more precisely object-driven and temporal action rules. The drive behind the introduction of object-driven and temporally based action rules is to bring forth an adapted approach to extract action rules from a subclass of systems that have a specific nature, in which…
Integrative relational machine-learning for understanding drug side-effect profiles.
Bresso, Emmanuel; Grisoni, Renaud; Marchetti, Gino; Karaboga, Arnaud Sinan; Souchet, Michel; Devignes, Marie-Dominique; Smaïl-Tabbone, Malika
2013-06-26
Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. In this work, drug annotations are collected from the SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug × TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision trees and inductive logic programming. Although both methods yield explicit models, the inductive-logic-programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive-logic-programming method displays a greater sensitivity than decision trees and successfully exploits background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. Side-effect profiles covering a significant number of drugs have been extracted from a drug × side-effect association table. Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.
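A minimal sketch of maximal frequent itemset extraction from a drug × TC binary table (toy data, and a brute-force enumeration that would not scale to real datasets):

```python
from itertools import combinations

# Toy table: each row lists the term clusters (TCs) annotated to one drug.
drugs = [
    {"TC1", "TC2", "TC3"},
    {"TC1", "TC2"},
    {"TC1", "TC2", "TC4"},
    {"TC2", "TC3"},
]
min_support = 2
items = sorted(set().union(*drugs))

frequent = [
    set(c)
    for r in range(1, len(items) + 1)
    for c in combinations(items, r)
    if sum(set(c) <= d for d in drugs) >= min_support
]
# A maximal frequent itemset has no frequent proper superset: these are
# the candidate side-effect profiles (SEPs).
maximal = [s for s in frequent if not any(s < t for t in frequent)]
print(maximal)  # e.g. [{'TC1', 'TC2'}, {'TC2', 'TC3'}]
```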
NASA Astrophysics Data System (ADS)
Tariba, N.; Bouknadel, A.; Haddou, A.; Ikken, N.; Omari, Hafsa El; Omari, Hamid El
2017-01-01
The photovoltaic generator (PVG) has a nonlinear characteristic relating current to voltage, I = f(U), which depends on the variation of solar irradiation and temperature; in addition, its operating point depends directly on the load that it supplies. To fix this drawback and to extract the maximum power available at the terminals of the generator, an adaptation stage is introduced between the generator and the load to couple the two elements as well as possible. The adaptation stage is driven by a command called MPPT (Maximum Power Point Tracker), which is used to force the PVG to operate at the MPP (Maximum Power Point) under varying climatic conditions and loads. This paper presents a comparative study between adaptive controllers for PV systems using the MIT rule and the Lyapunov method to regulate the PV voltage. The Incremental Conductance (IC) algorithm is used to extract the maximum power from the PVG by calculating the reference voltage Vref, and the adaptive controller is used to regulate and quickly track the PV voltage. The two adaptive controller methods are compared to demonstrate their performance using PSIM tools and experimental tests, and the mathematical model of the step-up converter with the PVG model is presented.
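A minimal sketch of one Incremental Conductance update (variable names and the step size are illustrative):

```python
def incremental_conductance(v, i, v_prev, i_prev, v_ref, step=0.1):
    """One IC update: at the MPP dP/dV = 0, which is equivalent to
    dI/dV = -I/V. Returns the new voltage reference handed to the
    voltage regulation loop."""
    dv, di = v - v_prev, i - i_prev
    if dv == 0:
        if di > 0:            # irradiance increased
            v_ref += step
        elif di < 0:
            v_ref -= step
    elif di / dv > -i / v:    # left of the MPP: raise the voltage
        v_ref += step
    elif di / dv < -i / v:    # right of the MPP: lower the voltage
        v_ref -= step
    return v_ref              # unchanged when operating at the MPP

print(incremental_conductance(v=17.0, i=3.4, v_prev=16.8, i_prev=3.45, v_ref=17.0))
```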
[Study on Information Extraction of Clinic Expert Information from Hospital Portals].
Zhang, Yuanpeng; Dong, Jiancheng; Qian, Danmin; Geng, Xingyun; Wu, Huiqun; Wang, Li
2015-12-01
Clinic expert information provides important references for residents in need of hospital care. Usually, such information is hidden in the deep Web and cannot be directly indexed by search engines. To extract clinic expert information from the deep Web, the first challenge is to make a judgment on forms. This paper proposes a novel method based on a domain model, which is a tree structure constructed from the attributes of search interfaces. With this model, search interfaces can be classified to a domain and filled in with domain keywords. Another challenge is to extract information from the returned Web pages indexed by the search interfaces. To filter the noise information on a Web page, a block importance model is proposed. The experimental results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F₁ measure 10.5% higher than that of the XPath method.
NASA Astrophysics Data System (ADS)
Adiana, M. A.; Mazura, M. P.
2011-04-01
Senna alata L., commonly known as candle bush, belongs to the family Fabaceae, and the plant has been reported to possess anti-inflammatory, analgesic, laxative and antiplatelet-aggregating activity. In order to develop a rapid and effective analysis method for holistically studying the main constituents in the medicinal materials and their extracts, discriminating the extracts from different extraction processes, comparing the categories of chemical constituents in the different extracts, and monitoring the quality of medicinal materials, we applied Fourier transform infrared spectroscopy (FT-IR) together with second derivative infrared spectroscopy and two-dimensional infrared correlation spectroscopy (2D-IR) to study the main constituents of S. alata and its different extracts (extracted by hexane, dichloromethane, ethyl acetate and methanol in turn). The findings indicated that FT-IR and 2D-IR can reveal many holistic variation rules of the chemical constituents. Use of the macroscopic fingerprint characters of the FT-IR and 2D-IR spectra can not only identify the main chemical constituents in medicinal materials and their extracts but also reveal the compositional differences among similar samples. In conclusion, FT-IR spectroscopy combined with 2D correlation analysis provides a powerful method for the quality control of traditional medicines.
Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.
Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario
2016-01-01
Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. Among the different approaches of analysis, association rules (AR) provide useful knowledge by discovering biologically relevant associations between GO terms that were not previously known. In a previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. Here we adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms from the three sub-ontologies of GO. We conduct an in-depth performance evaluation of GO-WAR by mining publicly available GO-annotated datasets, showing how GO-WAR outperforms current state-of-the-art approaches.
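The abstract does not spell out GO-WAR's weighting scheme, so the following is only a minimal sketch of the general idea of weighted association-rule mining over GO annotations; the annotation table, the per-term weights and the thresholds are all hypothetical.

```python
from itertools import combinations

# Hypothetical annotations: each gene product maps to a set of GO terms,
# and each term carries a weight (e.g., an information-content score).
annotations = {
    "geneA": {"GO:0003677", "GO:0006355"},
    "geneB": {"GO:0003677", "GO:0006355", "GO:0005634"},
    "geneC": {"GO:0005634", "GO:0006355"},
}
weight = {"GO:0003677": 0.9, "GO:0006355": 0.4, "GO:0005634": 0.7}

def w_support(itemset):
    """Weighted support: coverage of the itemset scaled by the mean
    weight of its terms."""
    cover = sum(1 for terms in annotations.values() if itemset <= terms)
    mean_w = sum(weight[t] for t in itemset) / len(itemset)
    return mean_w * cover / len(annotations)

def rules(min_wsup=0.3, min_conf=0.6):
    terms = sorted({t for ts in annotations.values() for t in ts})
    for a, b in combinations(terms, 2):
        wsup = w_support({a, b})
        if wsup >= min_wsup:
            conf = (sum(1 for ts in annotations.values() if {a, b} <= ts)
                    / sum(1 for ts in annotations.values() if a in ts))
            if conf >= min_conf:
                yield (a, b, round(wsup, 3), round(conf, 3))

for r in rules():
    print(r)
```

A cross-ontology rule in this setting is simply one whose antecedent and consequent terms come from different GO sub-ontologies.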
A Decision Making Methodology in Support of the Business Rules Lifecycle
NASA Technical Reports Server (NTRS)
Wild, Christopher; Rosca, Daniela
1998-01-01
The business rules that underlie an enterprise emerge as a new category of system requirements that represent decisions about how to run the business, and which are characterized by their business-orientation and their propensity for change. In this report, we introduce a decision making methodology which addresses several aspects of the business rules lifecycle: acquisition, deployment and evolution. We describe a meta-model for representing business rules in terms of an enterprise model, and also a decision support submodel for reasoning about and deriving the rules. The possibility for lifecycle automated assistance is demonstrated in terms of the automatic extraction of business rules from the decision structure. A system based on the metamodel has been implemented, including the extraction algorithm. This is the final report for Daniela Rosca's PhD fellowship. It describes the work we have done over the past year, current research and the list of publications associated with her thesis topic.
Layout-aware text extraction from full-text PDF of scientific articles.
Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc
2012-05-28
The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for our system and identify further areas of improvement. LA-PDFText is an open-source tool for accurately extracting text from full-text scientific articles. The release of the system is available at http://code.google.com/p/lapdftext/.
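Stage 2 of the pipeline is rule-based. As a toy illustration only (not the actual LA-PDFText rules, which also use layout features such as font size and block position), a classifier over block text might look like this:

```python
import re

# Hypothetical section-header patterns for classifying text blocks
# into rhetorical categories.
RULES = [
    (re.compile(r"^\s*abstract\b", re.I), "ABSTRACT"),
    (re.compile(r"^\s*(methods?|materials and methods)\b", re.I), "METHODS"),
    (re.compile(r"^\s*results?\b", re.I), "RESULTS"),
    (re.compile(r"^\s*(discussion|conclusions?)\b", re.I), "DISCUSSION"),
    (re.compile(r"^\s*references?\b", re.I), "REFERENCES"),
]

def classify_block(text, previous_label="UNKNOWN"):
    for pattern, label in RULES:
        if pattern.match(text):
            return label
    return previous_label  # body text inherits the current section label

label = "UNKNOWN"
for block in ["Abstract", "We study...", "Methods", "Mice were..."]:
    label = classify_block(block, label)
    print(label, "<-", block)
```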
Probabilistic combination of static and dynamic gait features for verification
NASA Astrophysics Data System (ADS)
Bazin, Alex I.; Nixon, Mark S.
2005-03-01
This paper describes a novel probabilistic framework for biometric identification and data fusion. Based on intra- and inter-class variation extracted from training data, posterior probabilities describing the similarity between two feature vectors may be calculated directly from the data using the logistic function and Bayes' rule. Using a large publicly available database, we show that the two imbalanced gait modalities may be fused using this framework. All fusion methods tested provide an improvement over the best single modality, with the weighted sum rule giving the best performance, hence showing that highly imbalanced classifiers may be fused in a probabilistic setting, improving not only the performance but also the generalized application capability.
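A minimal sketch of the scoring and fusion steps described above, assuming per-modality match scores and hypothetical logistic parameters that would be fit on training data:

```python
import math

def posterior(score, w, b):
    """Logistic mapping from a match score (e.g., a feature-vector
    distance) to P(same subject | score); w and b are fit on training
    data from intra- and inter-class score distributions."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

def weighted_sum_fusion(posteriors, alphas):
    """Weighted sum rule over per-modality posteriors."""
    assert abs(sum(alphas) - 1.0) < 1e-9
    return sum(a * p for a, p in zip(alphas, posteriors))

p_static = posterior(-1.2, w=2.0, b=0.5)   # hypothetical parameters
p_dynamic = posterior(-0.4, w=1.5, b=0.1)
print(weighted_sum_fusion([p_static, p_dynamic], [0.7, 0.3]))
```

Weighting the stronger modality more heavily is what lets highly imbalanced classifiers still combine usefully.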
Extracting the information of coastline shape and its multiple representations
NASA Astrophysics Data System (ADS)
Liu, Ying; Li, Shujun; Tian, Zhen; Chen, Huirong
2007-06-01
Based on a study of coastlines, a new approach to multiple representation is put forward in this paper: simulating the way humans think when generalizing, building an appropriate mathematical model, describing the coastline graphically, and extracting all kinds of coastline shape information. Automatic coastline generalization is then carried out based on knowledge rules and arithmetic operators. By representing coastline shape with a Douglas binary tree built on the curve, the shape character of the coastline can be revealed both microscopically and macroscopically. The extracted coastline information includes the local characteristic points and their orientation, the curve structure and the topological traits; the curve structure can be divided into single curves and curve clusters. By establishing the knowledge rules of coastline generalization, the generalization scale and its shape parameter, the automatic coastline generalization model is finally established. The multi-scale representation method for coastlines proposed in this paper has several strong points: it follows the human mode of thinking and preserves the natural character of the original curve, and the binary tree structure can control coastline similarity, avoid self-intersection and maintain consistent topological relationships.
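A Douglas binary tree of the kind mentioned above is built from recursive Douglas-Peucker-style splits of the curve. The sketch below shows that split step in its usual formulation; the paper's tree construction and knowledge rules are not reproduced here.

```python
def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    num = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)
    den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
    return num / den if den else ((x - x1) ** 2 + (y - y1) ** 2) ** 0.5

def douglas_peucker(points, tol):
    """Recursive simplification; each split point is a node a Douglas
    binary tree would store, with depth reflecting shape importance."""
    if len(points) < 3:
        return list(points)
    dists = [point_line_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    k, dmax = max(enumerate(dists, start=1), key=lambda kv: kv[1])
    if dmax <= tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:k + 1], tol)
    right = douglas_peucker(points[k:], tol)
    return left[:-1] + right

coast = [(0, 0), (1, 2), (2, -1), (3, 3), (4, 0)]
print(douglas_peucker(coast, tol=1.0))
```

Varying `tol` with the target scale is what yields the multiple representations.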
A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents
2011-01-01
Background A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up-to-date with all published studies on DDIs. Methods This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDI in texts. These lexical patterns are matched against the generated sentences in order to extract DDIs. Results We performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Conclusions Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no previous approach had been carried out to extract DDIs from texts. To the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDIs from biomedical texts. PMID:21489220
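As an illustration of the pattern-matching step, the toy rules below mimic pharmacist-defined lexical patterns applied to simplified sentences; the patterns and the drug-name handling are invented for the example, not taken from the paper.

```python
import re

# Hypothetical lexical DDI patterns; in the real system these would be
# matched against sentences already simplified by the parsing rules.
PATTERNS = [
    re.compile(r"(?P<d1>\w+) (?:increases|decreases|alters) "
               r"the (?:plasma )?(?:levels?|activity) of (?P<d2>\w+)", re.I),
    re.compile(r"(?P<d1>\w+) should not be (?:used|co-administered) "
               r"with (?P<d2>\w+)", re.I),
]

def extract_ddis(sentence):
    """Yield (drug1, drug2) pairs matched by any lexical pattern."""
    for pat in PATTERNS:
        m = pat.search(sentence)
        if m:
            yield (m.group("d1"), m.group("d2"))

for s in ["Ketoconazole increases the plasma levels of midazolam.",
          "Warfarin should not be co-administered with aspirin."]:
    print(list(extract_ddis(s)))
```

The precision/recall trade-off reported above reflects exactly this design: rigid patterns match precisely but miss most paraphrases.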
A novel methodology for building robust design rules by using design based metrology (DBM)
NASA Astrophysics Data System (ADS)
Lee, Myeongdong; Choi, Seiryung; Choi, Jinwoo; Kim, Jeahyun; Sung, Hyunju; Yeo, Hyunyoung; Shim, Myoungseob; Jin, Gyoyoung; Chung, Eunseung; Roh, Yonghan
2013-03-01
This paper addresses a methodology for building robust design rules by using design based metrology (DBM). The conventional method for building design rules uses a simulation tool and a simple-pattern spider mask. At the early stage of a device, the estimation from simulation tools is poor, and the evaluation of the simple-pattern spider mask is rather subjective because it depends on the experiential judgment of an engineer. In this work, we designed a huge number of pattern situations, including various 1D and 2D design structures. In order to overcome the difficulty of inspecting many types of patterns, we introduced the Design Based Metrology (DBM) of Nano Geometry Research, Inc., with which these mass patterns could be inspected at high speed. We also carried out quantitative analysis of PWQ silicon data to estimate process variability. Our methodology demonstrates high speed and accuracy in building design rules: all test patterns were inspected within a few hours, and the mass silicon data were handled not by personal decision but by statistical processing. From the results, robust design rules were successfully verified and extracted. Finally, we found that our methodology is appropriate for building robust design rules.
Argumentation Based Joint Learning: A Novel Ensemble Learning Approach
Xu, Junyi; Yao, Li; Li, Le
2015-01-01
Recently, ensemble learning methods have been widely used to improve classification performance in machine learning. In this paper, we present a novel ensemble learning method: argumentation based multi-agent joint learning (AMAJL), which integrates ideas from multi-agent argumentation, ensemble learning, and association rule mining. In AMAJL, argumentation technology is introduced as an ensemble strategy to integrate multiple base classifiers and generate a high-performance ensemble classifier. We design an argumentation framework named Arena as a communication platform for knowledge integration. Through argumentation based joint learning, high-quality individual knowledge can be extracted, and thus a refined global knowledge base can be generated and used independently for classification. We perform numerous experiments on multiple public datasets using AMAJL and other benchmark methods. The results demonstrate that our method can effectively extract high-quality knowledge for the ensemble classifier and improve the performance of classification. PMID:25966359
A.I.-based real-time support for high performance aircraft operations
NASA Technical Reports Server (NTRS)
Vidal, J. J.
1985-01-01
Artificial intelligence (AI) based software and hardware concepts are applied to the handling of system malfunctions during flight tests. A representation of malfunction procedure logic using Boolean normal forms is presented. The representation facilitates the automation of malfunction procedures and provides easy testing of the embedded rules. It also forms a potential basis for a parallel implementation in logic hardware. The extraction of logic control rules from dynamic simulation and their adaptive revision after partial failure are examined, using a simplified 2-dimensional aircraft model with a controller that adaptively extracts control rules for directional thrust satisfying a navigational goal without exceeding pre-established position and velocity limits. Failure recovery (rule adjustment) is examined after partial actuator failure. While this experiment was performed with primitive aircraft and mission models, it illustrates an important paradigm and provides complexity extrapolations for the proposed extraction of expertise from simulation, as discussed. The use of relaxation and inexact reasoning in expert systems was also investigated.
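A malfunction procedure in Boolean (disjunctive) normal form can be evaluated directly against cockpit indicators, as the sketch below shows; the rule and indicator names are hypothetical, invented purely for illustration.

```python
# One malfunction procedure in disjunctive normal form: each inner
# frozenset is a conjunction of indicator names that, if all true,
# triggers the procedure.
FUEL_PUMP_FAIL = [
    frozenset({"fuel_pressure_low", "pump_light_on"}),
    frozenset({"fuel_pressure_low", "fuel_flow_zero"}),
]

def rule_fires(dnf, indicators):
    """Evaluate a DNF rule against a dict of boolean indicators."""
    return any(all(indicators.get(term, False) for term in clause)
               for clause in dnf)

state = {"fuel_pressure_low": True, "fuel_flow_zero": True}
print(rule_fires(FUEL_PUMP_FAIL, state))  # True -> run the procedure
```

Because each clause is independent, this form also lends itself to the parallel logic-hardware implementation the abstract mentions.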
Ji, Dong Xu; Foong, Kelvin Weng Chiong; Ong, Sim Heng
2013-09-01
Extraction of the mandible from 3D volumetric images is frequently required for surgical planning and evaluation. Image segmentation from MRI is more complex than from CT because of the lower signal-to-noise ratio of bone. An automated method to extract the human mandible body shape from magnetic resonance (MR) images of the head was developed and tested. Anonymized MR image data sets of the head from 12 subjects were subjected to a two-stage rule-constrained region growing approach to derive the shape of the body of the human mandible. An initial thresholding technique was applied, followed by a 3D seedless region growing algorithm, to detect a large portion of the trabecular bone (TB) regions of the mandible. This stage is followed by a rule-constrained 2D segmentation of each MR axial slice to merge the remaining portions of the TB regions with lower intensity levels. The two-stage approach was replicated to detect the cortical bone (CB) regions of the mandibular body. The TB and CB regions detected in the preceding steps were merged and subjected to a series of morphological processes to complete the definition of the mandibular body region. Comparisons of segmentation accuracy between the two-stage approach, the conventional region growing (CRG) method, the 3D level set method, and manual segmentation were made with the Jaccard index, Dice index, and mean surface distance (MSD). The mean accuracy of the proposed method is [Formula: see text] for the Jaccard index, [Formula: see text] for the Dice index, and [Formula: see text] mm for MSD. The mean accuracy of CRG is [Formula: see text] for the Jaccard index, [Formula: see text] for the Dice index, and [Formula: see text] mm for MSD. The mean accuracy of the 3D level set method is [Formula: see text] for the Jaccard index, [Formula: see text] for the Dice index, and [Formula: see text] mm for MSD. The proposed method shows improvement in accuracy over CRG and the 3D level set method. Accurate segmentation of the body of the human mandible from MR images is achieved with the proposed two-stage rule-constrained seedless region growing approach.
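A compressed sketch of the two-stage flavor: seedless growth of the largest high-intensity 3D component, followed by a rule-constrained merge of adjacent lower-intensity voxels. The thresholds and the merge rule here are assumptions for illustration, not the paper's actual rules.

```python
import numpy as np
from scipy import ndimage

def grow_bone_region(vol, high_thr, low_thr):
    """Stage 1: keep the largest 3D connected component above high_thr
    (seedless region growing).  Stage 2: rule-constrained merge of
    lower-intensity voxels adjacent to the grown region."""
    mask = vol > high_thr
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    core = labels == (int(np.argmax(sizes)) + 1)
    # Merge rule: voxels in (low_thr, high_thr] touching the core.
    fringe = (vol > low_thr) & ndimage.binary_dilation(core)
    return core | fringe

vol = np.random.rand(32, 32, 32)            # stand-in for an MR volume
seg = grow_bone_region(vol, high_thr=0.9, low_thr=0.8)
print(seg.sum(), "voxels segmented")
```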
Zhu, Hongchun; Cai, Lijie; Liu, Haiying; Huang, Wei
2016-01-01
Multi-scale image segmentation and the selection of optimal segmentation parameters are the key processes in object-oriented information extraction from high-resolution remote sensing images, and the accuracy of remote sensing thematic information depends on this extraction. On the basis of WorldView-2 high-resolution data, an optimal-segmentation-parameter method for object-oriented image segmentation and high-resolution image information extraction was studied through the following processes. Firstly, the best combination of bands and weights was determined for the information extraction of the high-resolution remote sensing image. An improved weighted mean-variance method was proposed and used to calculate the optimal segmentation scale. Thereafter, the best shape factor and compactness factor parameters were computed with the use of control variables and a combination of heterogeneity and homogeneity indexes. Different types of image segmentation parameters were obtained according to the surface features. The high-resolution remote sensing images were multi-scale segmented with the optimal segmentation parameters, and a hierarchical network structure was established by setting information extraction rules to achieve object-oriented information extraction. This study presents an effective and practical method that can express expert judgment through reproducible quantitative measurements. Furthermore, the results of this procedure may be incorporated into a classification scheme. PMID:27362762
NASA Astrophysics Data System (ADS)
Shi, Liehang; Ling, Tonghui; Zhang, Jianguo
2016-03-01
Radiologists currently use a variety of terminologies and standards in most hospitals in China, and multiple terminologies may even be used for different sections within one department. In this presentation, we introduce a medical semantic comprehension system (MedSCS) that extracts semantic information about clinical findings and conclusions from free-text radiology reports, so that the reports can be classified correctly according to medical term indexing standards such as RadLex or SNOMED CT. Our system (MedSCS) is based on both rule-based and statistics-based methods, which improves the performance and scalability of MedSCS. In order to evaluate the overall performance of the system and measure the accuracy of the outcomes, we developed computational methods to calculate the precision rate, recall rate, F-score and exact confidence interval.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
NASA Astrophysics Data System (ADS)
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the most popular methods for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used feature extraction approaches usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and ad hoc rules. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. A quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA offers an alternative solution for discriminant cases involving complex nonlinear feature extraction or unknown features. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
Decoding rule search domain in the left inferior frontal gyrus
Babcock, Laura; Vallesi, Antonino
2018-01-01
Traditionally, the left hemisphere has been thought to extract mainly verbal patterns of information, but recent evidence has shown that the left Inferior Frontal Gyrus (IFG) is active during inductive reasoning in both the verbal and spatial domains. We aimed to understand whether the left IFG supports inductive reasoning in a domain-specific or domain-general fashion. To do this we used Multi-Voxel Pattern Analysis to decode the representation of domain during a rule search task. Thirteen participants were asked to extract the rule underlying streams of letters presented in different spatial locations. Each rule was either verbal (letters forming words) or spatial (positions forming geometric figures). Our results show that domain was decodable in the left prefrontal cortex, suggesting that this region represents domain-specific information, rather than processes common to the two domains. A replication study with the same participants tested two years later confirmed these findings, though the individual representations changed, providing evidence for the flexible nature of representations. This study extends our knowledge on the neural basis of goal-directed behaviors and on how information relevant for rule extraction is flexibly mapped in the prefrontal cortex. PMID:29547623
Beta Hebbian Learning as a New Method for Exploratory Projection Pursuit.
Quintián, Héctor; Corchado, Emilio
2017-09-01
In this research, a novel family of learning rules called Beta Hebbian Learning (BHL) is thoroughly investigated as a way to extract information from high-dimensional datasets by projecting the data onto low-dimensional (typically two-dimensional) subspaces, improving on existing exploratory methods by providing a clear representation of the data's internal structure. BHL applies a family of learning rules derived from the Probability Density Function (PDF) of the residual based on the beta distribution. This family of rules may be called Hebbian in that all use a simple multiplication of the output of the neural network with some function of the residuals after feedback. The derived learning rules can be linked to an adaptive form of Exploratory Projection Pursuit, and with artificial distributions the networks perform as the theory suggests they should: the use of different learning rules derived from different PDFs allows the identification of "interesting" dimensions (as far from the Gaussian distribution as possible) in high-dimensional datasets. The novel algorithm, BHL, was tested on seven artificial datasets to study the behavior of the BHL parameters, and was later applied successfully to four real datasets, comparing its performance with other well-known exploratory and projection models such as Maximum Likelihood Hebbian Learning (MLHL), Locally-Linear Embedding (LLE), Curvilinear Component Analysis (CCA), Isomap and Neural Principal Component Analysis (Neural PCA).
Application of a hybrid association rules/decision tree model for drought monitoring
NASA Astrophysics Data System (ADS)
Nourani, Vahid; Molajou, Amir
2017-12-01
Previous research has shown that incorporating oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models can provide important predictive information about hydro-climatic variability. In this paper, a hybrid application of two data mining techniques (decision trees and association rules) is proposed to discover associations between drought at the Tabriz and Kermanshah synoptic stations (located in Iran) and the de-trended SSTs of the Black, Mediterranean and Red Seas. The two major steps of the proposed model are the classification of the de-trended SST data with selection of the most effective groups, and the extraction of the hidden information involved in the data. Decision tree techniques, which can identify the attributes of a data set that are good for classification, were used for classification and for selecting the most effective groups, and association rules were employed to extract hidden predictive information from the large observed dataset. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for different lag times. The computed measures confirm the reliable performance of the proposed hybrid data mining method for drought forecasting, and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trended SSTs and drought at the Tabriz and Kermanshah synoptic stations, with the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trended SSTs of the seas higher than 70% and 80% for Tabriz and Kermanshah, respectively.
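Both evaluation measures are standard, so a minimal sketch is possible; the contingency counts below, for one hypothetical SST-class to drought-class rule, are invented for illustration.

```python
def confidence(n_antecedent, n_both):
    """Association-rule confidence: P(consequent | antecedent)."""
    return n_both / n_antecedent

def heidke_skill_score(a, b, c, d):
    """HSS for a 2x2 forecast table: a = hits, b = false alarms,
    c = misses, d = correct negatives."""
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

# Hypothetical counts for one "SST class -> drought class" rule
print(confidence(n_antecedent=40, n_both=32))    # 0.8
print(heidke_skill_score(a=32, b=8, c=6, d=74))  # skill over chance
```

Unlike confidence alone, the HSS discounts correct forecasts that would occur by chance, which is why both are reported.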
Mathieson, Luke; Mendes, Alexandre; Marsden, John; Pond, Jeffrey; Moscato, Pablo
2017-01-01
This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic, and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called the (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). The method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified, and then a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases: first, a series of extremely powerful reduction techniques, which do not lose the optimal solution, is employed; second, a metaheuristic search is used to identify the remaining features to be considered or disregarded. Two algorithms were tested on a public-domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
Pan, Zhiran; Liang, Hailong; Liang, Chabhufi; Xu, Wen
2015-01-01
A method for the qualitative analysis of constituents in Polygonum cuspidatum by ultra-high-pressure liquid chromatography coupled with linear ion trap-Orbitrap mass spectrometry (UHPLC-LTQ-Orbitrap MS) has been established. The methanol extract of Polygonum cuspidatum was separated on a Waters UPLC C18 column using an acetonitrile-water (containing formic acid) eluting system and detected by an LTQ-Orbitrap hybrid mass spectrometer in negative mode. The targeted components were further fragmented in the LTQ, and high-accuracy data were acquired by the Orbitrap MS. The summarized fragmentation pathways of typical reference components and a diagnostic fragment-ion-searching strategy were used for the detection and identification of the main phenolic components in Polygonum cuspidatum. Other clues, such as the nitrogen rule, the even-electron rule, the degree-of-unsaturation rule and isotopic peak data, were also included in the structural elucidation. The whole analytical procedure took less than 10 min, and more than 30 components were identified or tentatively identified. This method is helpful for further phytochemical research and quality control of Polygonum cuspidatum and related preparations.
NASA Technical Reports Server (NTRS)
Solomon, V.; Baracos, V.; Sarraf, P.; Goldberg, A. L.
1998-01-01
The rapid loss of muscle mass that accompanies many disease states, such as cancer or sepsis, is primarily a result of increased protein breakdown in muscle, and several observations have suggested an activation of the ubiquitin-proteasome system. Accordingly, in extracts of atrophying muscles from tumor-bearing or septic rats, rates of 125I-ubiquitin conjugation to endogenous proteins were found to be higher than in control extracts. On the other hand, in extracts of muscles from hypothyroid rats, where overall proteolysis is reduced below normal, the conjugation of 125I-ubiquitin to soluble proteins decreased by 50%, and treatment with triiodothyronine (T3) restored ubiquitination to control levels. Surprisingly, the N-end rule pathway, which selectively degrades proteins with basic or large hydrophobic N-terminal residues, was found to be responsible for most of these changes in ubiquitin conjugation. Competitive inhibitors of this pathway that specifically block the ubiquitin ligase, E3alpha, suppressed most of the increased ubiquitin conjugation in the muscle extracts from tumor-bearing and septic rats. These inhibitors also suppressed ubiquitination in normal extracts toward levels in hypothyroid extracts, which showed little E3alpha-dependent ubiquitination. Thus, the inhibitors eliminated most of the differences in ubiquitination under these different pathological conditions. Moreover, 125I-lysozyme, a model N-end rule substrate, was ubiquitinated more rapidly in extracts from tumor-bearing and septic rats, and more slowly in those from hypothyroid rats, than in controls. Thus, the rate of ubiquitin conjugation increases in atrophying muscles, and these hormone- and cytokine-dependent responses are in large part due to activation of the N-end rule pathway.
Liu, Zengjian; Tang, Buzhou; Wang, Xiaolong; Chen, Qingcai; Li, Haodi; Bu, Junzhao; Jiang, Jingzhi; Deng, Qiwen; Zhu, Suisong
2016-01-01
Time is an important aspect of information and is very useful for information utilization. The goal of this study was to analyze the challenges of temporal expression (TE) extraction and normalization in Chinese clinical notes by assessing the performance of a rule-based system developed by us on a manually annotated corpus (including 1,778 clinical notes of 281 hospitalized patients). For convenient system development, we divided TEs into three categories: direct, indirect and uncertain TEs, and designed different rules for each category. Evaluation on the independent test set shows that our system achieves an F-score of 93.40% on TE extraction and an accuracy of 92.58% on TE normalization under the "exact-match" criterion. Compared with HeidelTime for Chinese newswire text, our system performs much better, indicating that it is necessary to develop a TE extraction and normalization system specific to Chinese clinical notes because of the domain difference.
2010-01-01
Background Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis and prescription of infectious diseases. The biological literature is the main information source for empirically validated primer and probe sequences. Therefore, it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect the candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. Results We tested our approach using a test set composed of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision/recall rates of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. Conclusions We believe that the proposed method can facilitate routine tasks for biomedical researchers using molecular methods to diagnose and prescribe different infectious diseases. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new databases from scratch. PMID:20682041
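As a sketch of the phase-2 recognizers: the paper uses finite state machines, for which a regular expression is an equivalent toy stand-in; the length bounds and context handling here are arbitrary assumptions.

```python
import re

# Toy recognizer for candidate primer/probe sequences: either an
# explicitly delimited 5'-...-3' sequence (IUPAC codes allowed) or a
# bare run of nucleotides of primer-like length.
CANDIDATE = re.compile(
    r"5'-([ACGTURYSWKMBDHVN]{15,35})-3'"
    r"|\b([ACGT]{15,35})\b"
)

def detect_sequences(text):
    for m in CANDIDATE.finditer(text):
        yield m.group(1) or m.group(2)

para = "The forward primer 5'-ACGTACGTACGTACGTAC-3' amplified the gene."
print(list(detect_sequences(para)))
```

Candidates accepted here would then pass to the rule-based refinement and organism/gene annotation phases.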
Striatal degeneration impairs language learning: evidence from Huntington's disease.
De Diego-Balaguer, R; Couette, M; Dolbeau, G; Dürr, A; Youssov, K; Bachoud-Lévi, A-C
2008-11-01
Although the role of the striatum in language processing is still largely unclear, a number of recent proposals have outlined its specific contribution. Different studies report evidence converging on a picture in which the striatum is involved in those aspects of rule application requiring non-automatized behaviour. This is the main characteristic of the earliest phases of language acquisition, which require the online detection of distant dependencies and the creation of syntactic categories by means of rule learning. Learning of sequences and categorization processes in non-language domains is known to require striatal recruitment. Thus, we hypothesized that the striatum should play a prominent role in the extraction of rules when learning a language. We studied 13 pre-symptomatic gene-carriers (pre-HD) and 22 early stage Huntington's disease patients, both groups characterized by a progressive degeneration of the striatum, and 21 late stage Huntington's disease patients (18 stage II, two stage III and one stage IV), in whom cortical degeneration accompanies striatal degeneration. When presented with a simplified artificial language from which words and rules could be extracted, early stage Huntington's disease patients (stage I) were impaired in the learning test, demonstrating a greater impairment in rule than in word learning compared to the 20 age- and education-matched controls. Huntington's disease patients at later stages were impaired in both word and rule learning. While spared in their overall performance, gene-carriers who had learned a set of abstract artificial language rules were then impaired in the transfer of those rules to similar artificial language structures. Correlation analyses among several neuropsychological tests assessing executive function showed that rule learning correlated with tests requiring working memory and attentional control, while word learning correlated with a test involving episodic memory. These learning impairments correlated significantly with the bicaudate ratio. The overall results support striatal involvement in rule extraction from speech and suggest that language acquisition requires several aspects of memory and executive function for word and rule learning.
The N-end rule pathway catalyzes a major fraction of the protein degradation in skeletal muscle
NASA Technical Reports Server (NTRS)
Solomon, V.; Lecker, S. H.; Goldberg, A. L.
1998-01-01
In skeletal muscle, overall protein degradation involves the ubiquitin-proteasome system. One property of a protein that leads to rapid ubiquitin-dependent degradation is the presence of a basic, acidic, or bulky hydrophobic residue at its N terminus. However, in normal cells, substrates for this N-end rule pathway, which involves ubiquitin carrier protein (E2) E214k and ubiquitin-protein ligase (E3) E3alpha, have remained unclear. Surprisingly, in soluble extracts of rabbit muscle, we found that competitive inhibitors of E3alpha markedly inhibited the 125I-ubiquitin conjugation and ATP-dependent degradation of endogenous proteins. These inhibitors appear to selectively inhibit E3alpha, since they blocked degradation of 125I-lysozyme, a model N-end rule substrate, but did not affect the degradation of proteins whose ubiquitination involved other E3s. The addition of several E2s or E3alpha to the muscle extracts stimulated overall proteolysis and ubiquitination, but only the stimulation by E3alpha or E214k was sensitive to these inhibitors. A similar general inhibition of ubiquitin conjugation to endogenous proteins was observed with a dominant negative inhibitor of E214k. Certain substrates of the N-end rule pathway are degraded after their tRNA-dependent arginylation. We found that adding RNase A to muscle extracts reduced the ATP-dependent proteolysis of endogenous proteins, and supplying tRNA partially restored this process. Finally, although in muscle extracts the N-end rule pathway catalyzes most ubiquitin conjugation, it makes only a minor contribution to overall protein ubiquitination in HeLa cell extracts.
Quantitative knowledge acquisition for expert systems
NASA Technical Reports Server (NTRS)
Belkin, Brenda L.; Stengel, Robert F.
1991-01-01
A common problem in the design of expert systems is the definition of rules from data obtained in system operation or simulation. While it is relatively easy to collect data and to log the comments of human operators engaged in experiments, generalizing such information to a set of rules has not previously been a direct task. A statistical method is presented for generating rule bases from numerical data, motivated by an example based on aircraft navigation with multiple sensors. The specific objective is to design an expert system that selects a satisfactory suite of measurements from a dissimilar, redundant set, given an arbitrary navigation geometry and possible sensor failures. The systematic development of a Navigation Sensor Management (NSM) expert system from Kalman filter covariance data is described. The method invokes two statistical techniques: Analysis of Variance (ANOVA) and the ID3 algorithm. The ANOVA technique indicates whether variations of problem parameters give statistically different covariance results, and the ID3 algorithm identifies the relationships between the problem parameters using probabilistic knowledge extracted from a simulation example set. Both are detailed.
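The ID3 step selects attributes by information gain. A minimal sketch of that criterion, with hypothetical discretized outcomes standing in for the Kalman filter covariance data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3 split criterion: entropy reduction from splitting on attr."""
    total, n, split = entropy(labels), len(rows), {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return total - remainder

# Hypothetical discretized geometry parameter vs. sensor-suite decision
rows = [{"geometry": "good"}, {"geometry": "good"},
        {"geometry": "poor"}, {"geometry": "poor"}]
labels = ["select", "select", "reject", "select"]
print(information_gain(rows, labels, "geometry"))
```

ID3 greedily splits on the highest-gain attribute at each node, which is how the decision structure over navigation parameters is grown.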
Innovative Use of Quality Management Methods for Product Improvement
NASA Astrophysics Data System (ADS)
Midor, Katarzyna; Žarnovský, Jozef
2016-12-01
Organisations constantly look for new, innovative solutions and methods which could be used to improve their efficiency and increase the quality of their products. Identifying the causes of returns is an important issue for modern companies, as returns increase production costs and, most importantly, cause a loss of credibility in the eyes of the client. Therefore, for a company to sustain or strengthen its position on the market, it has to follow the rules of quality management. Especially important is the rule of continuous improvement, which is primarily connected with preventing errors and defects from occurring at all stages of the production process. To achieve that, one must, among other things, use quality management tools. The article presents an analysis of the causes of returns of a vibrating screen produced by a company which manufactures machinery and equipment for the extractive industry, using quality management tools such as the Ishikawa diagram and Pareto analysis. The analysis revealed causes of client returns that could not previously be identified, and solutions for them were proposed.
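A Pareto analysis of return causes reduces to ranking the causes and keeping the "vital few" that cover roughly 80% of occurrences. The cause names and counts below are invented purely for illustration.

```python
# Hypothetical return-cause counts for a vibrating screen
causes = {"weld cracks": 42, "bearing wear": 23, "loose bolts": 11,
          "paint defects": 6, "missing documentation": 3}

def pareto(counts, cutoff=0.8):
    """Rank causes by frequency and keep those covering ~80% of returns."""
    total = sum(counts.values())
    cumulative, vital_few = 0, []
    for cause, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        cumulative += n
        vital_few.append((cause, n, round(cumulative / total, 2)))
        if cumulative / total >= cutoff:
            break
    return vital_few

for row in pareto(causes):
    print(row)
```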
ZK DrugResist 2.0: A TextMiner to extract semantic relations of drug resistance from PubMed.
Khalid, Zoya; Sezerman, Osman Ugur
2017-05-01
Extracting useful knowledge from unstructured textual data is a challenging task for biologists, since the biomedical literature is growing exponentially on a daily basis. Building automated methods for such tasks is gaining much attention among researchers. ZK DrugResist is an online tool that automatically extracts mutations and expression changes associated with drug resistance from PubMed. In this study we have extended our tool to include semantic relations extracted from biomedical text covering drug resistance and established a server including both of these features. Our system was tested on three relations, Resistance (R), Intermediate (I) and Susceptible (S), by applying a hybrid feature set. Over the last few decades the focus has shifted to hybrid approaches, as they provide better results; in our case, this approach combines rule-based methods with machine learning techniques. The results showed 97.67% accuracy with 96% precision, recall and F-measure. These results outperform previously existing relation extraction systems and can thus facilitate computational analysis of drug resistance against complex diseases, and the method can further be applied to other areas of biomedicine. Copyright © 2017 Elsevier Inc. All rights reserved.
Sannino, Giovanna; De Falco, Ivanoe; De Pietro, Giuseppe
2014-06-01
Real-time Obstructive Sleep Apnea (OSA) episode detection and monitoring are important for society in terms of improving the health of the general population and reducing mortality and healthcare costs. Currently, to diagnose OSA, patients undergo PolySomnoGraphy (PSG), a complicated and invasive test that must be performed in a specialized center and involves many sensors and wires; accordingly, each patient is required to stay in the same position throughout one night, restricting their movements. This paper proposes an easy, cheap, and portable approach for the monitoring of patients with OSA which collects single-channel ElectroCardioGram (ECG) data only. It is easy to perform from the patient's point of view because only one wearable sensor is required, so the patient is not restricted to keeping the same position all night long, and the detection and monitoring can be carried out anywhere through the use of a mobile device. Our approach is based on the automatic extraction, from a database containing information about the monitored patient, of explicit knowledge in the form of a set of IF…THEN rules containing typical parameters derived from Heart Rate Variability (HRV) analysis. The extraction is carried out off-line by means of a Differential Evolution algorithm. This set of rules can then be exploited in the real-time mobile monitoring system developed at our laboratory: the ECG data is gathered by a wearable sensor and sent to a mobile device, where it is processed in real time. Subsequently, HRV-related parameters are computed from this data, and, if their values activate some of the rules describing the occurrence of OSA, an alarm is automatically produced. This approach has been tested on a well-known literature database of OSA patients. The numerical results show its effectiveness in terms of accuracy, sensitivity, and specificity, and the achieved rule sets show the user-friendliness of the approach. Furthermore, the method is compared against other well-known classifiers, and its discrimination ability is shown to be higher. Copyright © 2014 Elsevier Inc. All rights reserved.
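The extracted knowledge takes the form of IF…THEN rules over HRV parameters, evaluated per ECG window. The two rules and thresholds below are hypothetical stand-ins for the evolved rule set, invented only to show the mechanism.

```python
# Hypothetical IF...THEN rules over standard HRV parameters; in the
# paper the actual rules are evolved off-line by Differential Evolution.
RULES = [
    lambda p: p["SDNN"] > 95 and p["LF_HF"] > 2.4,
    lambda p: p["RMSSD"] < 18 and p["pNN50"] < 2.0,
]

def apnea_alarm(hrv_params):
    """Raise an alarm if any rule describing OSA occurrence fires."""
    return any(rule(hrv_params) for rule in RULES)

window = {"SDNN": 102.0, "LF_HF": 3.1, "RMSSD": 25.0, "pNN50": 4.5}
print(apnea_alarm(window))  # True -> flag this ECG window
```

Because the rules are explicit, a clinician can read exactly why a window was flagged, which is the user-friendliness the abstract refers to.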
Extracting TSK-type Neuro-Fuzzy model using the Hunting search algorithm
NASA Astrophysics Data System (ADS)
Bouzaida, Sana; Sakly, Anis; M'Sahli, Faouzi
2014-01-01
This paper proposes a Takagi-Sugeno-Kang (TSK) type neuro-fuzzy model tuned by a novel metaheuristic optimization algorithm called Hunting Search (HuS). The HuS algorithm is derived from a model of group hunting in animals such as lions, wolves, and dolphins when looking for prey. In this study, the structure and parameters of the fuzzy model are encoded into a particle, so the optimal structure and parameters are found simultaneously. The proposed method is demonstrated on modeling and control problems, and the results are compared with those of other optimization techniques. The comparisons indicate that the proposed method represents a powerful search approach and an effective optimization technique, as it can extract an accurate TSK fuzzy model with an appropriate number of rules.
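A TSK model computes its output as a firing-strength-weighted average of linear rule consequents; in the paper, both the membership functions (structure) and the consequent coefficients would be encoded in a HuS particle. A minimal one-input, two-rule sketch with assumed parameters:

```python
import math

def gauss(x, c, s):
    """Gaussian membership function with center c and spread s."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

# Hypothetical two-rule TSK model with one input x:
#   IF x is A_i THEN y_i = a_i * x + b_i
rules = [
    {"c": 0.0, "s": 1.0, "a": 1.0, "b": 0.0},
    {"c": 3.0, "s": 1.0, "a": -0.5, "b": 4.0},
]

def tsk_output(x):
    """Weighted average of linear consequents by firing strength."""
    w = [gauss(x, r["c"], r["s"]) for r in rules]
    y = [r["a"] * x + r["b"] for r in rules]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

print(tsk_output(1.5))
```

The optimizer's job is then to choose the number of rules and the values of c, s, a and b that minimize the modeling error.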
Recognition of Handwritten Arabic words using a neuro-fuzzy network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boukharouba, Abdelhak; Bennia, Abdelhak
We present a new method for the recognition of handwritten Arabic words based on a neuro-fuzzy hybrid network. As a first step, connected components (CCs) of black pixels are detected. Then the system determines which CCs are sub-words and which are stress marks. The stress marks are isolated and identified separately, and the sub-words are segmented into graphemes. Each grapheme is described by topological and statistical features. Fuzzy rules are extracted from training examples by a hybrid learning scheme comprising two phases: a rule generation phase from data using fuzzy c-means, and a rule parameter tuning phase using gradient descent learning. After learning, the network encodes in its topology the essential design parameters of a fuzzy inference system. The contribution of this technique is shown through significant tests performed on a handwritten Arabic word database.
Takahashi, Hiro; Aoyagi, Kazuhiko; Nakanishi, Yukihiro; Sasaki, Hiroki; Yoshida, Teruhiko; Honda, Hiroyuki
2006-07-01
Esophageal cancer is a well-known cancer with a poorer prognosis than other cancers. An optimal and individualized treatment protocol based on accurate diagnosis is urgently needed to improve the treatment of cancer patients. For this purpose, it is important to develop a sophisticated algorithm that can manage large amounts of data, such as gene expression data from DNA microarrays, for optimal and individualized diagnosis. Marker gene selection is essential in the analysis of gene expression data. We have already developed a method combining the projective adaptive resonance theory with a boosted fuzzy classifier with the SWEEP operator, denoted PART-BFCS. This method is superior to other methods and has four features: fast calculation, accurate prediction, reliable prediction, and rule extraction. In this study, we applied this method to microarray data obtained from esophageal cancer patients. A combination of PART-BFCS and the U-test was also investigated. It was necessary to use a specific type of BFCS, namely BFCS-1,2, because the esophageal cancer data were very complex. PART-BFCS and PART-BFCS with the U-test models showed higher performance than two conventional methods, k-nearest neighbor (kNN) and weighted voting (WV). Genes including CDK6 could be found by our methods, and excellent IF-THEN rules could be extracted. The genes selected in this study have high potential as new diagnostic markers for esophageal cancer. These results indicate that the new methods can be used for marker gene selection in the diagnosis of cancer patients.
Songbirds and humans apply different strategies in a sound sequence discrimination task.
Seki, Yoshimasa; Suzuki, Kenta; Osawa, Ayumi M; Okanoya, Kazuo
2013-01-01
The abilities of animals and humans to extract rules from sound sequences have previously been compared using observation of spontaneous responses and conditioning techniques. However, the results were interpreted inconsistently across studies, possibly owing to methodological and/or species differences. Therefore, we examined the strategies for discrimination of sound sequences in Bengalese finches and humans using the same protocol. Birds were trained on a GO/NOGO task to discriminate between two categories of sound stimulus generated based on an "AAB" or "ABB" rule. The sound elements used were taken from a variety of male (M) and female (F) calls, such that the sequences could be represented as MMF and MFF. In test sessions, FFM and FMM sequences, which were never presented in the training sessions but conformed to the rule, were presented as probe stimuli. The results suggested that two discriminative strategies were being applied: (1) memorizing sound patterns of either GO or NOGO stimuli and generating the appropriate responses only for those sounds; and (2) using the repeated element as a cue. There was no evidence that the birds successfully extracted the abstract rule (i.e., AAB and ABB); MMF-GO subjects did not produce a GO response for FFM and vice versa. Next we examined whether those strategies were also applicable to human participants on the same task. The results and questionnaires revealed that participants extracted the abstract rule, and most of them employed it to discriminate the sequences. This strategy was never observed in the bird subjects, although some participants used strategies similar to the birds when responding to the probe stimuli. Our results show that the human participants applied the abstract rule in the task even without instruction but Bengalese finches did not, thereby reconfirming that the human ability to extract abstract rules from sound sequences is distinct from that of non-human animals.
Kim, Heejun; Bian, Jiantao; Mostafa, Javed; Jonnalagadda, Siddhartha; Del Fiol, Guilherme
2016-01-01
Motivation: Clinicians need up-to-date evidence from high quality clinical trials to support clinical decisions. However, applying evidence from the primary literature requires significant effort. Objective: To examine the feasibility of automatically extracting key clinical trial information from ClinicalTrials.gov. Methods: We assessed the coverage of ClinicalTrials.gov for high quality clinical studies that are indexed in PubMed. Using 140 random ClinicalTrials.gov records, we developed and tested rules for the automatic extraction of key information. Results: The rate of high quality clinical trial registration in ClinicalTrials.gov increased from 0.2% in 2005 to 17% in 2015. Trials reporting results increased from 3% in 2005 to 19% in 2015. The accuracy of the automatic extraction algorithm for 10 trial attributes was 90% on average. Future research is needed to improve the algorithm accuracy and to design information displays to optimally present trial information to clinicians.
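A sketch of rule-based attribute extraction from a trial record: the tag names and the age-range rule below are illustrative assumptions, not the study's actual rules or the exact ClinicalTrials.gov schema.

```python
import re
import xml.etree.ElementTree as ET

# A toy record; tag names are illustrative only.
record = """<clinical_study>
  <brief_title>Drug X vs Placebo in Condition Y</brief_title>
  <enrollment>248</enrollment>
  <eligibility><criteria>Adults aged 18-75 years with Y.</criteria></eligibility>
</clinical_study>"""

def extract_attributes(xml_text):
    root = ET.fromstring(xml_text)
    get = lambda tag: (root.findtext(".//" + tag) or "").strip()
    attrs = {"title": get("brief_title"),
             "enrollment": get("enrollment"),
             "criteria": get("criteria")}
    # Example normalization rule: pull the age range out of free text.
    m = re.search(r"(\d+)\s*-\s*(\d+)\s*years", attrs["criteria"])
    if m:
        attrs["age_range"] = (int(m.group(1)), int(m.group(2)))
    return attrs

print(extract_attributes(record))
```

Structured fields fall out of the markup directly; the hard 10% in the reported accuracy comes from free-text fields like the eligibility criteria.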
Highly scalable and robust rule learner: performance evaluation and comparison.
Kurgan, Lukasz A; Cios, Krzysztof J; Dick, Scott
2006-02-01
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
Yoo, Sung-Hoon; Oh, Sung-Kwun; Pedrycz, Witold
2015-09-01
In this study, we propose a hybrid method of face recognition that uses face region information extracted from the detected face region. In the preprocessing part, we develop a hybrid approach based on the Active Shape Model (ASM) and the Principal Component Analysis (PCA) algorithm. At this step, a facial image is acquired with a CCD (Charge Coupled Device) camera, the face region is detected using AdaBoost, and Histogram Equalization (HE) is then employed to improve the quality of the image. ASM extracts the face contour and image shape to produce a personal profile. Then we use PCA to reduce the dimensionality of the face images. In the recognition part, we consider improved Radial Basis Function Neural Networks (RBF NNs) to identify a unique pattern associated with each person. The proposed RBF NN architecture consists of three functional modules realizing the condition phase, the conclusion phase, and the inference phase, completed with the help of fuzzy rules in the standard 'if-then' format. In the formation of the condition part of the fuzzy rules, the input space is partitioned with the use of Fuzzy C-Means (FCM) clustering. In the conclusion part of the fuzzy rules, the connections (weights) of the RBF NNs are represented by four kinds of polynomials: constant, linear, quadratic, and reduced quadratic. The values of the coefficients are determined by running a gradient descent method. The output of the RBF NN model is obtained by running a fuzzy inference method. The essential design parameters of the network (including the learning rate, momentum coefficient and the fuzzification coefficient used by the FCM) are optimized by means of Differential Evolution (DE). The proposed P-RBF NNs (Polynomial-based RBF NNs) are applied to facial recognition, and their performance is quantified in terms of output performance and recognition rate. Copyright © 2015 Elsevier Ltd. All rights reserved.
Network-Based Method for Identifying Co-Regeneration Genes in Bone, Dentin, Nerve and Vessel Tissues
Pan, Hongying; Zhang, Yu-Hang; Feng, Kaiyan; Kong, XiangYin; Cai, Yu-Dong
2017-01-01
Bone and dental diseases are serious public health problems. Most current clinical treatments for these diseases can produce side effects. Regeneration is a promising therapy for bone and dental diseases, yielding natural tissue recovery with few side effects. Because soft tissues inside the bone and dentin are densely populated with nerves and vessels, the study of bone and dentin regeneration should also consider the co-regeneration of nerves and vessels. In this study, a network-based method to identify co-regeneration genes for bone, dentin, nerve and vessel was constructed based on an extensive network of protein–protein interactions. Three procedures were applied in the network-based method. The first procedure, searching, sought the shortest paths connecting regeneration genes of one tissue type with regeneration genes of other tissues, thereby extracting possible co-regeneration genes. The second procedure, testing, employed a permutation test to evaluate whether possible genes were false discoveries; these genes were excluded by the testing procedure. The last procedure, screening, employed two rules, the betweenness ratio rule and interaction score rule, to select the most essential genes. A total of seventeen genes were inferred by the method, which were deemed to contribute to co-regeneration of at least two tissues. All these seventeen genes were extensively discussed to validate the utility of the method. PMID:28974058
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-17
[Fragmentary excerpt: lists NAICS codes for suppliers of natural gas and NGLs, including natural gas liquid extraction facilities (211112), ethylene petrochemical production (32511), and natural gas distribution facilities (221210), under the Special Rules Governing Certain Information Obtained under the Greenhouse Gas Reporting Rule.]
NASA Astrophysics Data System (ADS)
Wang, H.; Ning, X.; Zhang, H.; Liu, Y.; Yu, F.
2018-04-01
Urban boundary is an important indicator for urban sprawl analysis. However, methods of urban boundary extraction have been inconsistent, and construction land or urban impervious surfaces were usually used to represent urban areas in coarse-resolution images, resulting in lower precision and incomparable urban boundary products. To solve the above problems, a semi-automatic method of urban boundary extraction is proposed that uses high-resolution images and geographic information data. Urban landscape and form characteristics and geographical knowledge were combined to generate a series of standardized rules for urban boundary extraction. Urban boundaries of China's 31 provincial capitals in the years 2000, 2005, 2010, and 2015 were extracted with the above-mentioned method. Compared with two other open urban boundary products, the accuracy of the urban boundaries in this study was the highest. The urban boundaries, together with other thematic data, were integrated to measure and analyse urban sprawl. Results showed that China's provincial capitals underwent rapid urbanization from 2000 to 2015, with their total area growing from 6520 square kilometres to 12398 square kilometres. Urban area of the provincial capitals showed remarkable regional differences and a high degree of concentration. Urban land became more intensive in general, and the urban sprawl rate was not in harmony with the population growth rate. About sixty percent of the new urban areas came from cultivated land. The paper provides a consistent method of urban boundary extraction and urban sprawl measurement using high-resolution remote sensing images. The resulting measurements of urban sprawl in China's provincial capitals provide valuable urbanization information for government and the public.
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
Development of an evolutionary fuzzy expert system for estimating future behavior of stock price
NASA Astrophysics Data System (ADS)
Mehmanpazir, Farhad; Asadi, Shahrokh
2017-03-01
The stock market has always been an attractive area for researchers, since no method has yet been found to predict stock price behavior precisely. Due to its high rate of uncertainty and volatility, it carries a higher risk than any other investment area, and stock price behavior is therefore difficult to simulate. This paper presents a "data mining-based evolutionary fuzzy expert system" (DEFES) approach to estimate the behavior of stock prices. The tool is developed in a seven-stage architecture. Data mining is used in three stages to reduce the complexity of the whole data space. The first stage, noise filtering, is used to make the raw data clean and smooth. The second stage is variable selection; we use stepwise regression analysis to choose the key variables to be considered in the model. In the third stage, K-means is used to divide the data into sub-populations to decrease the effects of noise and reduce the complexity of the patterns. In the next stage, a Mamdani-type fuzzy rule-based system is extracted for each cluster by means of a genetic algorithm and evolutionary strategy. In the fifth stage, we use a binary genetic algorithm for rule filtering to remove redundant rules and thereby counter over-learning. In the sixth stage, we utilize a genetic tuning process to slightly adjust the shape of the membership functions. The last stage tests the performance of the tool and adjusts its parameters. This is the first study to use an approximate fuzzy rule-based system and evolutionary strategy with the ability to extract the whole knowledge base of a fuzzy expert system for stock price forecasting problems. The superiority and applicability of DEFES are shown for International Business Machines Corporation, and the outcome is compared with the results of other methods. Results with the MAPE metric and the Wilcoxon signed ranks test indicate that DEFES provides more accuracy and outperforms all previous methods, so it can be considered a superior tool for stock price forecasting problems.
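A minimal sketch of the three data mining stages (noise filtering, variable selection, clustering), assuming scikit-learn; an F-test screen stands in for the paper's stepwise regression, and the GA-evolved fuzzy rule stages are not reproduced:

    # Sketch of the first three (data mining) stages of a DEFES-like pipeline.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_selection import f_regression

    def preprocess(X, y, n_keep=5, n_clusters=4, window=5):
        # 1) noise filtering: centered moving average over each column
        kernel = np.ones(window) / window
        X_smooth = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, X)
        # 2) variable selection: keep the n_keep columns most associated
        #    with the target (a stand-in for stepwise regression)
        F, _ = f_regression(X_smooth, y)
        keep = np.argsort(F)[-n_keep:]
        X_sel = X_smooth[:, keep]
        # 3) K-means splits the data into sub-populations; a fuzzy rule
        #    base would then be evolved per cluster
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_sel)
        return X_sel, labels, keep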
[Object-oriented aquatic vegetation extraction approach based on visible vegetation indices].
Jing, Ran; Deng, Lei; Zhao, Wen Ji; Gong, Zhao Ning
2016-05-01
Using the estimation of scale parameters (ESP) image segmentation tool to determine the ideal image segmentation scale, the optimal segmented image was created by the multi-scale segmentation method. Based on visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of visible vegetation indices and built a decision tree rule. A membership function was used to automatically classify the study area, and an aquatic vegetation map was generated. The results showed the overall accuracy of supervised classification was 53.7%, while the overall accuracy of object-oriented image analysis (OBIA) was 91.7%. Compared with the pixel-based supervised classification method, the OBIA method significantly improved the image classification result and further increased the accuracy of aquatic vegetation extraction. The Kappa value of supervised classification was 0.4, and the Kappa value based on OBIA was 0.9. The experimental results demonstrated that the OBIA method developed in this study, using visible vegetation indices derived from mini-UAV data, was feasible for extracting aquatic vegetation and could be applied in other physically similar areas.
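For illustration, a sketch of one visible-band vegetation index rule of the kind such decision trees use; ExG (Excess Green) and the 0.1 threshold are assumptions for the example, not the indices or values chosen in the paper:

    # Visible-band vegetation rule: ExG = 2g - r - b on chromatic coordinates.
    import numpy as np

    def exg_mask(rgb, threshold=0.1):
        """rgb: float array (H, W, 3) scaled to [0, 1]."""
        s = rgb.sum(axis=2) + 1e-9
        r, g, b = (rgb[..., i] / s for i in range(3))  # chromatic coordinates
        exg = 2 * g - r - b
        return exg > threshold       # True where vegetation is likely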
Fusion of multi-spectral and panchromatic images based on 2D-PWVD and SSIM
NASA Astrophysics Data System (ADS)
Tan, Dongjie; Liu, Yi; Hou, Ruonan; Xue, Bindang
2016-03-01
A combined method using the 2D pseudo Wigner-Ville distribution (2D-PWVD) and the structural similarity (SSIM) index is proposed for the fusion of low-resolution multi-spectral (MS) images and high-resolution panchromatic (PAN) images. First, the intensity component of the multi-spectral image is extracted with the generalized IHS transform. Then, the spectrum diagrams of the intensity component of the multi-spectral image and of the panchromatic image are obtained with the 2D-PWVD. Different fusion rules are designed for different frequency information in the spectrum diagrams. The SSIM index is used to evaluate the high-frequency information of the spectrum diagrams so that the weights in the fusion processing can be assigned adaptively. After the new spectrum diagram is obtained according to the fusion rule, the final fused image is produced by the inverse 2D-PWVD and inverse GIHS transforms. Experimental results show that the proposed method can obtain high-quality fusion images.
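A rough sketch of the SSIM-weighted high-frequency fusion rule, assuming scikit-image and SciPy; a Gaussian high-pass split stands in for the 2D-PWVD spectrum diagrams, so this shows only the adaptive-weighting idea, not the full method:

    # SSIM-driven weighting of the high-frequency layers (illustrative).
    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage.metrics import structural_similarity

    def fuse(intensity_ms, pan, sigma=2.0):
        low_ms, low_pan = (gaussian_filter(x, sigma) for x in (intensity_ms, pan))
        high_ms, high_pan = intensity_ms - low_ms, pan - low_pan
        # local SSIM map between the two high-frequency layers
        _, ssim_map = structural_similarity(
            high_ms, high_pan, full=True,
            data_range=float(max(np.ptp(high_ms), np.ptp(high_pan))))
        w = (ssim_map + 1) / 2           # map SSIM [-1, 1] to weights [0, 1]
        fused_high = w * high_pan + (1 - w) * high_ms
        return low_ms + fused_high       # inverse GIHS would follow in practice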
Efficiency measurement of health care organizations: What models are used?
Jaafaripooyan, Ebrahim; Emamgholipour, Sara; Raei, Behzad
2017-01-01
Background: The literature abounds with techniques for the efficiency measurement of health care organizations (HCOs), which should be used cautiously and appropriately. The present study aimed at discovering the rules regulating the interplay among the number of inputs, outputs, and decision-making units (DMUs), identifying all methods used for the measurement of Iranian HCOs, and critically appraising all DEA studies on Iranian HCOs in their application of such rules. Methods: The present study employed a systematic search of all studies related to the efficiency measurement of Iranian HCOs. A search was conducted in databases such as PubMed and Scopus between 2001 and 2015 to identify studies related to efficiency measurement in health care. The retrieved studies passed through a multi-stage (title, abstract, body) filtering process. A data extraction table for each study was completed, including the method, number of inputs and outputs, DMUs, and their efficiency scores. Results: Various methods were found for efficiency measurement. Overall, 122 studies were retrieved, of which 73 had exclusively employed the DEA technique for measuring the efficiency of HCOs in Iran, and 23 used hybrid models (including DEA). Only 6 studies had explicitly used the rules of thumb. Conclusion: The number of inputs, outputs, and DMUs should be cautiously selected in DEA-like techniques, as their proportionality can directly affect the discriminatory power of the technique. The literature seemed to be, to a large extent, unsuccessful in attending to such proportionality. This study collected a list of key rules (of thumb) on the interplay of inputs, outputs, and DMUs, which could be considered by researchers keen to apply the DEA technique.
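One widely cited rule of thumb can be checked mechanically; the sketch below encodes the common requirement that the number of DMUs be at least max(m x s, 3(m + s)) for m inputs and s outputs (individual studies may apply different variants of the rule):

    # DEA sample-size rule of thumb: n_DMUs >= max(m*s, 3*(m+s)).
    def dea_sample_size_ok(n_dmus, n_inputs, n_outputs):
        required = max(n_inputs * n_outputs, 3 * (n_inputs + n_outputs))
        return n_dmus >= required, required

    ok, needed = dea_sample_size_ok(n_dmus=20, n_inputs=3, n_outputs=2)
    # -> (True, 15): 20 hospitals comfortably exceed the minimum of 15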
Automatic extraction of property norm-like data from large text corpora.
Kelly, Colin; Devereux, Barry; Korhonen, Anna
2014-01-01
Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
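A minimal sketch of the reweighting step, with pointwise mutual information standing in for the paper's four statistical metrics; the probabilities are assumed to come from corpus counts and the weights are illustrative:

    # Combine raw frequency with an association statistic into one score
    # per concept-relation-feature triple (illustrative stand-in).
    import math

    def score_triple(freq, p_concept, p_feature, p_joint, weights=(0.5, 0.5)):
        pmi = math.log2(p_joint / (p_concept * p_feature))
        return weights[0] * math.log(1 + freq) + weights[1] * pmi

    # e.g. rank candidate triples like ("car", "require", "petrol") by score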
Barbosa, Jocelyn; Lee, Kyubum; Lee, Sunwon; Lodhi, Bilal; Cho, Jae-Gu; Seo, Woo-Keun; Kang, Jaewoo
2016-03-12
Facial palsy or paralysis (FP) is a symptom in which voluntary muscle movement is lost on one side of the human face, which can be devastating for patients. Traditional assessment methods depend solely on the clinician's judgment and are therefore time-consuming and subjective in nature. Hence, a quantitative assessment system is invaluable for physicians beginning the rehabilitation process, and producing a reliable and robust method remains a challenge. We introduce a novel approach for the quantitative assessment of facial paralysis that tackles the classification problem for FP type and degree of severity. Specifically, a novel method of quantitative assessment is presented: an algorithm that extracts the human iris and detects facial landmarks, and a hybrid approach combining rule-based and machine learning algorithms to analyze and prognosticate facial paralysis using the captured images. A method combining the optimized Daugman's algorithm and the Localized Active Contour (LAC) model is proposed to efficiently extract the iris and facial landmarks (key points). To improve the performance of LAC, appropriate parameters of the initial evolving curve for facial feature segmentation are automatically selected. The symmetry score is measured by the ratio between features extracted from the two sides of the face. Hybrid classifiers (i.e., rule-based with regularized logistic regression) were employed for discriminating healthy and unhealthy subjects, FP type classification, and facial paralysis grading based on the House-Brackmann (H-B) scale. Quantitative analysis was performed to evaluate the performance of the proposed approach, and experiments demonstrate its efficiency. Facial movement feature extraction based on iris segmentation and LAC-based key point detection, along with a hybrid classifier, provides a more efficient way of addressing the classification problem for facial palsy type and degree of severity. Combining iris segmentation and the key-point-based method has several merits that are essential for our real application. Aside from the facial key points, iris segmentation provides a significant contribution as it describes the changes in iris exposure while performing facial expressions. It reveals the significant difference between the healthy side and the severe palsy side when raising the eyebrows with both eyes directed upward, and can model the typical changes in the iris region.
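The symmetry score described above can be sketched as a bounded ratio; the feature (e.g. iris exposure) and the 0.7 cutoff in the usage comment are illustrative assumptions, not the paper's values:

    # Symmetry as the ratio between a feature measured on the two face sides;
    # values near 1 suggest symmetry, marked deviation flags the palsy side.
    def symmetry_score(left_value, right_value, eps=1e-9):
        lo, hi = sorted((abs(left_value), abs(right_value)))
        return lo / (hi + eps)       # in (0, 1]; 1.0 means perfect symmetry

    # e.g. symmetry_score(iris_exposure_left, iris_exposure_right) < 0.7
    # could feed the rule-based stage of the hybrid classifier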
[Methods of artificial intelligence: a new trend in pharmacy].
Dohnal, V; Kuca, K; Jun, D
2005-07-01
Artificial neural networks (ANN) and genetic algorithms belong to a group of methods known as artificial intelligence. The application of ANN to pharmaceutical data can lead to an understanding of the inner structure of the data and the possibility of building a model (adaptation). In addition, in certain cases it is possible to extract rules from the data. The adapted ANN is then able to predict the properties of compounds that were not used in the adaptation phase. Applications of ANN have great potential in the pharmaceutical industry and in the interpretation of analytical, pharmacokinetic, or toxicological data.
Automated detection of pain from facial expressions: a rule-based approach using AAM
NASA Astrophysics Data System (ADS)
Chen, Zhanli; Ansari, Rashid; Wilkie, Diana J.
2012-02-01
In this paper, we examine the problem of using video analysis to assess pain, an important problem especially for critically ill, non-communicative patients and people with dementia. We propose and evaluate an automated method to detect the presence of pain manifested in patient videos, using a unique and large collection of cancer patient videos captured in patient homes. The method is based on detecting pain-related facial action units defined in the Facial Action Coding System (FACS), which is widely used for objective assessment in pain analysis. In our research, a person-specific Active Appearance Model (AAM) based on the Project-Out Inverse Compositional Method is trained for each patient individually for modeling purposes. A flexible representation of the shape model is used in a rule-based method that is better suited than the more commonly used classifier-based methods for application to the cancer patient videos, in which pain-related facial actions occur infrequently and more subtly. The rule-based method relies on feature points that provide facial action cues and are extracted from the shape vertices of the AAM, which have a natural correspondence to facial muscular movement. In this paper, we investigate the detection of a commonly used set of pain-related action units in both the upper and lower face. Our detection results show good agreement with the results obtained by three trained FACS coders who independently reviewed and scored the action units in the cancer patient videos.
Extracting sets of chemical substructures and protein domains governing drug-target interactions.
Yamanishi, Yoshihiro; Pauwels, Edouard; Saigo, Hiroto; Stoven, Véronique
2011-05-23
The identification of rules governing molecular recognition between drug chemical substructures and protein functional sites is a challenging issue at many stages of the drug development process. In this paper we develop a novel method to extract sets of drug chemical substructures and protein domains that govern drug-target interactions on a genome-wide scale. This is made possible using sparse canonical correspondence analysis (SCCA) for analyzing drug substructure profiles and protein domain profiles simultaneously. The method does not depend on the availability of protein 3D structures. From a data set of known drug-target interactions including enzymes, ion channels, G protein-coupled receptors, and nuclear receptors, we extract a set of chemical substructures shared by drugs able to bind to a set of protein domains. These two sets of extracted chemical substructures and protein domains form components that can be further exploited in a drug discovery process. This approach successfully clusters protein domains that may be evolutionarily unrelated but that bind a common set of chemical substructures. As shown in several examples, it can also be very helpful for predicting new protein-ligand interactions and addressing the problem of ligand specificity. The proposed method constitutes a contribution to the recent field of chemogenomics that aims to connect the chemical space with the biological space.
Self-Supervised Chinese Ontology Learning from Online Encyclopedias
Shao, Zhiqing; Ruan, Tong
2014-01-01
Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we show that the self-supervised machine learning method is practicable for Chinese relation extraction (at least for synonymy and hyponymy), statistically and experimentally, and train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other famous ontologies and knowledge bases; the experiment results also indicate that the self-supervised models obviously enrich SSCO. PMID:24715819
Method for Face-Emotion Retrieval Using A Cartoon Emotional Expression Approach
NASA Astrophysics Data System (ADS)
Kostov, Vlaho; Yanagisawa, Hideyoshi; Johansson, Martin; Fukuda, Shuichi
A simple method for extracting emotion from a human face, as a form of non-verbal communication, was developed to cope with and optimize mobile communication in a globalized and diversified society. A cartoon face based model was developed and used to evaluate the emotional content of real faces. After a pilot survey, basic rules were defined and student subjects were asked to express emotion using the cartoon face. Their face samples were then analyzed using principal component analysis and the Mahalanobis distance method. Feature parameters considered to be related to emotions were extracted, and new cartoon faces based on these parameters were generated. The subjects evaluated the emotion of these cartoon faces again, and we confirmed that these parameters were suitable. To confirm how these parameters could be applied to real faces, we asked subjects to express the same emotions, which were then captured electronically. Simple image processing techniques were also developed to extract these features from real faces, and we then compared them with the cartoon face parameters. It is demonstrated via the cartoon face that emotions can be expressed with very small amounts of information, and that real and cartoon faces correspond to each other. It is also shown that emotion can be extracted from still and dynamic real face images using these cartoon-based features.
Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan
2012-01-01
In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.
The information extraction of Gannan citrus orchard based on the GF-1 remote sensing image
NASA Astrophysics Data System (ADS)
Wang, S.; Chen, Y. L.
2017-02-01
The production of Gannan oranges is the largest in China and occupies an important place in the world. Extracting citrus orchards quickly and effectively is of great significance for fruit pathogen defense, fruit production, and industrial planning. The traditional pixel-based spectral extraction of citrus orchards has lower classification accuracy and has difficulty avoiding the "salt-and-pepper" phenomenon; under the influence of noise, the phenomenon of different objects sharing the same spectrum is serious. Taking the citrus planting area of Xunwu County, Ganzhou, as the research object, and aiming at the lower accuracy of the traditional pixel-based classification method, a decision tree classification method based on an object-oriented rule set is proposed. Firstly, multi-scale segmentation is performed on GF-1 remote sensing image data of the study area. Subsequently, sample objects are selected for statistical analysis of spectral and geometric features. Finally, combining the concept of decision tree classification, empirical thresholds on single bands, NDVI, band combinations, and object geometry are applied hierarchically to extract information for the research area, implementing multi-scale segmentation and hierarchical decision tree classification. The classification results are verified with a confusion matrix, and the overall Kappa index is 87.91%.
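A minimal sketch of hierarchical rule-set classification of this kind over band arrays; the index choices and thresholds are placeholders, not the paper's empirical values for GF-1 imagery:

    # Hierarchical (decision-tree style) rule classification (illustrative).
    import numpy as np

    def classify(nir, red, green, ndvi_veg=0.3, ndwi_water=0.2):
        ndvi = (nir - red) / (nir + red + 1e-9)
        ndwi = (green - nir) / (green + nir + 1e-9)
        out = np.zeros(nir.shape, dtype=np.uint8)   # 0 = other
        out[ndwi > ndwi_water] = 1                  # 1 = water, ruled out first
        veg = (ndvi > ndvi_veg) & (out == 0)
        out[veg] = 2                                # 2 = vegetation
        # citrus orchards would then be separated from other vegetation
        # with geometric/texture rules on the segmented objects
        return out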
Layout pattern analysis using the Voronoi diagram of line segments
NASA Astrophysics Data System (ADS)
Dey, Sandeep Kumar; Cheilaris, Panagiotis; Gabrani, Maria; Papadopoulou, Evanthia
2016-01-01
Early identification of problematic patterns in very large scale integration (VLSI) designs is of great value as the lithographic simulation tools face significant timing challenges. To reduce the processing time, such a tool selects only a fraction of possible patterns which have a probable area of failure, with the risk of missing some problematic patterns. We introduce a fast method to automatically extract patterns based on their structure and context, using the Voronoi diagram of line-segments as derived from the edges of VLSI design shapes. Designers put line segments around the problematic locations in patterns called "gauges," along which the critical distance is measured. The gauge center is the midpoint of a gauge. We first use the Voronoi diagram of VLSI shapes to identify possible problematic locations, represented as gauge centers. Then we use the derived locations to extract windows containing the problematic patterns from the design layout. The problematic locations are prioritized by the shape and proximity information of the design polygons. We perform experiments for pattern selection in a portion of a 22-nm random logic design layout. The design layout had 38,584 design polygons (consisting of 199,946 line segments) on layer Mx, and 7079 markers generated by an optical rule checker (ORC) tool. The optical rules specify requirements for printing circuits with minimum dimension. Markers are the locations of some optical rule violations in the layout. We verify our approach by comparing the coverage of our extracted patterns to the ORC-generated markers. We further derive a similarity measure between patterns and between layouts. The similarity measure helps to identify a set of representative gauges that reduces the number of patterns for analysis.
Direct Determinations of the πNN Coupling Constants
NASA Astrophysics Data System (ADS)
Ericson, T. E. O.; Loiseau, B.
1998-11-01
A novel extrapolation method has been used to deduce directly the charged πNN coupling constant from backward np differential scattering cross sections. The extracted value, g_c^2 = 14.52(0.26), is higher than the indirectly deduced values obtained in nucleon-nucleon energy-dependent partial-wave analyses. Our preliminary direct value from a reanalysis of the GMO sum rule points to an intermediate value of g_c^2 of about 13.97(0.30).
NASA Astrophysics Data System (ADS)
Chang, Hsun-Ming; Fan, Kai-Lin; Charnas, Adam; Ye, Peide D.; Lin, Yu-Ming; Wu, Chih-I.; Wu, Chao-Hsin
2018-04-01
Compared to graphene and MoS2, studies on metal contacts to black phosphorus (BP) transistors are still immature. In this work, we present an experimental analysis of titanium contacts on BP based upon the theory of thermionic emission. The Schottky barrier height (SBH) is extracted by thermionic emission methods to analyze the properties of the Ti-BP contact. To examine the results, the band gap of BP is extracted, followed by theoretical band alignment using the Schottky-Mott rule. However, an underestimated SBH is found due to hysteresis in the electrical results. Hence, a modified SBH extraction for contact resistance that avoids the effects of hysteresis is proposed and demonstrated, showing a more accurate SBH that agrees well with the theoretical value and with the results of transmission electron microscopy and energy-dispersive x-ray spectroscopy.
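A sketch of the standard thermionic-emission (Richardson plot) extraction this abstract builds on: for current proportional to T^2 exp(-q*Phi_B/(k*T)) at fixed bias, the slope of ln(I/T^2) versus 1/T gives the effective barrier. The temperatures in the usage comment are illustrative:

    # Schottky-barrier extraction from temperature-dependent current.
    import numpy as np

    K_B = 8.617e-5   # Boltzmann constant, eV/K

    def barrier_height(temps_K, currents_A):
        x = 1.0 / np.asarray(temps_K)
        y = np.log(np.asarray(currents_A) / np.asarray(temps_K) ** 2)
        slope, _ = np.polyfit(x, y, 1)       # Richardson (Arrhenius) plot
        return -slope * K_B                  # effective barrier in eV

    # phi_b = barrier_height([250, 275, 300, 325], measured_currents)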
Developing a Learning Progression for Number Sense Based on the Rule Space Model in China
ERIC Educational Resources Information Center
Chen, Fu; Yan, Yue; Xin, Tao
2017-01-01
The current study focuses on developing the learning progression of number sense for primary school students, and it applies a cognitive diagnostic model, the rule space model, to data analysis. The rule space model analysis firstly extracted nine cognitive attributes and their hierarchy model from the analysis of previous research and the…
NASA Astrophysics Data System (ADS)
Kandemir, Ekrem; Borekci, Selim; Cetin, Numan S.
2018-04-01
Photovoltaic (PV) power generation has been widely used in recent years, and techniques for increasing power efficiency represent one of its most important issues. The available maximum power of a PV panel depends on environmental conditions such as solar irradiance and temperature. To extract the maximum available power from a PV panel, various maximum-power-point tracking (MPPT) methods are used. In this work, two different MPPT methods were implemented for a 150-W PV panel. The first method, known as incremental conductance (Inc. Cond.) MPPT, determines the maximum power by measuring the derivative of the PV voltage and current. The other method is based on reduced-rule compressed fuzzy logic control (RR-FLC), with which it is relatively easier to determine the maximum power because a single input variable is used to reduce the computing load. In this study, a 150-W PV panel system model was realized using these MPPT methods in MATLAB and the results were compared. According to the simulation results, the proposed RR-FLC-based MPPT increased the response rate and tracking accuracy by 4.66% under standard test conditions.
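For reference, a sketch of the incremental-conductance decision rule mentioned above: at the maximum power point dP/dV = 0, i.e. dI/dV = -I/V, and the sign of the comparison tells the controller which way to move the operating voltage. The step size is an illustrative parameter:

    # Incremental-conductance MPPT step (one control iteration).
    def inc_cond_step(v, i, v_prev, i_prev, v_ref, dv_step=0.5):
        dv, di = v - v_prev, i - i_prev
        if dv == 0:
            if di > 0:   v_ref += dv_step      # irradiance rose: move right
            elif di < 0: v_ref -= dv_step
        else:
            g_inc, g = di / dv, -i / v
            if g_inc > g:   v_ref += dv_step   # left of the MPP
            elif g_inc < g: v_ref -= dv_step   # right of the MPP
        return v_ref                           # unchanged at the MPP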
Kühbandner, Stephan; Ruther, Joachim
2015-06-01
Triacylglycerides (TAGs) and other non-volatile fatty acid derivatives (NFADs) occur in large amounts in the internal tissues of insects, but their presence on the insect cuticle is controversially discussed. Most studies investigating cuticular lipids of insects involve solvent extraction, which implies the risk of extracting lipids from internal tissues. Here, we present a new method that overcomes this problem. The method employs solid phase micro-extraction (SPME) to sample NFADs by rubbing the SPME fiber over the insect cuticle. Subsequently, the sampled NFADs are transesterified in situ with trimethyl sulfonium hydroxide (TMSH) into more volatile fatty acid methyl esters (FAMEs), which can be analyzed by standard GC/MS. We performed two types of control experiments to enable significant conclusions: (1) to rule out contamination of the GC/MS system with NFADs, and (2) to exclude the presence of free fatty acids on the insect cuticle, which would also furnish FAMEs after TMSH treatment, and thus might simulate the presence of NFADs. In combination with these two essential control experiments, the described SPME technique can be used to detect TAGs and/or other NFADs on the insect cuticle. We analyzed six insect species from four insect orders with our method and compared the results with conventional solvent extraction followed by ex situ transesterification. Several fatty acids typically found as constituents of TAGs were detected by the SPME method on the cuticle of all species analyzed. A comparison of the two methods revealed differences in the fatty acid compositions of the samples. Saturated fatty acids tended to show higher relative abundances when sampled with the SPME method, while several minor FAMEs were detected only in the solvent extracts. Our study suggests that TAGs and maybe other NFADs are far more common on the insect cuticle than usually thought.
Automatic movie skimming with general tempo analysis
NASA Astrophysics Data System (ADS)
Lee, Shih-Hung; Yeh, Chia-Hung; Kuo, C. C. J.
2003-11-01
In this research, story units are extracted by general tempo analysis, including the tempos of audio and visual information. Although many schemes have been proposed to successfully segment video data into shots using basic low-level features, how to group shots into meaningful units called story units is still a challenging problem. By focusing on a certain type of video, such as sports or news, one can exploit models with specific application domain knowledge. For movie content, many heuristic rules based on audiovisual clues have been proposed with limited success. We propose a method to extract story units using general tempo analysis. Experimental results are given to demonstrate the feasibility and efficiency of the proposed technique.
Bousquet, Cedric; Dahamna, Badisse; Guillemin-Lanne, Sylvie; Darmoni, Stefan J; Faviez, Carole; Huot, Charles; Katsahian, Sandrine; Leroux, Vincent; Pereira, Suzanne; Richard, Christophe; Schück, Stéphane; Souvignet, Julien; Lillo-Le Louët, Agnès; Texier, Nathalie
2017-09-21
Adverse drug reactions (ADRs) are an important cause of morbidity and mortality. The classical pharmacovigilance process is limited by underreporting, which justifies the current interest in new knowledge sources such as social media. The Adverse Drug Reactions from Patient Reports in Social Media (ADR-PRISM) project aims to extract ADRs reported by patients in these media. We identified 5 major challenges to overcome to operationalize the analysis of patient posts: (1) variable quality of information on social media, (2) guarantee of data privacy, (3) response to pharmacovigilance expert expectations, (4) identification of relevant information within Web pages, and (5) robust and evolutive architecture. This article aims to describe the current state of advancement of the ADR-PRISM project by focusing on the solutions we have chosen to address these 5 major challenges. In this article, we propose methods and describe the advancement of this project on several aspects: (1) a quality-driven approach for selecting relevant social media for the extraction of knowledge on potential ADRs, (2) an assessment of ethical issues and French regulation for the analysis of data on social media, (3) an analysis of pharmacovigilance expert requirements when reviewing patient posts on the Internet, (4) an extraction method based on natural language processing, pattern-based matching, and selection of relevant medical concepts in reference terminologies, and (5) specifications of a component-based architecture for the monitoring system. Considering the 5 major challenges, we (1) selected a set of 21 validated criteria for selecting social media to support the extraction of potential ADRs, (2) proposed solutions to guarantee data privacy of patients posting on Internet, (3) took into account pharmacovigilance expert requirements with use case diagrams and scenarios, (4) built domain-specific knowledge resources embedding a lexicon, morphological rules, context rules, semantic rules, syntactic rules, and post-analysis processing, and (5) proposed a component-based architecture that allows storage of big data and accessibility to third-party applications through Web services. We demonstrated the feasibility of implementing a component-based architecture that allows collection of patient posts on the Internet, near real-time processing of those posts including annotation, and storage in big data structures. In the next steps, we will evaluate the posts identified by the system in social media to clarify the interest and relevance of such approach to improve conventional pharmacovigilance processes based on spontaneous reporting. ©Cedric Bousquet, Badisse Dahamna, Sylvie Guillemin-Lanne, Stefan J Darmoni, Carole Faviez, Charles Huot, Sandrine Katsahian, Vincent Leroux, Suzanne Pereira, Christophe Richard, Stéphane Schück, Julien Souvignet, Agnès Lillo-Le Louët, Nathalie Texier. Originally published in JMIR Research Protocols (http://www.researchprotocols.org), 21.09.2017.
Linan, Margaret K; Sottara, Davide; Freimuth, Robert R
2015-01-01
Pharmacogenomics (PGx) guidelines contain drug-gene relationships, therapeutic and clinical recommendations from which clinical decision support (CDS) rules can be extracted, rendered and then delivered through clinical decision support systems (CDSS) to provide clinicians with just-in-time information at the point of care. Several tools exist that can be used to generate CDS rules that are based on computer interpretable guidelines (CIG), but none have been previously applied to the PGx domain. We utilized the Unified Modeling Language (UML), the Health Level 7 virtual medical record (HL7 vMR) model, and standard terminologies to represent the semantics and decision logic derived from a PGx guideline, which were then mapped to the Health eDecisions (HeD) schema. The modeling and extraction processes developed here demonstrate how structured knowledge representations can be used to support the creation of shareable CDS rules from PGx guidelines.
1992-04-01
[OCR-damaged fragment of a workshop agenda. Recoverable items: "Databases for Planning and Scheduling", Tim Finin (University of Maryland, Baltimore County / Unisys Corporation), 8:30-9:00; "Extracting Rules from Software for Knowledge-Bases", Noah S. Prywes (University of Pennsylvania); a note that space requirements are tractable for, e.g., FEM, multiplication routines, and sorting programs; Laboratory for AI Research, The Ohio State University. The remaining text is unrecoverable.]
Evolving rule-based systems in two medical domains using genetic programming.
Tsakonas, Athanasios; Dounias, Georgios; Jantzen, Jan; Axer, Hubertus; Bjerregaard, Beth; von Keyserlingk, Diedrich Graf
2004-11-01
To demonstrate and compare the application of different genetic programming (GP) based intelligent methodologies for the construction of rule-based systems in two medical domains: the diagnosis of aphasia subtypes and the classification of pap-smear examinations. Past data represented (a) successful diagnoses of aphasia subtypes, obtained from collaborating medical experts through a free interview per patient, and (b) smears (images of cells) correctly classified by cyto-technologists, previously stained using the Papanicolaou method. Initially, a hybrid approach is proposed, which combines standard genetic programming and heuristic hierarchical crisp rule-base construction. Then, genetic programming for the production of crisp rule-based systems is attempted. Finally, another hybrid intelligent model is composed of a grammar-driven genetic programming system for the generation of fuzzy rule-based systems. Results demonstrate the effectiveness of the proposed systems, which are also compared, in terms of efficiency, accuracy, and comprehensibility, to an inductive machine learning approach and to a standard genetic programming symbolic expression approach. The proposed GP-based intelligent methodologies are able to produce accurate and comprehensible results for medical experts, performing competitively with other intelligent approaches. The aim of the authors was the production of accurate but also sensible decision rules that could potentially help medical doctors to extract conclusions, even at the expense of a higher classification score.
A Swarm Optimization approach for clinical knowledge mining.
Christopher, J Jabez; Nehemiah, H Khanna; Kannan, A
2015-10-01
Rule-based classification is a typical data mining task that is used in several medical diagnosis and decision support systems. The rules stored in the rule base have an impact on classification efficiency. Rule sets that are extracted with data mining tools and techniques are optimized using heuristic or meta-heuristic approaches in order to improve the quality of the rule base. In this work, a meta-heuristic approach called Wind-driven Swarm Optimization (WSO) is used. The uniqueness of this work lies in the biological inspiration that underlies the algorithm. WSO uses Jval, a new metric, to evaluate the efficiency of a rule-based classifier. Rules are extracted from decision trees. WSO is used to obtain different permutations and combinations of rules, whereby the optimal ruleset that satisfies the requirements of the developer is used for predicting the test data. The performance of various extensions of decision trees, namely RIPPER, PART, FURIA, and Decision Tables, is analyzed. The efficiency of WSO is also compared with traditional Particle Swarm Optimization. Experiments were carried out with six benchmark medical datasets. The traditional C4.5 algorithm yields 62.89% accuracy with 43 rules for the liver disorders dataset, whereas WSO yields 64.60% with 19 rules. For the heart disease dataset, C4.5 is 68.64% accurate with 98 rules, whereas WSO is 77.8% accurate with 34 rules. The normalized standard deviations of accuracy for PSO and WSO are 0.5921 and 0.5846, respectively. WSO provides accurate and concise rulesets. PSO yields results similar to those of WSO, but the novelty of WSO lies in its biological motivation and its customization for rule base optimization. The trade-off between prediction accuracy and the size of the rule base is optimized during the design and development of a rule-based clinical decision support system. The efficiency of a decision support system relies on the content of the rule base and on classification accuracy. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
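A minimal sketch of the rule-subset search idea, with plain random search standing in for WSO and an assumed accuracy/size trade-off standing in for the Jval metric (the 0.8/0.2 weights are illustrative):

    # Search over rule subsets with a fitness trading accuracy against size.
    import random

    def fitness(ruleset, evaluate_accuracy, max_rules):
        acc = evaluate_accuracy(ruleset)          # user-supplied scorer
        return 0.8 * acc + 0.2 * (1 - len(ruleset) / max_rules)

    def random_search(rules, evaluate_accuracy, iters=500, seed=0):
        rng = random.Random(seed)
        best = list(rules)
        best_f = fitness(best, evaluate_accuracy, len(rules))
        for _ in range(iters):
            cand = rng.sample(rules, rng.randint(1, len(rules)))
            f = fitness(cand, evaluate_accuracy, len(rules))
            if f > best_f:
                best, best_f = cand, f
        return best, best_f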
Yang, Zheng Rong; Thomson, Rebecca; Hodgman, T Charles; Dry, Jonathan; Doyle, Austin K; Narayanan, Ajit; Wu, XiKun
2003-11-01
This paper presents an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.
Extracting Drug-Drug Interactions with Word and Character-Level Recurrent Neural Networks
Kavuluru, Ramakanth; Rios, Anthony; Tran, Tung
2017-01-01
Drug-drug interactions (DDIs) are known to be responsible for nearly a third of all adverse drug reactions. Hence, several current efforts focus on extracting signal from EMRs to prioritize DDIs that need further exploration. To this end, being able to extract explicit mentions of DDIs in free text narratives is an important task. In this paper, we explore recurrent neural network (RNN) architectures to detect and classify DDIs from unstructured text using the DDIExtraction dataset from the SemEval 2013 (task 9) shared task. Our methods are in line with those used in other recent deep learning efforts for relation extraction, including DDI extraction. However, to our knowledge, we are the first to investigate the potential of character-level RNNs (Char-RNNs) for DDI extraction (and relation extraction in general). Furthermore, we explore a simple but effective model bootstrapping method to (a) build model averaging ensembles, (b) derive confidence intervals around mean micro-F scores (MMF), and (c) assess the average behavior of our methods. Without any rule-based filtering of negative examples, a popular heuristic used by most earlier efforts, we achieve an MMF of 69.13. By adding simple replicable heuristics to filter negative instances we are able to achieve an MMF of 70.38. Furthermore, our best ensembles produce micro F-scores of 70.81 (without filtering) and 72.13 (with filtering), which are superior to metrics reported in published results. Although Char-RNNs turn out to be inferior to regular word-based RNN models in overall comparisons, we find that ensembling models from both architectures results in nontrivial gains over simply using either alone, indicating that they complement each other. PMID:29034375
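A sketch of the bootstrapping idea for confidence intervals, assuming scikit-learn; this simplified variant resamples test predictions rather than retraining model ensembles as the paper does, and the 1000-replicate count is illustrative:

    # Percentile bootstrap around the mean micro-F score.
    import numpy as np
    from sklearn.metrics import f1_score

    def bootstrap_micro_f(y_true, y_pred, n_boot=1000, seed=0):
        rng = np.random.default_rng(seed)
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        scores = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), size=len(y_true))
            scores.append(f1_score(y_true[idx], y_pred[idx], average="micro"))
        scores = np.array(scores)
        return scores.mean(), np.percentile(scores, [2.5, 97.5])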
Prediction of linear B-cell epitopes of hepatitis C virus for vaccine development
2015-01-01
Background: High genetic heterogeneity in the hepatitis C virus (HCV) is the major challenge in the development of an effective vaccine. Existing studies for developing HCV vaccines have mainly focused on the T-cell immune response. However, identification of linear B-cell epitopes that can stimulate the B-cell response is one of the major tasks of peptide-based vaccine development. Owing to the variability in B-cell epitope length, the prediction of B-cell epitopes is much more complex than that of T-cell epitopes. Furthermore, the motifs of linear B-cell epitopes in different pathogens are quite different (e.g., HCV and hepatitis B virus). To cope with this challenge, this work proposes an HCV-customized sequence-based prediction method to identify B-cell epitopes of HCV. Results: This work establishes an experimentally verified HCV B-cell response dataset consisting of 774 linear B-cell epitopes and 774 non-B-cell epitopes from the Immune Epitope Database. An interpretable rule mining system for B-cell epitopes (IRMS-BE) is proposed to select informative physicochemical properties (PCPs) and then extract if-then rule-based knowledge for identifying B-cell epitopes. A web server, Bcell-HCV, was implemented using an SVM with the 34 informative PCPs, achieving a training accuracy of 79.7% and a test accuracy of 70.7%, better than SVM-based methods for identifying B-cell epitopes of HCV and two general-purpose methods. This work performs an advanced analysis of the 34 informative properties; the results indicate that the most effective property is the alpha-helix structure of epitopes, which influences the connection between host cells and the E2 proteins of HCV. Furthermore, 12 interpretable rules are acquired from the top five PCPs, achieving a sensitivity of 75.6% and a specificity of 71.3%. Finally, a conserved promising vaccine candidate, PDREMVLYQE, is identified for inclusion in a vaccine against HCV. Conclusions: This work proposes an interpretable rule mining system, IRMS-BE, for extracting interpretable rules using informative physicochemical properties, and a web server, Bcell-HCV, for predicting linear B-cell epitopes of HCV. IRMS-BE may also be applied to predict B-cell epitopes for other viruses, benefiting vaccine development for these viruses without significant modification. Bcell-HCV is useful for identifying B-cell epitopes of HCV antigens to help vaccine development, and is available at http://e045.life.nctu.edu.tw/BcellHCV. PMID:26680271
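A minimal sketch of an SVM over physicochemical-property features of this kind, assuming scikit-learn; the `pcp_table` encoding (property -> residue -> value, averaged over the peptide) is a hypothetical stand-in for the paper's 34 informative PCPs:

    # SVM on averaged per-residue physicochemical properties (illustrative).
    import numpy as np
    from sklearn.svm import SVC

    def encode(peptide, pcp_table):
        # one feature per property: the mean property value over residues
        return np.array([np.mean([tab[aa] for aa in peptide])
                         for tab in pcp_table.values()])

    def train(peptides, labels, pcp_table):
        X = np.vstack([encode(p, pcp_table) for p in peptides])
        clf = SVC(kernel="rbf", C=1.0, gamma="scale")
        return clf.fit(X, labels)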
NASA Technical Reports Server (NTRS)
Enslin, William R.; Ton, Jezching; Jain, Anil
1987-01-01
Landsat TM data were combined with land cover and planimetric data layers contained in the State of Michigan's geographic information system (GIS) to identify changes in forestlands, specifically new oil/gas wells. A GIS-guided feature-based classification method was developed. The regions extracted by the best image band/operator combination were studied using a set of rules based on the characteristics of the GIS oil/gas pads.
Historical feature pattern extraction based network attack situation sensing algorithm.
Zeng, Yong; Liu, Dacheng; Lei, Zhou
2014-01-01
The situation sequence contains a series of complicated and multivariate random trends, which are very sudden and uncertain, and whose principles are difficult to recognize and describe with traditional algorithms. To address these problems, estimating the parameters of very long situation sequences is essential but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm seeks similar indications in the recorded historical situation sequence and weighs the link intensity between an occurred indication and its subsequent effect. Then it calculates the probability that a certain effect reappears given the current indication and makes a prediction after weighting. Meanwhile, the HFPE method includes an evolution algorithm to derive the prediction deviation from the viewpoints of pattern and accuracy. This algorithm can continuously improve the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, does not need data preprocessing, and can track and adapt to the variation of the situation sequence continuously.
Gene regulatory network identification from the yeast cell cycle based on a neuro-fuzzy system.
Wang, B H; Lim, J W; Lim, J S
2016-08-30
Many studies exist for reconstructing gene regulatory networks (GRNs). In this paper, we propose a method based on an advanced neuro-fuzzy system for reconstructing gene regulatory networks from microarray time-series data. This approach uses a neural network with a weighted fuzzy function to model the relationships between genes, which greatly simplifies the fuzzy rules that determine the regulators of genes. Additionally, a regulator selection procedure is proposed that extracts the exact dynamic relationship between genes using the information obtained from the weighted fuzzy function. Time-series-related features are extracted from the original data to exploit the characteristics of temporal data that are useful for accurate GRN reconstruction. The microarray dataset of the yeast cell cycle was used for our study. We measured the mean squared prediction error to gauge the efficiency of the proposed approach and evaluated accuracy in terms of precision, sensitivity, and F-score. The proposed method outperformed the other existing approaches.
The Safe Yield and Climatic Variability: Implications for Groundwater Management.
Loáiciga, Hugo A
2017-05-01
Methods for calculating the safe yield are evaluated in this paper using a high-quality and long historical data set of groundwater recharge, discharge, extraction, and precipitation in a karst aquifer. Consideration is given to the role that climatic variability has on the determination of a climatically representative period with which to evaluate the safe yield. The methods employed to estimate the safe yield are consistent with its definition as a long-term average extraction rate that avoids adverse impacts on groundwater. The safe yield is a useful baseline for groundwater planning; yet, it is herein shown that it is not an operational rule that works well under all climatic conditions. This paper shows that due to the nature of dynamic groundwater processes it may be most appropriate to use an adaptive groundwater management strategy that links groundwater extraction rates to groundwater discharge rates, thus achieving a safe yield that represents an estimated long-term sustainable yield. An example of the calculation of the safe yield of the Edwards Aquifer (Texas) demonstrates that it is about one-half of the average annual recharge.
Salient contour extraction from complex natural scene in night vision image
NASA Astrophysics Data System (ADS)
Han, Jing; Yue, Jiang; Zhang, Yi; Bai, Lian-fa
2014-03-01
The theory of center-surround interaction in the non-classical receptive field can be applied to night vision information processing. In this work, an optimized compound receptive field modulation method is proposed to extract salient contours from complex natural scenes in low-light-level (LLL) and infrared images. The central idea is that multi-feature analysis can recognize the inhomogeneity in modulatory coverage more accurately, and that center-surround pairs whose grouping structure satisfies the Gestalt rules merit a high connection probability. Computationally, a multi-feature contrast-weighted inhibition model is presented to suppress background and lower mutual inhibition among contour elements; a fuzzy connection facilitation model is proposed to enhance contour response, connect discontinuous contours, and further eliminate randomly distributed noise and texture; and a multi-scale iterative attention method is designed to accomplish the dynamic modulation process and extract contours of targets at multiple sizes. This work provides a series of high-performance, biologically motivated computational visual models for contour detection in cluttered night vision scenes.
Localized Segment Based Processing for Automatic Building Extraction from LiDAR Data
NASA Astrophysics Data System (ADS)
Parida, G.; Rajan, K. S.
2017-05-01
Current methods of object segmentation, extraction, and classification for aerial LiDAR data are manual and tedious. This work proposes a technique for object segmentation from LiDAR data. A bottom-up, geometric rule-based approach was used initially to segment buildings out of the LiDAR datasets. For curved wall surfaces, localized surface normals were compared to segment buildings. The algorithm has been applied to both synthetic datasets and a real-world dataset of Vaihingen, Germany. Preliminary results show successful segmentation of building objects from a given scene for the synthetic datasets and promising results for the real-world data. The advantage of the proposed approach is that it depends on no data other than LiDAR. It is an unsupervised method of building segmentation and thus requires no model training, unlike supervised techniques. It focuses on extracting the walls of the buildings to construct the footprint, rather than on the roofs; this focus on walls is the crux of the proposed method. The current segmentation approach can be used to obtain 2D footprints of buildings, with further scope to generate 3D models. Thus, the proposed method can serve as a tool for obtaining building footprints in urban landscapes, supporting urban planning and the smart-cities endeavour.
Federal Register Notice for the Mining Waste Exclusion Final Rule, September 1, 1989
Final rule responding to a federal Appeals Court directive to narrow the exclusion of solid waste from the extraction, beneficiation, and processing of ores and minerals from regulation as hazardous waste as it applies to mineral processing wastes.
Kaiser, W; Faber, T S; Findeis, M
1996-01-01
The authors developed a computer program that detects myocardial infarction (MI) and left ventricular hypertrophy (LVH) in two steps: (1) by extracting parameter values from a 10-second, 12-lead electrocardiogram, and (2) by classifying the extracted parameter values with rule sets. Every disease has its dedicated set of rules; hence, there are separate rule sets for anterior MI, inferior MI, and LVH. If at least one rule is satisfied, the disease is said to be detected. The computer program develops these rule sets automatically. A database (learning set) of healthy subjects and patients with MI, LVH, and mixed MI+LVH was used. After defining the rule type, initial limits, and expected quality of the rules (positive predictive value, minimum number of patients), the program creates a set of rules by varying the limits. The general rule type is defined as: disease = (lim1_low < p1 <= lim1_high) and (lim2_low < p2 <= lim2_high) and ... and (limn_low < pn <= limn_high). When defining the rule types, only parameters (p1 ... pn) known as clinical electrocardiographic criteria (amplitudes [mV] of Q, R, and T waves and ST-segment; duration [ms] of Q wave; frontal angle [degrees]) were used. This allowed the learned rule sets to be submitted to an independent investigator for medical verification, and allowed explanatory texts to be created with the rules; these advantages are not offered by the neurons of a neural network. The learned rules were checked against a test set, with the following results: MI: sensitivity 76.2%, positive predictive value 98.6%; LVH: sensitivity 72.3%, positive predictive value 90.9%. The specificity ratings are better than 98% for MI and better than 90% for LVH.
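The rule type above is just a conjunction of per-parameter interval tests, with a disease flagged when any rule in its set fires. A minimal sketch of that evaluation logic (not the authors' program; the parameter names and limits below are hypothetical illustrations, not learned rule sets):

```python
# Sketch of interval-conjunction rule sets: a disease is detected when at
# least one of its rules is satisfied. All names and limits are hypothetical.

def rule_satisfied(rule, params):
    """A rule is a list of (parameter, lim_low, lim_high) interval tests;
    it fires if every test lim_low < p <= lim_high holds."""
    return all(lo < params[name] <= hi for name, lo, hi in rule)

def disease_detected(rule_set, params):
    return any(rule_satisfied(rule, params) for rule in rule_set)

# Hypothetical rule set for anterior MI (amplitudes in mV, durations in ms).
anterior_mi_rules = [
    [("q_amplitude_v2", -5.0, -0.1), ("q_duration_v2", 30.0, 200.0)],
    [("r_amplitude_v3", 0.0, 0.2), ("t_amplitude_v3", -5.0, 0.0)],
]

ecg = {"q_amplitude_v2": -0.3, "q_duration_v2": 45.0,
       "r_amplitude_v3": 0.5, "t_amplitude_v3": 0.1}
print(disease_detected(anterior_mi_rules, ecg))  # True: the first rule fires
```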
A rapid extraction of landslide disaster information research based on GF-1 image
NASA Astrophysics Data System (ADS)
Wang, Sai; Xu, Suning; Peng, Ling; Wang, Zhiyi; Wang, Na
2015-08-01
In recent years, landslide disasters have occurred frequently because of seismic activity, causing great harm to people's lives and drawing close attention from the state and wide concern from society. In the field of geological disasters, landslide information extraction based on remote sensing has been controversial, but high-resolution remote sensing imagery, with its rich texture and geometric information, can effectively improve the accuracy of information extraction. It is therefore feasible to extract information on earthquake-triggered landslides with serious surface damage at a large scale. Taking Wenchuan county as the study area, this paper uses a multi-scale segmentation method to extract landslide image objects from domestic GF-1 images and DEM data, using the Estimation of Scale Parameter tool to determine the optimal segmentation scale. After comprehensively analyzing the characteristics of landslides in high-resolution imagery and selecting spectral, textural, geometric, and landform features of the image, extraction rules are established to extract landslide disaster information. The extraction results show 20 landslides with a total area of 521279.31. Compared with visual interpretation results, the extraction accuracy is 72.22%. This study indicates that it is efficient and feasible to extract earthquake landslide disaster information from high-resolution remote sensing, providing important technical support for post-disaster emergency investigation and disaster assessment.
Semantic Segmentation of Building Elements Using Point Cloud Hashing
NASA Astrophysics Data System (ADS)
Chizhova, M.; Gurianov, A.; Hess, M.; Luhmann, T.; Brunn, A.; Stilla, U.
2018-05-01
For the interpretation of point clouds, the semantic definition of segments extracted from point clouds or images is a common problem. Usually, the semantics of geometrically pre-segmented point cloud elements are determined using probabilistic networks and scene databases. The proposed semantic segmentation method is based on the psychological human interpretation of geometric objects, especially on fundamental rules of primary comprehension. Starting from these rules, buildings can be classified quite well and simply by a human operator (e.g. an architect) into different building types and structural elements (dome, nave, transept, etc.), including particular building parts which are visually detected. The key part of the procedure is a novel hashing-based method in which point cloud projections are transformed into binary pixel representations. The segmentation approach, demonstrated on the example of classical Orthodox churches, is suitable for other buildings and objects characterized by a particular constructional typology (e.g. industrial objects in standardized environments with strict component design allowing clear semantic modelling).
Learning Semantic Tags from Big Data for Clinical Text Representation.
Li, Yanpeng; Liu, Hongfang
2015-01-01
In clinical text mining, one of the biggest challenges is to represent medical terminologies and n-gram terms in sparse medical reports using either supervised or unsupervised methods. Addressing this issue, we propose a novel method for word and n-gram representation at the semantic level. We first represent each word by its distance to a set of reference features calculated by a reference distance estimator (RDE) learned from labeled and unlabeled data, and then generate new features using simple techniques of discretization, random sampling, and merging. The new features are a set of binary rules that can be interpreted as semantic tags derived from words and n-grams. We show that the new features significantly outperform classical bag-of-words and n-grams in the task of heart disease risk factor extraction in the i2b2 2014 challenge. It is promising that semantic tags can replace the original text entirely with even better prediction performance, as well as derive new rules beyond the lexical level.
Extracting fuzzy rules under uncertainty and measuring definability using rough sets
NASA Technical Reports Server (NTRS)
Culas, Donald E.
1991-01-01
Although computers have come a long way since their invention, at the hardware level they are basically able to handle only crisp values. Unfortunately, the world we live in consists of problems which fail to fall into this category; uncertainty is all too common. This work examines a problem involving uncertainty: specifically, it deals with attributes which are fuzzy sets. Under this condition, knowledge is acquired by looking at examples, each of which provides a condition as well as a decision. Based on the examples given, two sets of rules are extracted: certain and possible. Furthermore, measures of how strongly these rules are believed are constructed, and finally, the decisions are defined as a function of the terms used in the conditions.
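The certain/possible rule split corresponds to the lower and upper approximations of rough set theory. A minimal sketch of that distinction over crisp examples (the paper's fuzzy-set attributes and belief measures are not reproduced here; the data are toy values):

```python
from collections import defaultdict

def approximations(examples, decision):
    """Lower/upper approximation of a decision class.

    examples: list of (condition, decision) pairs; condition is hashable.
    Returns (certain, possible): conditions that always, respectively at
    least sometimes, lead to the given decision.
    """
    outcomes = defaultdict(set)
    for cond, dec in examples:
        outcomes[cond].add(dec)
    certain = {c for c, ds in outcomes.items() if ds == {decision}}
    possible = {c for c, ds in outcomes.items() if decision in ds}
    return certain, possible

# Toy examples: (temperature, humidity) -> comfort decision.
data = [(("warm", "dry"), "comfortable"),
        (("warm", "dry"), "comfortable"),
        (("warm", "humid"), "comfortable"),
        (("warm", "humid"), "uncomfortable")]
certain, possible = approximations(data, "comfortable")
print(certain)   # {('warm', 'dry')}            -> certain rule
print(possible)  # both conditions               -> possible rules
```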
Information extraction and knowledge graph construction from geoscience literature
NASA Astrophysics Data System (ADS)
Wang, Chengbin; Ma, Xiaogang; Chen, Jianguo; Chen, Jingwen
2018-03-01
Geoscience literature published online is an important part of open data, and brings both challenges and opportunities for data analysis. Compared with studies of numerical geoscience data, there are limited works on information extraction and knowledge discovery from textual geoscience data. This paper presents a workflow and a few empirical case studies for that topic, with a focus on documents written in Chinese. First, we set up a hybrid corpus combining the generic and geology terms from geology dictionaries to train Chinese word segmentation rules of the Conditional Random Fields model. Second, we used the word segmentation rules to parse documents into individual words, and removed the stop-words from the segmentation results to get a corpus constituted of content-words. Third, we used a statistical method to analyze the semantic links between content-words, and we selected the chord and bigram graphs to visualize the content-words and their links as nodes and edges in a knowledge graph, respectively. The resulting graph presents a clear overview of key information in an unstructured document. This study proves the usefulness of the designed workflow, and shows the potential of leveraging natural language processing and knowledge graph technologies for geoscience.
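The third step, statistically linking content-words and visualizing them as a bigram graph, can be pictured in a few lines. A minimal sketch (toy English stand-ins for the Chinese content-words analyzed in the paper; not the authors' implementation):

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

def bigram_links(content_words, min_count=2):
    """Count adjacent content-word pairs as weighted edges of a bigram graph."""
    counts = Counter(pairwise(content_words))
    return {edge: n for edge, n in counts.items() if n >= min_count}

# Toy segmented, stop-word-filtered corpus.
words = ["granite", "intrusion", "granite", "intrusion",
         "copper", "deposit", "granite", "intrusion", "copper", "deposit"]
print(bigram_links(words))
# {('granite', 'intrusion'): 3, ('intrusion', 'copper'): 2, ('copper', 'deposit'): 2}
```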
An Interval Type-2 Neural Fuzzy System for Online System Identification and Feature Elimination.
Lin, Chin-Teng; Pal, Nikhil R; Wu, Shang-Lin; Liu, Yu-Ting; Lin, Yang-Yin
2015-07-01
We propose an integrated mechanism for discarding derogatory features and extracting fuzzy rules based on an interval type-2 neural fuzzy system (NFS); in fact, it is a more general scheme that can discard bad features, irrelevant antecedent clauses, and even irrelevant rules. High-dimensional input variables and a large number of rules not only increase the computational complexity of NFSs but also reduce their interpretability. Therefore, a mechanism for simultaneous extraction of fuzzy rules and reducing the impact of (or eliminating) the inferior features is necessary. The proposed approach, namely an interval type-2 Neural Fuzzy System for online System Identification and Feature Elimination (IT2NFS-SIFE), uses type-2 fuzzy sets to model uncertainties associated with information and data in designing the knowledge base. The consequent part of the IT2NFS-SIFE is of Takagi-Sugeno-Kang type with interval weights. The IT2NFS-SIFE possesses a self-evolving property that can automatically generate fuzzy rules. Poor features can be discarded through the concept of a membership modulator. The antecedent and modulator weights are learned using a gradient descent algorithm. The consequent part weights are tuned via the rule-ordered Kalman filter algorithm to enhance learning effectiveness. Simulation results show that IT2NFS-SIFE not only simplifies the system architecture by eliminating derogatory/irrelevant antecedent clauses, rules, and features but also maintains excellent performance.
Multiple-Primitives Hierarchical Classification of Airborne Laser Scanning Data in Urban Areas
NASA Astrophysics Data System (ADS)
Ni, H.; Lin, X. G.; Zhang, J. X.
2017-09-01
A hierarchical classification method for Airborne Laser Scanning (ALS) data of urban areas is proposed in this paper. The method is composed of three stages employing three types of primitives: smooth surfaces, rough surfaces, and individual points. In the first stage, the input ALS data is divided into smooth surfaces and rough surfaces by a step-wise point cloud segmentation method. In the second stage, classification based on smooth surfaces and rough surfaces is performed. Points in smooth surfaces are first classified into ground and buildings based on semantic rules. Next, features of rough surfaces are extracted, and points in rough surfaces are classified into vegetation and vehicles based on the derived features and Random Forests (RF). In the third stage, point-based features are extracted for the ground points, and an individual-point classification procedure classifies the ground points into bare land, artificial ground, and greenbelt. Moreover, the shortcomings of existing studies are analyzed, and experiments show that the proposed method overcomes them and handles more types of objects.
A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.
Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela
2016-09-01
The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH, but it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Overall, the system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.
Discovering H-bonding rules in crystals with inductive logic programming.
Ando, Howard Y; Dehaspe, Luc; Luyten, Walter; Van Craenenbroeck, Elke; Vandecasteele, Henk; Van Meervelt, Luc
2006-01-01
In the domain of crystal engineering, various schemes have been proposed for the classification of hydrogen bonding (H-bonding) patterns observed in 3D crystal structures. In this study, the aim is to complement these schemes with rules that predict H-bonding in crystals from 2D structural information only. Modern computational power and the advances in inductive logic programming (ILP) can now provide computational chemistry with the opportunity for extracting structure-specific rules from large databases that can be incorporated into expert systems. ILP technology is here applied to H-bonding in crystals to develop a self-extracting expert system utilizing data in the Cambridge Structural Database of small molecule crystal structures. A clear increase in performance was observed when the ILP system DMax was allowed to refer to the local structural environment of the possible H-bond donor/acceptor pairs. This ability distinguishes ILP from more traditional approaches that build rules on the basis of global molecular properties.
On the inherent competition between valid and spurious inductive inferences in Boolean data
NASA Astrophysics Data System (ADS)
Andrecut, M.
Inductive inference is the process of extracting general rules from specific observations. This problem also arises in the analysis of biological networks, such as genetic regulatory networks, where the interactions are complex and the observations are incomplete. A typical task in these problems is to extract general interaction rules, as combinations of Boolean covariates, that explain a measured response variable. The inductive inference process can be considered as an incompletely specified Boolean function synthesis problem. This incompleteness will also generate spurious inferences, which are a serious threat to valid inductive inference rules. Using random Boolean data as a null model, here we attempt to measure the competition between valid and spurious inductive inference rules in a given data set. We formulate two greedy search algorithms, which synthesize a given Boolean response variable in a sparse disjunctive normal form and a sparse generalized algebraic normal form of the observed variables, respectively, and we evaluate their performance numerically.
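As a rough illustration of the first of the two algorithms, a greedy search can build a sparse disjunctive normal form by repeatedly adding the admissible term (one covering no negative rows) that covers the most still-uncovered positive rows. A minimal sketch under those assumptions (not the paper's exact algorithm):

```python
import itertools

def greedy_dnf(X, y, max_term_size=2):
    """Greedy synthesis of a sparse DNF for response y over Boolean rows X.

    X: list of rows (tuples of 0/1); y: list of 0/1 responses.
    Returns a list of terms; each term is a tuple of (index, value) literals.
    A term is admissible only if it covers no negative (y == 0) row.
    """
    def covers(term, row):
        return all(row[i] == v for i, v in term)

    n = len(X[0])
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    candidates = [t for k in range(1, max_term_size + 1)
                  for t in itertools.combinations(literals, k)
                  if len({i for i, _ in t}) == k]  # distinct variables only
    uncovered = {r for r in range(len(X)) if y[r] == 1}
    negatives = [X[r] for r in range(len(X)) if y[r] == 0]
    dnf = []
    while uncovered:
        best, gain = None, 0
        for t in candidates:
            if any(covers(t, row) for row in negatives):
                continue  # spurious on negatives: inadmissible
            g = sum(covers(t, X[r]) for r in uncovered)
            if g > gain:
                best, gain = t, g
        if best is None:
            break  # remaining positives cannot be covered exactly
        dnf.append(best)
        uncovered -= {r for r in uncovered if covers(best, X[r])}
    return dnf

# Toy data: y = x0 OR (x1 AND NOT x2).
rows = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
ys   = [0,         1,         1,         0,         1,         0]
print(greedy_dnf(rows, ys))  # [((0, 1),), ((1, 1), (2, 0))]
```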
Mobile robots traversability awareness based on terrain visual sensory data fusion
NASA Astrophysics Data System (ADS)
Shirkhodaie, Amir
2007-04-01
In this paper, we present methods that significantly improve a robot's awareness of its terrain traversability conditions. Terrain traversability awareness is achieved by associating terrain image appearances from different poses and fusing information extracted from multimodal imaging and range sensor data to localize and cluster environment landmarks. First, we describe methods for extracting salient terrain features for landmark registration from two or more images taken from different via points along the robot's trajectory. Image registration is applied to overlay two or more views of the same terrain scene taken from different viewpoints, geometrically aligning the salient landmarks of the reference and sensed images; a similarity matching technique is proposed for matching the terrain's salient landmarks. Second, we present three terrain classifier models, based on rules, a supervised neural network, and fuzzy logic, for classifying terrain conditions under uncertainty and mapping the robot's terrain perception to apt traversability measures. Each classifier divides a region of natural terrain into finite sub-terrain regions and classifies the terrain condition exclusively within each sub-region based on spatial and textural cues. The paper addresses the technical challenges and navigational skill requirements of mobile robots for traversability path planning in natural terrain environments similar to Mars surface terrains, and describes several methods for detecting salient terrain features using image texture analysis techniques.
A motif detection and classification method for peptide sequences using genetic programming.
Tomita, Yasuyuki; Kato, Ryuji; Okochi, Mina; Honda, Hiroyuki
2008-08-01
An exploration of common rules (property motifs) in amino acid sequences is required for the design of novel sequences and the elucidation of interactions between molecules controlled by the structural or physical environment. In the present study, we developed a new method to search for property motifs that are common in peptide sequence data. Our method has two characteristics: (i) automatic determination of the position and length of common property motifs by calculating the physicochemical similarity of amino acids, and (ii) quick and effective exploration of motif candidates that discriminate positives from negatives through the introduction of genetic programming (GP). The method was evaluated on two types of model data sets. First, artificially derived peptide data containing intentionally buried property motifs were searched, and the expected property motifs were correctly extracted by our algorithm. Second, peptide data that interact with MHC class II molecules were analyzed as a model of biologically active peptides with buried motifs of various lengths. With the rules found by our method, twice as many MHC class II binding peptides were identified as with the existing scoring-matrix method. In conclusion, our GP-based motif searching approach makes it possible to obtain knowledge of the functional aspects of peptides without any prior knowledge.
KAM (Knowledge Acquisition Module): A tool to simplify the knowledge acquisition process
NASA Technical Reports Server (NTRS)
Gettig, Gary A.
1988-01-01
Analysts, knowledge engineers and information specialists are faced with increasing volumes of time-sensitive data in text form, either as free text or highly structured text records. Rapid access to the relevant data in these sources is essential. However, due to the volume and organization of the contents, and limitations of human memory and association, frequently: (1) important information is not located in time; (2) reams of irrelevant data are searched; and (3) interesting or critical associations are missed due to physical or temporal gaps involved in working with large files. The Knowledge Acquisition Module (KAM) is a microcomputer-based expert system designed to assist knowledge engineers, analysts, and other specialists in extracting useful knowledge from large volumes of digitized text and text-based files. KAM formulates non-explicit, ambiguous, or vague relations, rules, and facts into a manageable and consistent formal code. A library of system rules or heuristics is maintained to control the extraction of rules, relations, assertions, and other patterns from the text. These heuristics can be added, deleted or customized by the user. The user can further control the extraction process with optional topic specifications. This allows the user to cluster extracts based on specific topics. Because KAM formalizes diverse knowledge, it can be used by a variety of expert systems and automated reasoning applications. KAM can also perform important roles in computer-assisted training and skill development. Current research efforts include the applicability of neural networks to aid in the extraction process and the conversion of these extracts into standard formats.
Sortal anaphora resolution to enhance relation extraction from biomedical literature.
Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Rindflesch, Thomas C
2016-04-14
Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50% of the changes involving uninformative relations being replaced by more specific and informative ones, while 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of the approximately 82 million semantic relations extracted from the entire PubMed. Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvement, such as coordination processing and intra-sentential antecedent selection.
Valx: A system for extracting and structuring numeric lab test comparison statements from text
Hao, Tianyong; Liu, Hongfang; Weng, Chunhua
2017-01-01
Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text and evaluate the method using clinical trial eligibility criteria text.

Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes 7 steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based comparison statement verification. Our reference standard was the consensus-based annotation among three raters for all comparison statements for two variables, i.e., HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.

Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively.

Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. The open-source Valx enables further evaluation and continued improvement within the collaborative scientific community. PMID:26940748
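Steps 2 and 6 of the pipeline, operator/value/unit extraction and unit normalization, can be pictured with a small regular-expression sketch. The pattern and the glucose conversion factor below are illustrative assumptions, not the actual Valx rules:

```python
import re

# Sketch of two Valx-like steps: pull (operator, value, unit) triples out
# of criteria text, then normalize units.
EXPR = re.compile(r"(<=|>=|<|>|=)\s*(\d+(?:\.\d+)?)\s*(%|mmol/l|mg/dl)?",
                  re.IGNORECASE)

def to_mg_dl(value, unit):
    """Normalize glucose values to mg/dL (1 mmol/L of glucose = 18.016 mg/dL)."""
    return value * 18.016 if unit and unit.lower() == "mmol/l" else value

criterion = "fasting glucose > 7.0 mmol/L and HbA1c <= 8.5%"
for op, val, unit in EXPR.findall(criterion):
    print(op, to_mg_dl(float(val), unit), unit or "(no unit)")
# prints "> 126.112 mmol/L" (value normalized to mg/dL) and "<= 8.5 %"
```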
Adib, Adiana Mohamed; Jamaludin, Fadzureena; Kiong, Ling Sui; Hashim, Nuziah; Abdullah, Zunoliza
2014-08-05
Baeckea frutescens, locally known as Cucur atap, is used as an antibacterial, antidysentery, antipyretic and diuretic agent. In Malaysia and Indonesia, it is used as an ingredient of the traditional medicine given to mothers during confinement. A three-step infra-red (IR) macro-fingerprinting method combining conventional IR spectra and secondary derivative spectra with two-dimensional infrared correlation spectroscopy (2D-IR) has been proved an effective way to examine complicated mixtures such as herbal medicines. This study investigated the feasibility of employing multi-step IR spectroscopy to study the main constituents of B. frutescens and its different extracts (extracted with chloroform, ethyl acetate, methanol and water in turn). The findings indicated that FT-IR and 2D-IR can reveal many holistic variation rules of the chemical constituents. The structural information indicated that B. frutescens and its extracts contain a large amount of flavonoids, since characteristic absorption peaks of flavonoids, such as ∼1600 cm(-1), ∼1500 cm(-1), ∼1450 cm(-1), and ∼1270 cm(-1), can be observed. The macroscopic fingerprint characters of the FT-IR and 2D-IR spectra can not only provide information on the main chemical constituents in medicinal materials and their different extracts, but also reveal component differences among similar samples. In conclusion, the multi-step IR macro-fingerprint method is rapid, effective, visual and accurate for pharmaceutical research.
NASA Astrophysics Data System (ADS)
Ikeda, Yoichi
2018-03-01
We present recent progress in lattice QCD studies of hadronic interactions, which play a crucial role in understanding the properties of atomic nuclei and hadron resonances. There are two methods to study hadronic interactions: the plateau method (or the direct method) and the HAL QCD method. In the plateau method, determining the ground state energy from the temporal correlation functions of multi-hadron systems is key to reliably extracting the physical observables. It turns out that, due to contamination from nearby excited elastic scattering states, one can easily be misled by a fake plateau into extracting a wrong ground state energy. We introduce a consistency check (sanity check) which can rule out obviously false results obtained from a fake plateau, and find that none of the results obtained so far for two-baryon systems with the plateau method pass the test. The HAL QCD method, on the other hand, is free from the fake-plateau problem. We investigate the systematic uncertainties of the HAL QCD method and find them to be well controlled. On the basis of the HAL QCD method, the structure of the tetraquark candidate Zc(3900), experimentally reported in e+e- collisions, is studied via s-wave two-meson coupled-channel scattering. The results show that the Zc(3900) is not a conventional resonance but a threshold cusp. A semi-phenomenological analysis with coupled-channel interactions to the experimentally observed decay modes is also presented to confirm this conclusion.
Modified Spectral Fatigue Methods for S-N Curves With MIL-HDBK-5J Coefficients
NASA Technical Reports Server (NTRS)
Irvine, Tom; Larsen, Curtis
2016-01-01
The rainflow method is used for counting fatigue cycles from a stress response time history, where the fatigue cycles are stress-reversals. The rainflow method allows the application of Palmgren-Miner's rule in order to assess the fatigue life of a structure subject to complex loading. The fatigue damage may also be calculated from a stress response power spectral density (PSD) using the semi-empirical Dirlik, Single Moment, Zhao-Baker and other spectral methods. These methods effectively assume that the PSD has a corresponding time history which is stationary with a normal distribution. This paper shows how the probability density function for rainflow stress cycles can be extracted from each of the spectral methods. This extraction allows for the application of the MIL-HDBK-5J fatigue coefficients in the cumulative damage summation. A numerical example is given in this paper for the stress response of a beam undergoing random base excitation, where the excitation is applied separately by a time history and by its corresponding PSD. The fatigue calculation is performed in the time domain, as well as in the frequency domain via the modified spectral methods. The result comparison shows that the modified spectral methods give comparable results to the time domain rainflow counting method.
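The time-domain half of the comparison reduces to Palmgren-Miner's rule, D = Σ n_i/N_i, applied to rainflow cycle counts against an S-N curve. A minimal sketch with a Basquin-form curve (the coefficients are placeholders, not MIL-HDBK-5J values):

```python
import math

def cycles_to_failure(stress, A=9.0, b=3.5):
    """Basquin-form S-N curve, log10(N) = A - b*log10(S).
    A and b are placeholder coefficients, not MIL-HDBK-5J values."""
    return 10 ** (A - b * math.log10(stress))

def miner_damage(rainflow_counts):
    """Palmgren-Miner rule: D = sum(n_i / N_i); failure is predicted at D >= 1."""
    return sum(n / cycles_to_failure(s) for s, n in rainflow_counts)

# (stress amplitude, cycle count) pairs, e.g. from rainflow counting
# of the beam's stress response time history.
counts = [(10.0, 5.0e4), (20.0, 2.0e3), (30.0, 1.0e2)]
print(round(miner_damage(counts), 3))  # ~0.245, well short of failure
```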
Amplitudes for multiphoton quantum processes in linear optics
NASA Astrophysics Data System (ADS)
Urías, Jesús
2011-07-01
The prominent role that linear optical networks have acquired in the engineering of photon states calls for physically intuitive and automatic methods to compute the probability amplitudes for the multiphoton quantum processes occurring in linear optics. A version of Wick's theorem for the expectation value, on any vector state, of products of linear operators is proved, and used to extract the combinatorics of any multiphoton quantum process in linear optics. The result is presented as a concise rule for writing down directly explicit formulae for the probability amplitude of any multiphoton process in linear optics. The rule achieves a considerable simplification and provides intuitive physical insight into quantum multiphoton processes. The methodology is applied to the generation of high-photon-number entangled states by interferometrically mixing coherent light with spontaneously down-converted light.
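Although this abstract states its rule only verbally, multiphoton transition amplitudes in linear optical networks are well known, independently of this paper, to reduce (up to normalization factors) to permanents of sub-matrices of the network unitary. For reference, a sketch of Ryser's formula for the permanent:

```python
from itertools import combinations

def permanent(M):
    """Ryser's formula:
    perm(M) = (-1)^n * sum over non-empty column subsets S of
              (-1)^|S| * prod_i sum_{j in S} M[i][j]."""
    n = len(M)
    total = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1.0
            for row in M:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total

print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10.0
```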
Unsupervised learning of natural languages
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-01-01
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
Gibert, Karina; García-Rudolph, Alejandro; Curcoll, Lluïsa; Soler, Dolors; Pla, Laura; Tormos, José María
2009-01-01
In this paper, an integral Knowledge Discovery Methodology named Clustering Based on Rules by States, which incorporates artificial intelligence (AI) and statistical methods as well as interpretation-oriented tools, is used to extract knowledge patterns about the evolution over time of the Quality of Life (QoL) of patients with Spinal Cord Injury. The methodology makes interaction with experts a crucial element of the clustering process to guarantee the usefulness of the results. Four typical patterns are discovered by taking prior expert knowledge into account, and several hypotheses are elaborated about the reasons for patients' psychological distress or decreases in QoL over time. The knowledge discovery from data (KDD) approach turns out, once again, to be a suitable formal framework for handling the multidimensional complexity of health domains.
Elayavilli, Ravikumar Komandur; Liu, Hongfang
2016-01-01
Computational modeling of biological cascades is of great interest to quantitative biologists, and biomedical text has been a rich source of quantitative information. Gathering quantitative parameters and values from biomedical text is a significant challenge in the early steps of computational modeling, as it involves huge manual effort. While automatically extracting such quantitative information from biomedical text may offer some relief, the lack of an ontological representation for a subdomain impedes normalizing textual extractions to a standard representation, which may render them less meaningful to domain experts. In this work, we propose a rule-based approach to automatically extract relations involving quantitative data from biomedical text describing ion channel electrophysiology. We further translated the extracted quantitative assertions into a formal representation that may help in constructing an ontology of ion channel events. We developed the Ion Channel ElectroPhysiology Ontology (ICEPO) by integrating the information represented in closely related ontologies, such as the Cell Physiology Ontology (CPO) and the Cardiac Electro Physiology Ontology (CPEO), with knowledge provided by domain experts. The rule-based system achieved an overall F-measure of 68.93% in extracting the quantitative data assertions on an independently annotated blind data set. We further made an initial attempt at formalizing the extracted quantitative data assertions in a formal representation, offering the potential to integrate text mining into the ontological workflow, a novel aspect of this study. This work is a case study creating a platform for formal interaction between ontology development and text mining; we achieved partial success in extracting quantitative assertions from the biomedical text and formalizing them in an ontological framework. The ICEPO ontology is available for download at http://openbionlp.org/mutd/supplementarydata/ICEPO/ICEPO.owl.
DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG.
Supratak, Akara; Dong, Hao; Wu, Chao; Guo, Yike
2017-11-01
This paper proposes a deep learning model, named DeepSleepNet, for automatic sleep stage scoring based on raw single-channel EEG. Most existing methods rely on hand-engineered features, which require prior knowledge of sleep analysis, and only a few of them encode temporal information, such as stage transition rules, which is important for identifying the next sleep stage, into the extracted features. In the proposed model, we utilize convolutional neural networks to extract time-invariant features, and a bidirectional long short-term memory network to learn transition rules among sleep stages automatically from EEG epochs. We implement a two-step training algorithm to train our model efficiently. We evaluated our model using different single-channel EEGs (F4-EOG (left), Fpz-Cz, and Pz-Oz) from two public sleep data sets that have different properties (e.g., sampling rate) and scoring standards (AASM and R&K). The results showed that our model achieved overall accuracy and macro F1-scores (MASS: 86.2% and 81.7; Sleep-EDF: 82.0% and 76.9) similar to the state-of-the-art methods (MASS: 85.9% and 80.5; Sleep-EDF: 78.9% and 73.7) on both data sets. This demonstrates that, without changing the model architecture or the training algorithm, our model can automatically learn features for sleep stage scoring from different raw single-channel EEGs from different data sets, without utilizing any hand-engineered features.
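The two-part architecture, CNNs for time-invariant features followed by a bidirectional LSTM for transition rules, can be sketched compactly. The layer sizes below are illustrative guesses, not the published DeepSleepNet configuration:

```python
import torch
import torch.nn as nn

class SleepStageNet(nn.Module):
    """Sketch of the DeepSleepNet idea: 1-D CNNs extract per-epoch features,
    a bidirectional LSTM learns stage-transition structure across epochs."""
    def __init__(self, n_classes=5):
        super().__init__()
        # Representation learning over raw single-channel EEG epochs.
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=50, stride=6), nn.ReLU(),
            nn.MaxPool1d(8),
            nn.Conv1d(64, 128, kernel_size=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16),
        )
        # Sequence learning across consecutive epochs.
        self.rnn = nn.LSTM(128 * 16, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 128, n_classes)

    def forward(self, x):                # x: (batch, seq_len, samples)
        b, t, s = x.shape
        z = self.features(x.reshape(b * t, 1, s)).reshape(b, t, -1)
        out, _ = self.rnn(z)             # (batch, seq_len, 256)
        return self.head(out)            # per-epoch stage logits

epochs = torch.randn(2, 10, 3000)        # 10 consecutive 30-s epochs @ 100 Hz
print(SleepStageNet()(epochs).shape)     # torch.Size([2, 10, 5])
```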
Giraldo, Sergio I.; Ramirez, Rafael
2016-01-01
Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above baseline levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and the tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found; differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator of the generality of the ornamentation rules. PMID:28066290
The EPA has identified solvent extraction for vegetable oil production processes as major sources of a single hazardous air pollutant (HAP), n-hexane. The rule sets out requirements and regulations for these sources, together with compliance assistance.
A literature search tool for intelligent extraction of disease-associated genes.
Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P
2014-01-01
To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
[Occurrence of pesticides in apples of varieties from south-east Poland in the years 1986-1989].
Sadło, S
1991-01-01
In the years 1986-1989, a total of 445 samples of apples of late cultivars were examined. Pesticides were extracted with acetone and then transferred to dichloromethane. Extracts were analysed using a gas chromatograph equipped with an electron capture detector and a thermo-ionic detector; TLC and colorimetric methods were also used. Most of the insecticides and fungicides used for orchard protection were covered. Pesticide residues were present in 248 samples; as a rule, they did not exceed 20% of their allowable concentrations in apples. Thus, the occurrence in the same sample of two or more pesticides at this level should entail no risk to the consumer's health. In 19 samples the allowable concentrations of pesticide residues were slightly exceeded; in all such cases, only MBC was involved.
Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data
NASA Astrophysics Data System (ADS)
Luthfi Hanifah, Hayyu'; Akbar, Saiful
2017-01-01
Tables are one of the ways to visualize information on web pages. The abundant number of web pages that compose the World Wide Web has motivated research on information extraction and information retrieval, including table extraction. There is also a need for systems designed specifically to handle location-related information. Against this background, this research provides a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. Location-related data are identified by toponyms (location names). In this research, a rule-based approach with a gazetteer is used to recognize toponyms in web tables, while a combination of rule-based and statistical approaches is used to extract data from a table. In the statistical approach, a Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented in JSON format; if a web table contains a toponym, a field is added to the JSON document to store the toponym values. This field can be used to index the table data by toponym, which can then be used in the development of the GIR system.
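The gazetteer-based toponym step and the extra JSON field can be pictured with a short sketch (toy gazetteer and table contents; not the authors' implementation):

```python
import json

# Sketch of the rule-based toponym step: match table cells against a
# gazetteer and, on a hit, add a "toponym" field to the JSON document
# for the extracted table.
GAZETTEER = {"jakarta", "bandung", "surabaya"}

def extract_table(header, rows):
    doc = {"header": header,
           "records": [dict(zip(header, row)) for row in rows]}
    toponyms = sorted({cell.lower() for row in rows for cell in row
                       if cell.lower() in GAZETTEER})
    if toponyms:
        doc["toponym"] = toponyms  # extra field for indexing by location
    return doc

table = extract_table(["City", "Population"],
                      [["Jakarta", "10500000"], ["Bandung", "2500000"]])
print(json.dumps(table, indent=2))
```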
[Three-dimensional computer aided design for individualized post-and-core restoration].
Gu, Xiao-yu; Wang, Ya-ping; Wang, Yong; Lü, Pei-jun
2009-10-01
To develop a method for three-dimensional computer aided design (CAD) of post-and-core restorations, two plaster casts with extracted natural teeth were used in this study. The extracted teeth were prepared and scanned using a tomography method to obtain three-dimensional digitized models. According to the basic rules of post-and-core design, the posts, cores and cavity surfaces of the teeth were designed using the tools for processing point clouds, curves and surfaces in the forward engineering software of the Tanglong prosthodontic system. The three-dimensional figures of the final restorations were then adjusted according to the configurations of anterior teeth, premolars and molars, respectively. Computer aided design of 14 post-and-core restorations was completed, and a good fit between the restorations and the three-dimensional digital models was obtained. Appropriate retention forms and sufficient space for the full crown restorations can be obtained with this method. The CAD of three-dimensional figures of post-and-core restorations can fulfill clinical requirements and can therefore be used in the computer-aided manufacture (CAM) of post-and-core restorations.
A logical model of cooperating rule-based systems
NASA Technical Reports Server (NTRS)
Bailin, Sidney C.; Moore, John M.; Hilberg, Robert H.; Murphy, Elizabeth D.; Bahder, Shari A.
1989-01-01
A model is developed to assist in the planning, specification, development, and verification of space information systems involving distributed rule-based systems. The model is based on an analysis of possible uses of rule-based systems in control centers. This analysis is summarized as a data-flow model for a hypothetical intelligent control center. From this data-flow model, the logical model of cooperating rule-based systems is extracted. This model consists of four layers of increasing capability: (1) communicating agents, (2) belief-sharing knowledge sources, (3) goal-sharing interest areas, and (4) task-sharing job roles.
Applying the Rule Space Model to Develop a Learning Progression for Thermochemistry
ERIC Educational Resources Information Center
Chen, Fu; Zhang, Shanshan; Guo, Yanfang; Xin, Tao
2017-01-01
We used the Rule Space Model, a cognitive diagnostic model, to measure the learning progression for thermochemistry for senior high school students. We extracted five attributes and proposed their hierarchical relationships to model the construct of thermochemistry at four levels using a hypothesized learning progression. For this study, we…
Extraction and Measurement of Multi-Level Parallelism in Productions Systems
1990-12-14
the conflict set (agenda): (RULE.A (MATCHA) (ACTA)) and (RULE.B (MATCHB) (ACTB)), where the MATCH predicates are sets of rules in working memory and the ACT... ((ACTA ∩ MATCHB = ∅) AND (MATCHA ∩ ACTB = ∅)). The non-interference criterion above is conservative and may not detect all possible parallelism, but more
Karakida, Ryo; Okada, Masato; Amari, Shun-Ichi
2016-07-01
The restricted Boltzmann machine (RBM) is an essential constituent of deep learning, but it is hard to train by using maximum likelihood (ML) learning, which minimizes the Kullback-Leibler (KL) divergence. Instead, contrastive divergence (CD) learning has been developed as an approximation of ML learning and is widely used in practice. To clarify the performance of CD learning, in this paper, we analytically derive the fixed points where the ML and CDn learning rules converge in two types of RBMs: one with Gaussian visible and Gaussian hidden units and the other with Gaussian visible and Bernoulli hidden units. In addition, we analyze the stability of the fixed points. As a result, we find that the stable points of the CDn learning rule coincide with those of the ML learning rule in a Gaussian-Gaussian RBM. We also reveal that larger principal components of the input data are extracted at the stable points. Moreover, in a Gaussian-Bernoulli RBM, we find that both ML and CDn learning can extract independent components at one of the stable points. Our analysis demonstrates that the same feature components as those extracted by ML learning are extracted simply by performing CD1 learning. Expanding this study should elucidate the specific solutions obtained by CD learning in other types of RBMs or in deep networks.
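For concreteness, one CD-1 update for a Gaussian-visible/Bernoulli-hidden RBM looks roughly as follows; this is the standard contrastive divergence recipe, not the paper's fixed-point analysis, and the data and learning rate are toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.005):
    """One CD-1 update for a Gaussian-visible / Bernoulli-hidden RBM
    (unit visible variance)."""
    ph0 = sigmoid(v0 @ W + c)                          # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    v1 = h0 @ W.T + b + rng.standard_normal(v0.shape)  # one Gibbs step
    ph1 = sigmoid(v1 @ W + c)                          # negative phase
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)      # correlation difference
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

# Toy data concentrated along one direction, whose leading principal
# component the learned weights should drift toward.
n_vis, n_hid = 8, 2
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
direction = rng.standard_normal(n_vis)
data = rng.standard_normal((256, 1)) * direction \
       + 0.1 * rng.standard_normal((256, n_vis))
for _ in range(200):
    cd1_step(data, W, b, c)
print(np.round(W, 3))  # columns tend to align (up to sign/scale) with `direction`
```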
Compton scattering from nuclei and photo-absorption sum rules
NASA Astrophysics Data System (ADS)
Gorchtein, Mikhail; Hobbs, Timothy; Londergan, J. Timothy; Szczepaniak, Adam P.
2011-12-01
We revisit the photo-absorption sum rule for real Compton scattering from the proton and from nuclear targets. In analogy with the Thomas-Reiche-Kuhn sum rule appropriate at low energies, we propose a new “constituent quark model” sum rule that relates the integrated strength of hadronic resonances to the scattering amplitude on constituent quarks. We study the constituent quark model sum rule for several nuclear targets. In addition, we extract the α=0 pole contribution for both proton and nuclei. Using the modern high-energy proton data, we find that the α=0 pole contribution differs significantly from the Thomson term, in contrast with the original findings by Damashek and Gilman.
Ban, Jong-Wook; Emparanza, José Ignacio; Urreta, Iratxe; Burls, Amanda
2016-01-01
Background Many new clinical prediction rules are derived and validated. But the design and reporting quality of clinical prediction research has been less than optimal. We aimed to assess whether design characteristics of validation studies were associated with overestimation of clinical prediction rules’ performance. We also aimed to evaluate whether validation studies clearly reported important methodological characteristics. Methods Electronic databases were searched for systematic reviews of clinical prediction rule studies published between 2006 and 2010. Data were extracted from the eligible validation studies included in the systematic reviews. A meta-analytic meta-epidemiological approach was used to assess the influence of design characteristics on predictive performance. For each validation study, it was assessed whether 7 design and 7 reporting characteristics were properly described. Results A total of 287 validation studies of clinical prediction rules were collected from 15 systematic reviews (31 meta-analyses). Validation studies using a case-control design produced a summary diagnostic odds ratio (DOR) 2.2 times (95% CI: 1.2–4.3) larger than validation studies using cohort or unclear designs. When differential verification was used, the summary DOR was overestimated by twofold (95% CI: 1.2–3.1) compared to complete, partial and unclear verification. The summary relative DOR (RDOR) of validation studies with inadequate sample size was 1.9 (95% CI: 1.2–3.1) compared to studies with adequate sample size. Study site, reliability, and the clinical prediction rule were adequately described in 10.1%, 9.4%, and 7.0% of validation studies, respectively. Conclusion Validation studies with design shortcomings may overestimate the performance of clinical prediction rules. The quality of reporting among studies validating clinical prediction rules needs to be improved. PMID:26730980
Fusion of classifiers for REIS-based detection of suspicious breast lesions
NASA Astrophysics Data System (ADS)
Lederman, Dror; Wang, Xingwei; Zheng, Bin; Sumkin, Jules H.; Tublin, Mitchell; Gur, David
2011-03-01
After developing a multi-probe resonance-frequency electrical impedance spectroscopy (REIS) system aimed at detecting women with breast abnormalities that may indicate a developing breast cancer, we have been conducting a prospective clinical study to explore the feasibility of applying this REIS system to classify younger women (< 50 years old) into two groups of "higher-than-average risk" and "average risk" of having or developing breast cancer. The system comprises one central probe placed in contact with the nipple, and six additional probes uniformly distributed along an outside circle to be placed in contact with six points on the outer breast skin surface. In this preliminary study, we selected an initial set of 174 examinations of participants who had completed REIS examinations and had clinical status verification. Among these, 66 examinations were recommended for biopsy due to findings of a highly suspicious breast lesion ("positives"), and 108 were determined to be negative during imaging-based procedures ("negatives"). A set of REIS-based features, extracted using a mirror-matched approach, was computed and fed into five machine learning classifiers. A genetic algorithm was used to select an optimal subset of features for each of the five classifiers. Three fusion rules, namely the sum rule, weighted sum rule and weighted median rule, were used to combine the results of the classifiers. Performance evaluation was performed using a leave-one-case-out cross-validation method. The results indicated that REIS may provide a new technology to identify younger women with higher-than-average risk of having or developing breast cancer. Furthermore, it was shown that fusion rules, such as the weighted median and weighted sum rules, may improve performance compared with the highest-performing single classifier.
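The fusion rules named above are simple to state in code. Here is a minimal numpy sketch of the weighted sum and weighted median rules (the plain sum rule is the weighted sum with equal weights); scores and weights are invented for illustration.

```python
import numpy as np

def weighted_sum_fusion(scores, weights):
    """Weighted sum rule: normalized weighted average of classifier scores."""
    scores, weights = np.asarray(scores), np.asarray(weights)
    return float(scores @ weights / weights.sum())

def weighted_median_fusion(scores, weights):
    """Weighted median rule: smallest score whose cumulative weight
    reaches half of the total weight."""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    return float(s[np.searchsorted(np.cumsum(w), 0.5 * w.sum())])

scores = [0.62, 0.71, 0.55, 0.80, 0.66]  # outputs of five classifiers
weights = [1.0, 2.0, 1.0, 1.5, 0.5]      # e.g., validation-derived weights
print(weighted_sum_fusion(scores, weights))     # ~0.6867
print(weighted_median_fusion(scores, weights))  # 0.71
```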
A Quadtree Organization Construction and Scheduling Method for Urban 3D Model Based on Weight
NASA Astrophysics Data System (ADS)
Yao, C.; Peng, G.; Song, Y.; Duan, M.
2017-09-01
The increase in the precision and data volume of urban 3D models places higher demands on the real-time rendering of digital city models. Improving the organization, management and scheduling of 3D model data in a 3D digital city can improve rendering effectiveness and efficiency. Taking the complexity of urban models into account, this paper proposes a weight-based quadtree construction and scheduled-rendering method for urban 3D models. Urban 3D models are divided into different rendering weights according to certain rules, and quadtree construction and scheduled rendering are performed according to these weights. An algorithm for extracting bounding boxes based on model drawing primitives is also proposed to generate LOD models automatically. Using the proposed algorithm, a 3D urban planning and management software was developed; practice has shown that the algorithm is efficient and feasible, with the render frame rates of both large and small scenes stable at around 25 frames per second.
Orthogonal search-based rule extraction for modelling the decision to transfuse.
Etchells, T A; Harrison, M J
2006-04-01
Data from an audit relating to transfusion decisions during intermediate or major surgery were analysed to determine the strengths of certain factors in the decision-making process. The analysis, using orthogonal search-based rule extraction (OSRE) from a trained neural network, demonstrated that the risk of tissue hypoxia (ROTH) assessed using a 100-mm visual analogue scale, the haemoglobin value (Hb) and the presence or absence of on-going haemorrhage (OGH) were able to reproduce the transfusion decisions with a joint specificity of 0.96, a sensitivity of 0.93 and a positive predictive value of 0.9. The rules indicating transfusion were: 1. ROTH > 32 mm and Hb < 94 g·l⁻¹; 2. ROTH > 13 mm and Hb < 87 g·l⁻¹; 3. ROTH > 38 mm, Hb < 102 g·l⁻¹ and OGH; 4. Hb < 78 g·l⁻¹.
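For illustration, the four extracted rules translate directly into a small decision function; this is a sketch, not the authors' code, with units as in the abstract (ROTH in mm, Hb in g·l⁻¹).

```python
def transfuse(roth_mm, hb, ogh):
    """Return True if any of the four extracted rules indicates transfusion.
    roth_mm: risk of tissue hypoxia (0-100 mm VAS); hb: haemoglobin (g/l);
    ogh: on-going haemorrhage present."""
    rules = [
        roth_mm > 32 and hb < 94,           # rule 1
        roth_mm > 13 and hb < 87,           # rule 2
        roth_mm > 38 and hb < 102 and ogh,  # rule 3
        hb < 78,                            # rule 4
    ]
    return any(rules)

print(transfuse(roth_mm=35, hb=90, ogh=False))  # True (rule 1 fires)
```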
Health-Mining: a Disease Management Support Service based on Data Mining and Rule Extraction.
Bei, Andrea; De Luca, Stefano; Ruscitti, Giancarlo; Salamon, Diego
2005-01-01
Disease management is the collection of processes aimed at controlling health care and improving its quality while reducing the overall cost of procedures. Our system, Health-Mining, is a decision support system with the objective of controlling the adequacy of hospitalization and therapies, determining the effective use of standard guidelines and eventually identifying better ones that emerge from medical practice (evidence-based medicine). In realizing the system, we aim to create a path to the construction of admission-appropriateness criteria that is valid at an international level. A main goal of the project is rule extraction and the identification of rules that are adequate in terms of efficacy, quality and cost reduction, especially in view of fast-changing technologies and medicines. We tested Health-Mining in a real test case for an Italian region, Regione Veneto, on the installation of pacemakers and ICDs.
Chemical named entities recognition: a review on approaches and applications.
Eltyeb, Safaa; Salim, Naomie
2014-01-01
The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich with information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature to "text mine" these extracted data and determine contextual relationships helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based and machine learning, as well as hybrid chemical named entity recognition approaches with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
Michaleff, Zoe A.; Maher, Chris G.; Verhagen, Arianne P.; Rebbeck, Trudy; Lin, Chung-Wei Christine
2012-01-01
Background: There is uncertainty about the optimal approach to screen for clinically important cervical spine (C-spine) injury following blunt trauma. We conducted a systematic review to investigate the diagnostic accuracy of the Canadian C-spine rule and the National Emergency X-Radiography Utilization Study (NEXUS) criteria, 2 rules that are available to assist emergency physicians to assess the need for cervical spine imaging. Methods: We identified studies by an electronic search of CINAHL, Embase and MEDLINE. We included articles that reported on a cohort of patients who experienced blunt trauma and for whom clinically important cervical spine injury detectable by diagnostic imaging was the differential diagnosis; evaluated the diagnostic accuracy of the Canadian C-spine rule or NEXUS or both; and used an adequate reference standard. We assessed the methodologic quality using the Quality Assessment of Diagnostic Accuracy Studies criteria. We used the extracted data to calculate sensitivity, specificity, likelihood ratios and post-test probabilities. Results: We included 15 studies of modest methodologic quality. For the Canadian C-spine rule, sensitivity ranged from 0.90 to 1.00 and specificity ranged from 0.01 to 0.77. For NEXUS, sensitivity ranged from 0.83 to 1.00 and specificity ranged from 0.02 to 0.46. One study directly compared the accuracy of these 2 rules using the same cohort and found that the Canadian C-spine rule had better accuracy. For both rules, a negative test was more informative for reducing the probability of a clinically important cervical spine injury. Interpretation: Based on studies with modest methodologic quality and only one direct comparison, we found that the Canadian C-spine rule appears to have better diagnostic accuracy than the NEXUS criteria. Future studies need to follow rigorous methodologic procedures to ensure that the findings are as free of bias as possible. PMID:23048086
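The reported sensitivities and specificities translate into post-test probabilities through likelihood ratios; the short sketch below shows the arithmetic. The 2% pre-test probability and the rule's operating point are illustrative values, not figures from the review.

```python
def post_test_probability(pretest, sensitivity, specificity, positive_test):
    """Bayes' rule via likelihood ratios for a screening rule result."""
    lr = (sensitivity / (1 - specificity) if positive_test
          else (1 - sensitivity) / specificity)
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Illustrative: a negative rule result with sensitivity 0.99, specificity 0.45,
# and a 2% pre-test probability of clinically important injury.
p = post_test_probability(0.02, 0.99, 0.45, positive_test=False)
print(round(p, 5))  # ~0.00045: a negative result is highly informative
```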
Rule-guided human classification of Volunteered Geographic Information
NASA Astrophysics Data System (ADS)
Ali, Ahmed Loai; Falomir, Zoe; Schmid, Falko; Freksa, Christian
2017-05-01
During the last decade, web technologies and location-sensing devices have evolved, generating a form of crowdsourcing known as Volunteered Geographic Information (VGI). VGI has acted as a platform for spatial data collection, in particular when a group of public participants is involved in collaborative mapping activities: they work together to collect, share, and use information about geographic features. VGI exploits participants' local knowledge to produce rich data sources. However, the resulting data inherit problematic classification. In VGI projects, the challenges of data classification are due to the following: (i) the data are prone to subjective classification, (ii) most projects rely on remote contributions and flexible contribution mechanisms, and (iii) spatial data are uncertain and geographic features lack strict definitions. These factors lead to various forms of problematic classification: inconsistent, incomplete, and imprecise data classification. This research addresses classification appropriateness. Whether the classification of an entity is appropriate or inappropriate is related to quantitative and/or qualitative observations. Small differences between observations may not be recognizable, particularly for non-expert participants. Hence, in this paper, the problem is tackled by developing a rule-guided classification approach. This approach exploits the Association Classification (AC) data mining technique to extract descriptive (qualitative) rules for specific geographic features. The rules are extracted based on the investigation of qualitative topological relations between target features and their context. Afterwards, the extracted rules are used to develop a recommendation system able to guide participants to the most appropriate classification. The approach proposes two scenarios to guide participants towards enhancing the quality of data classification. An empirical study is conducted to investigate the classification of grass-related features such as forest, garden, park, and meadow. The findings of this study indicate the feasibility of the proposed approach.
NASA Astrophysics Data System (ADS)
Skersys, Tomas; Butleris, Rimantas; Kapocius, Kestutis
2013-10-01
Approaches for the analysis and specification of business vocabularies and rules are highly relevant topics in both the Business Process Management and Information Systems Development disciplines. However, in common Information Systems Development practice, business modeling activities are still mostly empirical in nature. In this paper, basic aspects of an approach for the semi-automated extraction of business vocabularies from business process models are presented. The approach is based on the novel business-modeling-level OMG standards "Business Process Model and Notation" (BPMN) and "Semantics of Business Vocabulary and Business Rules" (SBVR), thus contributing to OMG's vision of Model-Driven Architecture (MDA) and to model-driven development in general.
Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts
Zhang, Shaodian; Elhadad, Nóemie
2013-01-01
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
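To give a flavor of the candidate-filtering step described above, the sketch below keeps candidate phrases whose inverse document frequency over a toy corpus exceeds a threshold; the corpus, threshold, and substring matching are all illustrative simplifications of the paper's pipeline.

```python
import math

def idf_filter(candidates, documents, threshold=0.5):
    """Keep candidates whose IDF is at least the threshold, discarding
    phrases that occur in (almost) every document."""
    n = len(documents)
    kept = []
    for phrase in candidates:
        df = sum(phrase in doc for doc in documents)  # crude substring match
        if df and math.log(n / df) >= threshold:
            kept.append(phrase)
    return kept

docs = ["the patient received aspirin", "the aspirin was discontinued",
        "the patient was discharged", "the adverse events were noted"]
print(idf_filter(["aspirin", "patient", "the"], docs))  # ['aspirin', 'patient']
```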
Koyama, Yuka; Matsunami, Katsuyoshi; Otsuka, Hideaki; Shinzato, Takakazu; Takeda, Yoshio
2010-04-01
From a 1-BuOH-soluble fraction of a MeOH extract of the leaves of Microtropis japonica, collected in the Okinawa islands, six ent-labdane glucosides, named microtropiosides A–F, were isolated together with one known acyclic sesquiterpene glucoside. Their structures were elucidated by a combination of spectroscopic analyses, and their absolute configurations were determined by application of the β-D-glucopyranosylation-induced shift-trend rule in ¹³C NMR spectroscopy and the modified Mosher's method.
Simulation of land use change in the three gorges reservoir area based on CART-CA
NASA Astrophysics Data System (ADS)
Yuan, Min
2018-05-01
This study proposes a new method for simulating spatiotemporally complex multiple land uses using a classification and regression tree (CART) based cellular automata (CA) model. In this model, the CART algorithm is used to calculate land-class conversion probabilities, which are combined with a neighborhood factor and a random factor to extract cellular transition rules. In the land-use dynamics simulation of the Three Gorges Reservoir area from 2000 to 2010, the overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821, and the simulation results are satisfactory.
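A single cell's transition in such a CART-CA model can be sketched as below: the CART-derived conversion probability is modulated by a neighborhood factor and compared against a random draw. The abstract does not specify how the three factors are combined, so the multiplicative form here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

def cell_transition(p_cart, neighbor_share):
    """Stochastic transition rule for one cell: CART conversion probability
    times the fraction of already-converted neighbors (neighborhood factor),
    tested against a uniform random draw (random factor)."""
    return rng.random() < p_cart * neighbor_share

# A cell with CART conversion probability 0.7 and 6 of its 8 neighbors converted.
print(cell_transition(p_cart=0.7, neighbor_share=6 / 8))
```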
Forest fire autonomous decision system based on fuzzy logic
NASA Astrophysics Data System (ADS)
Lei, Z.; Lu, Jianhua
2010-11-01
The proposed system integrates GPS/pseudolite/IMU and a thermal camera in order to autonomously process images through the identification, extraction, and tracking of forest fires or hot spots. The airborne detection platform, the graph-based algorithms and the signal processing framework are analyzed in detail; in particular, the rules of the decision function are expressed in terms of fuzzy logic, which is an appropriate method for expressing imprecise knowledge. The membership functions and weights of the rules are fixed through a supervised learning process. The perception system in this paper is based on a network of sensorial stations and central stations. The sensorial stations collect data including infrared and visual images and meteorological information. The central stations exchange data to perform distributed analysis. The experimental results show that the working procedure of the detection system is reasonable and that it can accurately output detection alarms and compute infrared oscillations.
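To make the fuzzy decision function concrete, here is a minimal sketch with triangular memberships and two weighted rules. The variables, breakpoints, and weights are invented for illustration; in the system described they would be fixed by the supervised learning process.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership function with support [a, c] and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fire_alarm_degree(ir_intensity, oscillation):
    """Each rule's firing strength is the min of its antecedent memberships;
    rules are combined by a weighted maximum."""
    hot = triangular(ir_intensity, 0.4, 0.8, 1.2)
    flickering = triangular(oscillation, 0.3, 0.6, 0.9)
    rules = [
        (0.9, min(hot, flickering)),  # hot AND flickering -> strong alarm evidence
        (0.4, hot),                   # hot alone -> weak alarm evidence
    ]
    return max(weight * strength for weight, strength in rules)

print(round(fire_alarm_degree(0.75, 0.55), 3))  # 0.75
```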
Combining High Spatial Resolution Optical and LIDAR Data for Object-Based Image Classification
NASA Astrophysics Data System (ADS)
Li, R.; Zhang, T.; Geng, R.; Wang, L.
2018-04-01
In order to classify high spatial resolution images more accurately, a hierarchical rule-based object-based classification framework was developed in this research based on a high-resolution image with airborne Light Detection and Ranging (LiDAR) data. The eCognition software is employed to conduct the whole process. In detail, the FBSP (fuzzy-based segmentation parameter) optimizer is first used to obtain the optimal scale parameters for different land cover types. Then, using the segmented regions as basic units, the classification rules for the various land cover types are established according to the spectral, morphological and texture features extracted from the optical images, and the height feature from LiDAR. Finally, the object classification results are evaluated using the confusion matrix, overall accuracy and Kappa coefficient. The results show that a method combining an aerial image with airborne LiDAR data achieves higher accuracy.
A blind dual color images watermarking based on IWT and state coding
NASA Astrophysics Data System (ADS)
Su, Qingtang; Niu, Yugang; Liu, Xianxi; Zhu, Yu
2012-04-01
In this paper, a state-coding based blind watermarking algorithm is proposed to embed a color image watermark into a color host image. The technique of state coding, which makes the state code of a data set equal to the hidden watermark information, is introduced in this paper. When embedding the watermark, the R, G and B components of the color image watermark are embedded into the Y, Cr and Cb components of the color host image using the Integer Wavelet Transform (IWT) and the rules of state coding. Moreover, the rules of state coding are also used to extract the watermark from the watermarked image without resorting to the original watermark or the original host image. Experimental results show that the proposed watermarking algorithm not only meets the demands of invisibility and robustness, but also performs well compared with other methods considered in this work.
NASA Astrophysics Data System (ADS)
Chang, Ya-Ting; Chang, Li-Chiu; Chang, Fi-John
2005-04-01
To bridge the gap between academic research and actual operation, we propose an intelligent control system for reservoir operation. The methodology includes two major processes: knowledge acquisition and implementation, and the inference system. In this study, a genetic algorithm (GA) and a fuzzy rule base (FRB) are used to extract knowledge from the historical inflow data with a design objective function and from the operating rule curves, respectively. The adaptive network-based fuzzy inference system (ANFIS) is then used to implement the knowledge, to create the fuzzy inference system, and to estimate the optimal reservoir operation. To investigate its applicability and practicability, the Shihmen reservoir, Taiwan, is used as a case study. For the purpose of comparison, a simulation using the currently used M-5 operating rule curve is also performed. The results demonstrate that (1) the GA is an efficient way to search for optimal input-output patterns, (2) the FRB can extract the knowledge from the operating rule curves, and (3) ANFIS models built on different types of knowledge can produce much better performance than the traditional M-5 curves in real-time reservoir operation. Moreover, we show that the model can be made more intelligent for reservoir operation if more information (or knowledge) is involved.
Karystianis, George; Thayer, Kristina; Wolfe, Mary; Tsafnat, Guy
2017-06-01
Most data extraction efforts in epidemiology are focused on obtaining targeted information from clinical trials. In contrast, limited research has been conducted on the identification of information from observational studies, a major source of human evidence in many fields, including environmental health. The recognition of key epidemiological information (e.g., exposures) through text mining techniques can assist in the automation of systematic reviews and other evidence summaries. We designed and applied a knowledge-driven, rule-based approach to identify targeted information (study design, participant population, exposure, outcome, confounding factors, and the country where the study was conducted) from abstracts of epidemiological studies included in several systematic reviews of environmental health exposures. The rules were based on common syntactic patterns observed in text and are thus not specific to any systematic review. To validate the general applicability of our approach, we compared the data extracted using our approach with hand curation for 35 epidemiological study abstracts manually selected for inclusion in two systematic reviews. The returned F-score, precision, and recall ranged from 70% to 98%, 81% to 100%, and 54% to 97%, respectively. The highest precision was observed for exposure, outcome and population (100%), while recall was best for exposure and study design, with 97% and 89%, respectively. The lowest recall was observed for population (54%), which also had the lowest F-score (70%). The performance of our text-mining approach demonstrates encouraging results for the identification of targeted information from observational epidemiological study abstracts related to environmental exposures. We have demonstrated that rules based on generic syntactic patterns in one corpus can be applied to other observational study designs by simply interchanging the dictionaries that aim to identify certain characteristics (i.e., outcomes, exposures). At the document level, the recognised information can assist in the selection and categorization of studies included in a systematic review.
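A minimal sketch of the kind of dictionary-driven syntactic pattern described above: a generic "exposure to X" pattern whose dictionary can be swapped out for a different review. The dictionary entries and the pattern are illustrative, not the study's actual rules.

```python
import re

# Hypothetical exposure dictionary; interchangeable without touching the pattern.
EXPOSURES = ["diesel exhaust", "arsenic", "pesticides"]
PATTERN = re.compile(
    r"(?:exposure to|exposed to)\s+(%s)" % "|".join(map(re.escape, EXPOSURES)),
    re.IGNORECASE,
)

def extract_exposures(abstract):
    """Return exposure mentions matched by the syntactic pattern."""
    return [m.group(1).lower() for m in PATTERN.finditer(abstract)]

text = ("We studied workers occupationally exposed to diesel exhaust "
        "and residential exposure to arsenic in drinking water.")
print(extract_exposures(text))  # ['diesel exhaust', 'arsenic']
```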
Margined winner-take-all: New learning rule for pattern recognition.
Fukushima, Kunihiko
2018-01-01
The neocognitron is a deep (multi-layered) convolutional neural network that can be trained to recognize visual patterns robustly. In the intermediate layers of the neocognitron, local features are extracted from input patterns. In the deepest layer, input patterns are classified into classes based on the features extracted in the intermediate layers. A method called IntVec (interpolating-vector) is used for this purpose. This paper proposes a new learning rule called margined winner-take-all (mWTA) for training the deepest layer. Every time a training pattern is presented during learning, if the result of recognition by WTA (winner-take-all) is an error, a new cell is generated in the deepest layer. Here we introduce a certain amount of margin into the WTA: only during learning, a handicap is given to cells of classes other than that of the training vector, and the winner is chosen under this handicap. By introducing the margin into the WTA, we can generate a compact set of cells with which a high recognition rate can be obtained at a small computational cost. The ability of the mWTA is demonstrated by computer simulation.
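The mWTA idea can be sketched in a few lines, under simplifying assumptions: cells are normalized vectors, similarity is a dot product, and the margin is a flat handicap subtracted from non-target classes. This illustrates the cell-generation rule only, not the neocognitron's IntVec machinery.

```python
import numpy as np

def mwta_learn(x, x_label, cells, labels, margin=0.1):
    """Margined WTA training step: choose the winner under a handicap on
    non-target classes; on error, generate a new cell memorizing x."""
    x = x / np.linalg.norm(x)
    if cells:
        sims = np.array([c @ x for c in cells])
        handicap = np.where(np.array(labels) == x_label, 0.0, margin)
        if labels[int(np.argmax(sims - handicap))] == x_label:
            return  # recognized correctly even with the margin: no new cell
    cells.append(x)
    labels.append(x_label)

rng = np.random.default_rng(1)
cells, labels = [], []
for _ in range(20):
    mwta_learn(rng.normal(size=8), int(rng.integers(3)), cells, labels)
print(len(cells), "cells generated")
```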
DeepNeuron: an open deep learning toolbox for neuron tracing.
Zhou, Zhi; Kuo, Hsien-Chi; Peng, Hanchuan; Long, Fuhui
2018-06-06
Reconstructing the three-dimensional (3D) morphology of neurons is essential for understanding brain structures and functions. Over the past decades, a number of neuron tracing tools, including manual, semiautomatic, and fully automatic approaches, have been developed to extract and analyze 3D neuronal structures. Nevertheless, most of them were developed by coding certain rules to extract and connect the structural components of a neuron, and they show limited performance on complicated neuron morphology. Recently, deep learning has outperformed many other machine learning methods in a wide range of image analysis and computer vision tasks. Here we developed a new open-source toolbox, DeepNeuron, which uses deep learning networks to learn features and rules from data and trace neuron morphology in light microscopy images. DeepNeuron provides a family of modules to solve basic yet challenging problems in neuron tracing. These problems include, but are not limited to: (1) detecting neuron signal under different image conditions, (2) connecting neuronal signals into tree(s), (3) pruning and refining tree morphology, (4) quantifying the quality of morphology, and (5) classifying dendrites and axons in real time. We have tested DeepNeuron using light microscopy images, including bright-field and confocal images of human and mouse brain, on which DeepNeuron demonstrates robustness and accuracy in neuron tracing.
Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach
2012-01-01
Background Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. Methods We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. Results We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. Conclusions We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data. PMID:22759462
Intrusion detection using rough set classification.
Zhang, Lian-hua; Zhang, Guan-hua; Zhang, Jie; Bai, Ying-cai
2004-09-01
Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomaly. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules, and converts the feature ranking into a minimal hitting set problem addressed using a genetic algorithm (GA). In classical approaches this is done using a Support Vector Machine (SVM) by executing many iterations, each of which removes one useless feature. Compared with those methods, our method avoids many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of explicability. Tests and comparison of RSC with SVM on the DARPA benchmark data showed that for Probe and DoS attacks both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set).
Toward multiscale modelings of grain-fluid systems
NASA Astrophysics Data System (ADS)
Chareyre, Bruno; Yuan, Chao; Montella, Eduard P.; Salager, Simon
2017-06-01
Computationally efficient methods have been developed for simulating partially saturated granular materials in the pendular regime. In contrast, one can hardly avoid expensive direct resolution of the two-phase fluid dynamics problem for mixed pendular-funicular situations or even saturated regimes. Following previous developments for single-phase flow, a pore-network approach to the coupling problems is described. The geometry and movements of phases and interfaces are described on the basis of a tetrahedrization of the pore space, introducing elementary objects such as bridges, menisci, pore bodies and pore throats, together with local rules of evolution. As firmly established local rules are still missing for some aspects (entry capillary pressure and pore-scale pressure-saturation relations, forces on the grains, and kinetics of transfers in mixed situations), a multi-scale numerical framework is introduced, enhancing the pore-network approach with the help of direct simulations. Small subsets of a granular system are extracted, in which multiphase scenarios are solved using the Lattice-Boltzmann method (LBM). In turn, a global problem is assembled and solved at the network scale, as illustrated by a simulated primary drainage.
Unsupervised Feature Learning With Winner-Takes-All Based STDP
Ferré, Paul; Mamalet, Franck; Thorpe, Simon J.
2018-01-01
We present a novel strategy for unsupervised feature learning in image applications inspired by the Spike-Timing-Dependent-Plasticity (STDP) biological learning rule. We show equivalence between rank-order coding Leaky-Integrate-and-Fire neurons and ReLU artificial neurons when applied to non-temporal data. We apply this to images using rank-order coding, which allows us to perform a full network simulation with a single feed-forward pass using GPU hardware. Next we introduce a binary STDP learning rule compatible with training on batches of images. Two mechanisms to stabilize the training are also presented: a Winner-Takes-All (WTA) framework which selects the most relevant patches to learn from along the spatial dimensions, and a simple feature-wise normalization as a homeostatic process. This learning process allows us to train multi-layer architectures of convolutional sparse features. We apply our method to extract features from the MNIST, ETH80, CIFAR-10, and STL-10 datasets and show that these features are relevant for classification. We finally compare these results with several other state-of-the-art unsupervised learning methods. PMID:29674961
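A rough numpy sketch of the batch learning idea, under stated assumptions: patches are flattened and binarized by a fixed threshold, the spatial WTA picks one winning patch per feature, and weight normalization stands in for the homeostatic step. Shapes, constants, and update magnitudes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_stdp_batch(W, patches, threshold=0.5, delta=0.1):
    """One batch update: for each feature, the most strongly driving patch
    (the WTA winner) potentiates weights where the binarized input is active
    and depresses them elsewhere, followed by normalization."""
    activations = patches @ W.T               # (n_patches, n_features)
    winners = np.argmax(activations, axis=0)  # winning patch per feature
    for f, p in enumerate(winners):
        pre = patches[p] > threshold          # binarized pre-synaptic input
        W[f] += np.where(pre, delta, -delta)  # binary STDP-style update
        W[f] /= np.linalg.norm(W[f])          # homeostatic normalization
    return W

patches = rng.random((32, 25))  # 32 flattened 5x5 image patches
W = rng.random((8, 25))         # 8 convolutional features
W = binary_stdp_batch(W, patches)
```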
Grammar-based Automatic 3D Model Reconstruction from Terrestrial Laser Scanning Data
NASA Astrophysics Data System (ADS)
Yu, Q.; Helmholz, P.; Belton, D.; West, G.
2014-04-01
The automatic reconstruction of 3D buildings has been an important research topic in recent years. In this paper, a novel method is proposed to automatically reconstruct 3D building models from segmented data based on a pre-defined formal grammar and rules. Such segmented data can be extracted, e.g., from terrestrial or mobile laser scanning devices. Two steps are considered in detail. The first step is to transform the segmented data into 3D shapes, for instance using the DXF (Drawing Exchange Format) format, which is a CAD file format used for data interchange between AutoCAD and other programs. Second, we develop a formal grammar to describe the building model structure and integrate the pre-defined grammars into the reconstruction process. Depending on the segmented data, the selected grammar and rules are applied to drive the reconstruction process in an automatic manner. Compared with other existing approaches, our proposed method allows model reconstruction directly from 3D shapes and takes the whole building into account.
Ramírez-Durón, Rosalba; Ceniceros-Almaguer, Lucía; Salazar-Aranda, Ricardo; Salazar-Cavazos, Ma de la Luz; Waksman de Torres, Noemi
2007-01-01
In Mexico, plant-derived products with health claims are sold as herbal dietary supplements, and there are no rules for their legal quality control. Aesculus hippocastanum, Turnera diffusa, Matricaria recutita, Passiflora incarnata, and Tilia occidentalis are some of the major commercial products obtained from plants used in this region. In this paper, we describe the effectiveness of thin-layer chromatography methods to provide for the quality control of several commercial products containing these plants. Standardized extracts were used. Of the 49 commercial products analyzed, only 32.65% matched the chromatographic characteristic of standardized extracts. A significant number of commercial products did not match their label, indicating a problem resulting from the lack of regulation for these products. The proposed methods are simple, sensitive, and specific and can be used for routine quality control of raw herbals and formulations of the tested plants. The results obtained show the need to develop simple and reliable analytical methods that can be performed in any laboratory for the purpose of quality control of dietary supplements or commercial herbal products sold in Mexico.
NASA Astrophysics Data System (ADS)
Li, Jun; Song, Minghui; Peng, Yuanxi
2018-03-01
Current infrared and visible image fusion methods do not achieve adequate information extraction, i.e., they cannot extract the target information from infrared images while retaining the background information from visible images. Moreover, most of them have high complexity and are time-consuming. This paper proposes an efficient image fusion framework for infrared and visible images on the basis of robust principal component analysis (RPCA) and compressed sensing (CS). The novel framework consists of three phases. First, RPCA decomposition is applied to the infrared and visible images to obtain their sparse and low-rank components, which represent the salient features and background information of the images, respectively. Second, the sparse and low-rank coefficients are fused by different strategies. On the one hand, the measurements of the sparse coefficients are obtained by the random Gaussian matrix, and they are then fused by the standard deviation (SD) based fusion rule. Next, the fused sparse component is obtained by reconstructing the result of the fused measurement using the fast continuous linearized augmented Lagrangian algorithm (FCLALM). On the other hand, the low-rank coefficients are fused using the max-absolute rule. Subsequently, the fused image is superposed by the fused sparse and low-rank components. For comparison, several popular fusion algorithms are tested experimentally. By comparing the fused results subjectively and objectively, we find that the proposed framework can extract the infrared targets while retaining the background information in the visible images. Thus, it exhibits state-of-the-art performance in terms of both fusion effects and timeliness.
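The max-absolute rule used above for the low-rank coefficients is essentially one line of numpy; a minimal sketch:

```python
import numpy as np

def max_absolute_fusion(coeffs_a, coeffs_b):
    """At each position keep the coefficient with the larger magnitude."""
    return np.where(np.abs(coeffs_a) >= np.abs(coeffs_b), coeffs_a, coeffs_b)

a = np.array([0.2, -0.9, 0.5])
b = np.array([-0.4, 0.3, 0.6])
print(max_absolute_fusion(a, b))  # [-0.4 -0.9  0.6]
```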
Compton Scattering and Photo-absorption Sum Rules on Nuclei
NASA Astrophysics Data System (ADS)
Gorshteyn, Mikhail; Hobbs, Timothy; Londergan, J. Timothy; Szczepaniak, Adam P.
2012-03-01
We revisit the photo-absorption sum rule for real Compton scattering from the proton and from nuclear targets. In analogy with the Thomas-Reiche-Kuhn sum rule appropriate at low energies, we propose a new "constituent quark model" sum rule that relates the integrated strength of hadronic resonances to the scattering amplitude on constituent quarks. We study the constituent quark model sum rule for several nuclear targets. In addition we extract the J=0 pole contribution for both proton and nuclei. Using the modern high energy proton data we find that the J=0 pole contribution differs significantly from the Thomson term, in contrast with the original findings by Damashek and Gilman. We discuss phenomenological implications of this new result.
Cooperative dynamics in auditory brain response
NASA Astrophysics Data System (ADS)
Kwapień, J.; Drożdż, S.; Liu, L. C.; Ioannides, A. A.
1998-11-01
Simultaneous estimates of activity in the left and right auditory cortex of five normal human subjects were extracted from multichannel magnetoencephalography recordings. Left, right, and binaural stimulations were used, in separate runs, for each subject. The resulting time series of left and right auditory cortex activity were analyzed using the concept of mutual information. The analysis constitutes an objective method to address the nature of interhemispheric correlations in response to auditory stimulations. The results provide clear evidence of the occurrence of such correlations mediated by a direct information transport, with clear laterality effects: as a rule, the contralateral hemisphere leads by 10-20 ms, as can be seen in the average signal. The strength of the interhemispheric coupling, which cannot be extracted from the average data, is found to be highly variable from subject to subject, but remarkably stable for each subject.
A Bayesian Scoring Technique for Mining Predictive and Non-Spurious Rules
Batal, Iyad; Cooper, Gregory; Hauskrecht, Milos
2015-01-01
Rule mining is an important class of data mining methods for discovering interesting patterns in data. The success of a rule mining method heavily depends on the evaluation function that is used to assess the quality of the rules. In this work, we propose a new rule evaluation score - the Predictive and Non-Spurious Rules (PNSR) score. This score relies on Bayesian inference to evaluate the quality of the rules and considers the structure of the rules to filter out spurious rules. We present an efficient algorithm for finding rules with high PNSR scores. The experiments demonstrate that our method is able to cover and explain the data with a much smaller rule set than existing methods. PMID:25938136
Friesen, Melissa C.; Wheeler, David C.; Vermeulen, Roel; Locke, Sarah J.; Zaebst, Dennis D.; Koutros, Stella; Pronk, Anjoeka; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Malats, Nuria; Schwenn, Molly; Johnson, Alison; Armenti, Karla R.; Rothman, Nathanial; Stewart, Patricia A.; Kogevinas, Manolis; Silverman, Debra T.
2016-01-01
Objectives: To efficiently and reproducibly assess occupational diesel exhaust exposure in a Spanish case-control study, we examined the utility of applying decision rules that had been extracted from expert estimates and questionnaire response patterns using classification tree (CT) models from a similar US study. Methods: First, previously extracted CT decision rules were used to obtain initial ordinal (0–3) estimates of the probability, intensity, and frequency of occupational exposure to diesel exhaust for the 10,182 jobs reported in a Spanish case-control study of bladder cancer. Second, two experts reviewed the CT estimates for 350 jobs randomly selected from strata based on each CT rule’s agreement with the expert ratings in the original study [agreement rate, from 0 (no agreement) to 1 (perfect agreement)]. Their agreement with each other and with the CT estimates was calculated using weighted kappa (κw) and guided our choice of jobs for subsequent expert review. Third, an expert review comprised all jobs with lower confidence (low-to-moderate agreement rates or discordant assignments, n = 931) and a subset of jobs with a moderate to high CT probability rating and with moderately high agreement rates (n = 511). Logistic regression was used to examine the likelihood that an expert provided a different estimate than the CT estimate based on the CT rule agreement rates, the CT ordinal rating, and the availability of a module with diesel-related questions. Results: Agreement between estimates made by two experts and between estimates made by each of the experts and the CT estimates was very high for jobs with estimates that were determined by rules with high CT agreement rates (κw: 0.81–0.90). For jobs with estimates based on rules with lower agreement rates, moderate agreement was observed between the two experts (κw: 0.42–0.67) and poor-to-moderate agreement was observed between the experts and the CT estimates (κw: 0.09–0.57). In total, the expert review of 1442 jobs changed 156 probability estimates, 128 intensity estimates, and 614 frequency estimates. The expert was more likely to provide a different estimate when the CT rule agreement rate was <0.8, when the CT ordinal ratings were low to moderate, or when a module with diesel questions was available. Conclusions: Our reliability assessment provided important insight into where to prioritize additional expert review; as a result, only 14% of the jobs underwent expert review, substantially reducing the exposure assessment burden. Overall, we found that we could efficiently, reproducibly, and reliably apply CT decision rules from one study to assess exposure in another study. PMID:26732820
Heart sounds analysis using probability assessment.
Plesinger, F; Viscor, I; Halamek, J; Jurco, J; Jurak, P
2017-07-31
This paper describes a method for automated discrimination of heart sounds recordings according to the Physionet Challenge 2016. The goal was to decide if the recording refers to normal or abnormal heart sounds or if it is not possible to decide (i.e. 'unsure' recordings). Heart sounds S1 and S2 are detected using amplitude envelopes in the band 15-90 Hz. The averaged shape of the S1/S2 pair is computed from amplitude envelopes in five different bands (15-90 Hz; 55-150 Hz; 100-250 Hz; 200-450 Hz; 400-800 Hz). A total of 53 features are extracted from the data. The largest group of features is extracted from the statistical properties of the averaged shapes; other features are extracted from the symmetry of averaged shapes, and the last group of features is independent of S1 and S2 detection. Generated features are processed using logical rules and probability assessment, a prototype of a new machine-learning method. The method was trained using 3155 records and tested on 1277 hidden records. It resulted in a training score of 0.903 (sensitivity 0.869, specificity 0.937) and a testing score of 0.841 (sensitivity 0.770, specificity 0.913). The revised method led to a test score of 0.853 in the follow-up phase of the challenge. The presented solution achieved 7th place out of 48 competing entries in the Physionet Challenge 2016 (official phase). In addition, the PROBAfind software for probability assessment was introduced.
FIR: An Effective Scheme for Extracting Useful Metadata from Social Media.
Chen, Long-Sheng; Lin, Zue-Cheng; Chang, Jing-Rong
2015-11-01
Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from the huge volume of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rule (AR) discovery to extract knowledge from social media. In our FIR scheme, the Fuzzy ART network is first employed to segment comments. Next, for each customer segment, the LSI technique is used to retrieve important keywords. Then, to make the extracted keywords understandable, association rule mining is applied to organize the extracted keywords into metadata. The extracted voices of customers are then transformed into design needs using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which acquire too many keywords to identify the key points, our FIR scheme can extract understandable metadata from social media.
Railway Online Booking System Design and Implementation
NASA Astrophysics Data System (ADS)
Zongjiang, Wang
In this paper, we define rule usefulness and introduce an approach to evaluating rule usefulness in rough sets, and we present a method for obtaining the most useful rules. The method is easy and effective in applications to prisoners' reform. Compared with methods for obtaining the most interesting rules, ours is direct and objective: rule interestingness must consider predefined knowledge about what kind of information is interesting. Our method greatly reduces the number of rules generated and provides a measure of rule usefulness at the same time.
RuleMonkey: software for stochastic simulation of rule-based models
2010-01-01
Background The system-level dynamics of many molecular interactions, particularly protein-protein interactions, can be conveniently represented using reaction rules, which can be specified using model-specification languages, such as the BioNetGen language (BNGL). A set of rules implicitly defines a (bio)chemical reaction network. The reaction network implied by a set of rules is often very large, and as a result, generation of the network implied by rules tends to be computationally expensive. Moreover, the cost of many commonly used methods for simulating network dynamics is a function of network size. Together these factors have limited application of the rule-based modeling approach. Recently, several methods for simulating rule-based models have been developed that avoid the expensive step of network generation. The cost of these "network-free" simulation methods is independent of the number of reactions implied by rules. Software implementing such methods is now needed for the simulation and analysis of rule-based models of biochemical systems. Results Here, we present a software tool called RuleMonkey, which implements a network-free method for simulation of rule-based models that is similar to Gillespie's method. The method is suitable for rule-based models that can be encoded in BNGL, including models with rules that have global application conditions, such as rules for intramolecular association reactions. In addition, the method is rejection free, unlike other network-free methods that introduce null events, i.e., steps in the simulation procedure that do not change the state of the reaction system being simulated. We verify that RuleMonkey produces correct simulation results, and we compare its performance against DYNSTOC, another BNGL-compliant tool for network-free simulation of rule-based models. We also compare RuleMonkey against problem-specific codes implementing network-free simulation methods. Conclusions RuleMonkey enables the simulation of rule-based models for which the underlying reaction networks are large. It is typically faster than DYNSTOC for benchmark problems that we have examined. RuleMonkey is freely available as a stand-alone application http://public.tgen.org/rulemonkey. It is also available as a simulation engine within GetBonNie, a web-based environment for building, analyzing and sharing rule-based models. PMID:20673321
The 1% Rule in Four Digital Health Social Networks: An Observational Study
2014-01-01
Background In recent years, cyberculture has informally reported a phenomenon named the 1% rule, or 90-9-1 principle, which seeks to explain participatory patterns and network effects within Internet communities. The rule states that 90% of actors observe and do not participate, 9% contribute sparingly, and 1% of actors create the vast majority of new content. This 90%, 9%, and 1% are also known as Lurkers, Contributors, and Superusers, respectively. To date, very little empirical research has been conducted to verify the 1% rule. Objective The 1% rule is widely accepted in digital marketing. Our goal was to determine if the 1% rule applies to moderated Digital Health Social Networks (DHSNs) designed to facilitate behavior change. Methods To help gain insight into participatory patterns, descriptive data were extracted from four long-standing DHSNs: the AlcoholHelpCenter, DepressionCenter, PanicCenter, and StopSmokingCenter sites. Results During the study period, 63,990 actors created 578,349 posts. Less than 25% of actors made one or more posts. The applicability of the 1% rule was confirmed as Lurkers, Contributors, and Superusers accounted for a weighted average of 1.3% (n=4668), 24.0% (n=88,732), and 74.7% (n=276,034) of content. Conclusions The 1% rule was consistent across the four DHSNs. As social network sustainability requires fresh content and timely interactions, these results are important for organizations actively promoting and managing Internet communities. Superusers generate the vast majority of traffic and create value, so their recruitment and retention is imperative for long-term success. Although Lurkers may benefit from observing interactions between Superusers and Contributors, they generate limited or no network value. The results of this study indicate that DHSNs may be optimized to produce network effects, positive externalities, and bandwagon effects. Further research in the development and expansion of DHSNs is required. PMID:24496109
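For illustration, bucketing actors by post volume is straightforward; the cut-offs below are invented, since the study defines the three groups by participation shares rather than fixed thresholds.

```python
from collections import Counter

def classify_actors(post_counts, contributor_cut=1, superuser_cut=50):
    """Bucket actors into Lurkers / Contributors / Superusers by post count."""
    buckets = Counter()
    for n in post_counts:
        if n < contributor_cut:
            buckets["Lurker"] += 1
        elif n < superuser_cut:
            buckets["Contributor"] += 1
        else:
            buckets["Superuser"] += 1
    return buckets

print(classify_actors([0, 0, 0, 2, 1, 7, 120, 0, 3, 0]))
# Counter({'Lurker': 5, 'Contributor': 4, 'Superuser': 1})
```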
NASA Astrophysics Data System (ADS)
Serrano, Rafael; González, Luis Carlos; Martín, Francisco Jesús
2009-11-01
Under the SENSOR-IA project, financially supported by the Order of Incentives to the Regional Technology Centers of the Council of Innovation, Science and Enterprise of Andalusia, an architecture for the real-time optimization of a machining process through a rule-based expert system has been developed. The architecture consists of a sensor data acquisition and processing engine (SATD) and a rule-based expert system (SE) which communicates with the SATD. The SE has been designed as an inference engine with an algorithm for effective action, using a modus ponens model of goal-oriented rules. The pilot test demonstrated that it is possible to govern the machining process in real time based on the rules contained in an SE. The tests were done with approximate rules. Future work includes the exhaustive collection of data with different tool materials and geometries in a database, in order to extract more precise rules.
RANWAR: rank-based weighted association rule mining from gene expression and methylation data.
Mallik, Saurav; Mukhopadhyay, Anirban; Maulik, Ujjwal
2015-01-01
Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), to rank the rules using two novel rule-interestingness measures, viz., the rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc), to bypass this problem. These measures depend on the rank of the items (genes); using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than state-of-the-art association rule mining algorithms, which saves execution time. We run RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant for the related diseases. Finally, the top rules evolved by RANWAR that are not found by Apriori are reported.
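To convey the flavor of a rank-based weighted support measure, the sketch below scales the plain support of an itemset by the mean of linearly decaying rank weights. The actual wcs/wcc definitions are in the paper; the linear weighting here is an assumption for illustration.

```python
def rank_weight(rank, n_items):
    """Illustrative rank-based weight: rank 1 (most important) maps to 1.0."""
    return (n_items - rank + 1) / n_items

def weighted_support(rule_items, ranks, transactions):
    """Plain support of the itemset, scaled by the mean item weight."""
    support = sum(set(rule_items) <= t for t in transactions) / len(transactions)
    weight = sum(rank_weight(ranks[i], len(ranks)) for i in rule_items) / len(rule_items)
    return weight * support

ranks = {"geneA": 1, "geneB": 2, "geneC": 3}  # 1 = highest-ranked gene
txns = [{"geneA", "geneB"}, {"geneA"}, {"geneA", "geneB", "geneC"}]
print(round(weighted_support(["geneA", "geneB"], ranks, txns), 3))  # 0.556
```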
Microbial genotype-phenotype mapping by class association rule mining.
Tamura, Makio; D'haeseleer, Patrik
2008-07-01
Microbial phenotypes are typically due to the concerted action of multiple gene functions, yet the presence of each gene may have only a weak correlation with the observed phenotype. Hence, it may be more appropriate to examine co-occurrence between sets of genes and a phenotype (multiple-to-one) instead of pairwise relations between a single gene and the phenotype. Here, we propose an efficient class association rule mining algorithm, netCAR, to extract sets of COGs (clusters of orthologous groups of proteins) associated with a phenotype from COG phylogenetic profiles and a phenotype profile. netCAR takes into account the phylogenetic co-occurrence graph between COGs to restrict the hypothesis space, and uses mutual information to evaluate the biconditional relation. We examined the mining capability of pairwise and multiple-to-one association by using netCAR to extract COGs relevant to six microbial phenotypes (aerobic, anaerobic, facultative, endospore, motility and Gram negative) from 11,969 unique COG profiles across 155 prokaryotic organisms. At the same level of false discovery rate, multiple-to-one association can extract about 10 times more relevant COGs than one-to-one association. We also reveal various topologies of association networks among COGs (modules) from the extracted multiple-to-one correlation rules relevant to the six phenotypes, including a well-connected network for motility, a star-shaped network for aerobic, and intermediate topologies for the other phenotypes. netCAR outperforms a standard CAR mining algorithm, CARapriori, while requiring several orders of magnitude less computational time for extracting 3-COG sets. Source code of the Java implementation is available as Supplementary Material at the Bioinformatics online website, or upon request to the author. Supplementary data are available at Bioinformatics online.
Strehl-constrained reconstruction of post-adaptive optics data and the Software Package AIRY, v. 6.1
NASA Astrophysics Data System (ADS)
Carbillet, Marcel; La Camera, Andrea; Deguignet, Jérémy; Prato, Marco; Bertero, Mario; Aristidi, Éric; Boccacci, Patrizia
2014-08-01
We first briefly present the last version of the Software Package AIRY, version 6.1, a CAOS-based tool which includes various deconvolution methods, accelerations, regularizations, super-resolution, boundary effects reduction, point-spread function extraction/extrapolation, stopping rules, and constraints in the case of iterative blind deconvolution (IBD). Then, we focus on a new formulation of our Strehl-constrained IBD, here quantitatively compared to the original formulation for simulated near-infrared data of an 8-m class telescope equipped with adaptive optics (AO), showing their equivalence. Next, we extend the application of the original method to the visible domain with simulated data of an AO-equipped 1.5-m telescope, testing also the robustness of the method with respect to the Strehl ratio estimation.
Training Extract, AFSC 113X0B, Flight Engineer, Helicopter Qualified.
1982-12-01
[OCR-damaged occupational survey extract; the legible fragments reference "General Flight Rules", "Perform Inspections", and "Training Emphasis Ratings".]
Advances in QCD sum-rule calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Melikhov, Dmitri
2016-01-22
We review the recent progress in the applications of QCD sum rules to hadron properties with emphasis on the following selected problems: (i) development of new algorithms for the extraction of ground-state parameters from two-point correlators; (ii) form factors at large momentum transfers from three-point vacuum correlation functions; (iii) properties of exotic tetraquark hadrons from correlation functions of four-quark currents.
Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.
Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab
2017-09-01
An intelligent diagnostic assistant can be used for the complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases using the C5 algorithm. An applied-developmental study was done in 2015. The knowledge base was developed from interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the training data in the database using Microsoft Office Excel. Clementine software and the C5 algorithm were applied to draw the decision tree. Analysis of test accuracy was performed based on the rules extracted using inference chains. The rules extracted from the decision tree were defined using the forward-chaining inference technique and entered into the CLIPS programming environment as RULEs, and the intelligent diagnostic assistant was then designed. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. The intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.
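The pipeline above (induce a tree, read each root-to-leaf path off as an IF-THEN rule, then forward-chain on the rules) can be sketched with scikit-learn's tree internals; the toy dataset and feature names below are invented stand-ins, not the study's data.

```python
# Extract IF-THEN rules from a decision tree: each root-to-leaf path is one rule.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0], [0, 1], [1, 1], [0, 0]]          # toy symptom vectors
y = ["eczema", "psoriasis", "eczema", "psoriasis"]
clf = DecisionTreeClassifier().fit(X, y)
t = clf.tree_

def paths(node=0, conds=()):
    if t.children_left[node] == -1:            # leaf: emit the accumulated rule
        label = clf.classes_[t.value[node][0].argmax()]
        print("IF", " AND ".join(conds) or "TRUE", "THEN", label)
        return
    f, thr = t.feature[node], t.threshold[node]
    paths(t.children_left[node],  conds + (f"x{f} <= {thr:.2f}",))
    paths(t.children_right[node], conds + (f"x{f} > {thr:.2f}",))

paths()  # printed rules can then be rewritten as CLIPS defrules for forward chaining
```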
Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.
Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K
2008-07-01
A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four-stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form, and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained are 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and it is able to provide interpretation for the decisions made.
2014-01-01
Background: Providing scalable clinical decision support (CDS) across institutions that use different electronic health record (EHR) systems has been a challenge for medical informatics researchers. The lack of commonly shared EHR models and terminology bindings has been recognised as a major barrier to sharing CDS content among different organisations. The openEHR Guideline Definition Language (GDL) expresses CDS content based on openEHR archetypes and can support any clinical terminologies or natural languages. Our aim was to explore in an experimental setting the practicability of GDL and its underlying archetype formalism. A further aim was to report on the artefacts produced by this new technological approach in this particular experiment. We modelled and automatically executed compliance checking rules from clinical practice guidelines for acute stroke care. Methods: We extracted rules from the European clinical practice guidelines as well as from treatment contraindications for acute stroke care and represented them using GDL. Then we executed the rules retrospectively on 49 mock patient cases to check the cases' compliance with the guidelines, and manually validated the execution results. We used openEHR archetypes, GDL rules, the openEHR reference information model, reference terminologies and the Data Archetype Definition Language. We utilised the open-sourced GDL Editor for authoring GDL rules, the international archetype repository for reusing archetypes, the open-sourced Ocean Archetype Editor for authoring or modifying archetypes and the CDS Workbench for executing GDL rules on patient data. Results: We successfully represented clinical rules about 14 out of 19 contraindications for thrombolysis and other aspects of acute stroke care with 80 GDL rules. These rules are based on 14 reused international archetypes (one of which was modified), 2 newly created archetypes and 51 terminology bindings (to three terminologies). Our manual compliance checks for 49 mock patients were a complete match with the automated compliance results. Conclusions: Shareable guideline knowledge for use in automated retrospective checking of guideline compliance may be achievable using GDL. Whether the same GDL rules can be used for at-the-point-of-care CDS remains unknown. PMID:24886468
Skin tumor area extraction using an improved dynamic programming approach.
Abbas, Qaisar; Celebi, M E; Fondón García, Irene
2012-05-01
Border (B) description of melanoma and other pigmented skin lesions is one of the most important tasks for the clinical diagnosis of dermoscopy images using the ABCD rule. For an accurate description of the border, there must be an effective skin tumor area extraction (STAE) method. However, this task is complicated due to uneven illumination, artifacts present in the lesions, and smooth areas or fuzzy borders of the desired regions. In this paper, a novel STAE algorithm based on improved dynamic programming (IDP) is presented. The STAE technique consists of the following four steps: color space transform, pre-processing, rough tumor area detection, and refinement of the segmented area. The procedure is performed in the CIE L*a*b* color space, which is approximately uniform and is therefore related to dermatologists' perception. After pre-processing the skin lesions to reduce artifacts, the DP algorithm is improved by introducing a local cost function based on color and texture weights. The STAE method is tested on a total of 100 dermoscopic images. In order to compare the performance of STAE with other state-of-the-art algorithms, various statistical measures based on dermatologist-drawn borders are utilized as a ground truth. The proposed method outperforms the others with a sensitivity of 96.64%, a specificity of 98.14% and an error probability of 5.23%. The results demonstrate that this STAE method by IDP is an effective solution when compared with other state-of-the-art segmentation techniques. The proposed method can accurately extract tumor borders in dermoscopy images. © 2011 John Wiley & Sons A/S.
Exploration of SWRL Rule Bases through Visualization, Paraphrasing, and Categorization of Rules
NASA Astrophysics Data System (ADS)
Hassanpour, Saeed; O'Connor, Martin J.; Das, Amar K.
Rule bases are increasingly being used as repositories of knowledge content on the Semantic Web. As the size and complexity of these rule bases increases, developers and end users need methods of rule abstraction to facilitate rule management. In this paper, we describe a rule abstraction method for Semantic Web Rule Language (SWRL) rules that is based on lexical analysis and a set of heuristics. Our method results in a tree data structure that we exploit in creating techniques to visualize, paraphrase, and categorize SWRL rules. We evaluate our approach by applying it to several biomedical ontologies that contain SWRL rules, and show how the results reveal rule patterns within the rule base. We have implemented our method as a plug-in tool for Protégé-OWL, the most widely used ontology modeling software for the Semantic Web. Our tool can allow users to rapidly explore content and patterns in SWRL rule bases, enabling their acquisition and management.
Sieve-based relation extraction of gene regulatory networks from biological literature.
Žitnik, Slavko; Žitnik, Marinka; Zupan, Blaž; Bajec, Marko
2015-01-01
Relation extraction is an essential procedure in literature mining. It focuses on extracting semantic relations between parts of text, called mentions. Biomedical literature includes an enormous amount of textual descriptions of biological entities, their interactions and results of related experiments. To extract them in an explicit, computer readable format, these relations were at first extracted manually from databases. Manual curation was later replaced with automatic or semi-automatic tools with natural language processing capabilities. The current challenge is the development of information extraction procedures that can directly infer more complex relational structures, such as gene regulatory networks. We develop a computational approach for extraction of gene regulatory networks from textual data. Our method is designed as a sieve-based system and uses linear-chain conditional random fields and rules for relation extraction. With this method we successfully extracted the sporulation gene regulation network in the bacterium Bacillus subtilis for the information extraction challenge at the BioNLP 2013 conference. To enable extraction of distant relations using first-order models, we transform the data into skip-mention sequences. We infer multiple models, each of which is able to extract different relationship types. Following the shared task, we conducted additional analysis using different system settings that resulted in reducing the reconstruction error of the bacterial sporulation network from 0.73 to 0.68, measured as the slot error rate between the predicted and the reference network. We observe that all relation extraction sieves contribute to the predictive performance of the proposed approach. Also, features constructed by considering mention words and their prefixes and suffixes are the most important features for higher accuracy of extraction. Analysis of distances between different mention types in the text shows that our choice of transforming data into skip-mention sequences is appropriate for detecting relations between distant mentions. Linear-chain conditional random fields, along with appropriate data transformations, can be efficiently used to extract relations. The sieve-based architecture simplifies the system as new sieves can be easily added or removed and each sieve can utilize the results of previous ones. Furthermore, sieves with conditional random fields can be trained on arbitrary text data and hence are applicable to a broad range of relation extraction tasks and data domains.
Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text.
Hao, Tianyong; Liu, Hongfang; Weng, Chunhua
2016-05-17
To develop an automated method for extracting and structuring numeric lab test comparison statements from text, and to evaluate the method using clinical trial eligibility criteria text. Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) extraction of numerics, units, and comparison operators, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based verification of comparison statements. Our reference standard was the consensus-based annotation among three raters for all comparison statements for two variables, i.e., HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov. The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively. Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. The open-source Valx enables further evaluation and continued improvement within the collaborative scientific community.
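Steps 2 and 4 of the pipeline (pulling out numbers, units and comparison operators, then tying them to a variable) can be approximated with a single regular expression; the pattern and the tiny variable lexicon below are simplified assumptions, not Valx's actual rules.

```python
# Toy extraction of numeric lab-test comparison statements from criteria text.
import re

text = "Inclusion: HbA1c < 7.5 % and fasting glucose >= 126 mg/dL at screening."
variables = {"hba1c": "HbA1c", "glucose": "glucose"}   # tiny stand-in lexicon

pattern = re.compile(
    r"(?P<var>HbA1c|glucose)\s*"        # variable name
    r"(?P<op><=|>=|<|>|=)\s*"           # comparison operator
    r"(?P<num>\d+(?:\.\d+)?)\s*"        # numeric value
    r"(?P<unit>%|mg/dL|mmol/L)?",       # optional measurement unit
    re.IGNORECASE,
)

for m in pattern.finditer(text):
    print((variables[m.group("var").lower()], m.group("op"),
           float(m.group("num")), m.group("unit")))
# ('HbA1c', '<', 7.5, '%')
# ('glucose', '>=', 126.0, 'mg/dL')
```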
Intelligent System Development Using a Rough Sets Methodology
NASA Technical Reports Server (NTRS)
Anderson, Gray T.; Shelton, Robert O.
1997-01-01
The purpose of this research was to examine the potential of the rough sets technique for developing intelligent models of complex systems from limited information. Rough set theory is a simple but promising technology for extracting easily understood rules from data. The rough set methodology has been shown to perform well when used with a large set of exemplars, but its performance with sparse data sets is less certain. The difficulty is that rules will be developed from just a few examples, each of which might carry a large amount of noise. The question then becomes: what is the probability of a useful rule being developed from such limited information? One nice feature of rough sets is that in unusual situations the technique can give an answer of 'I don't know'. That is, if a case arises that differs from the cases the rough set rules were developed on, the methodology can recognize this and alert human operators. It can also be trained to do this when the desired action is unknown because conflicting examples apply to the same set of inputs. This summer's project was to combine rough set theory with statistical theory to develop confidence limits for rules developed by rough sets. Often it is important not to make a certain type of mistake (e.g., false positives or false negatives), so the rules must be biased toward preventing a catastrophic error rather than giving the most likely course of action. A method to determine the best course of action in the light of such constraints was examined. The resulting technique was tested with files containing electrical power line 'signatures' from the space shuttle and with decompression sickness data.
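To make the rough-set notion of an 'I don't know' region concrete, here is a minimal computation of lower and upper approximations from a toy decision table; the data are invented for illustration.

```python
# Lower/upper approximation of a concept under an indiscernibility relation.
from collections import defaultdict

# (condition attributes) -> decision, for a handful of toy exemplars
table = [((1, 0), "fail"), ((1, 0), "ok"), ((0, 1), "ok"), ((0, 0), "fail")]

blocks = defaultdict(set)            # equivalence classes of identical conditions
for i, (conds, _) in enumerate(table):
    blocks[conds].add(i)

concept = {i for i, (_, d) in enumerate(table) if d == "ok"}
lower = set().union(*(b for b in blocks.values() if b <= concept))
upper = set().union(*(b for b in blocks.values() if b & concept))

print("certainly ok:", lower)        # {2}: rule with full support
print("possibly ok:", upper)         # {0, 1, 2}; boundary {0, 1} = "I don't know"
```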
Corbett, David M.; Sweeting, Alice J.; Robertson, Sam
2017-01-01
Australian Rules football comprises physical and skilled performance for more than 90 min of play. The cognitive and physiological fatigue experienced by participants during a match may reduce performance. Consequently, the length of time an athlete is on the field before being interchanged (known as a stint) is a key tactical variable that could maximize the skill and physical output of the Australian Rules athlete. This study developed two methods to quantify the relationship between athlete time on field and skilled and physical output. Professional male athletes (n = 39) from a single elite Australian Rules football club participated, with physical output quantified via player tracking systems across 22 competitive matches. Skilled output was calculated as the sum of involvements performed by each athlete, collected from a commercial statistics company. A random intercept and slope model was built to identify how a team and individuals respond to physical outputs and stint lengths. Stint duration (mins), high intensity running (speeds >14.4 km·hr−1) per minute, meterage per minute and very high intensity running (speeds >25 km·hr−1) per minute had some relationship with skilled involvements. However, none of these relationships were strong, and the direction of influence varied between players. Three conditional inference trees were computed to identify the extent to which combinations of physical parameters altered the anticipated skilled output of players. Meterage per minute, player, round number and duration were all related to player involvement. All methods had an average error of 10 to 11 involvements per player per match. Therefore, factors other than the physical parameters extracted from wearable technologies may be needed to explain skilled output within Australian Rules football matches. PMID:29109688
Diagnosing malignant melanoma in ambulatory care: a systematic review of clinical prediction rules
Harrington, Emma; Clyne, Barbara; Wesseling, Nieneke; Sandhu, Harkiran; Armstrong, Laura; Bennett, Holly; Fahey, Tom
2017-01-01
Objectives: Malignant melanoma has high morbidity and mortality rates. Early diagnosis improves prognosis. Clinical prediction rules (CPRs) can be used to stratify patients with symptoms of suspected malignant melanoma to improve early diagnosis. We conducted a systematic review of CPRs for melanoma diagnosis in ambulatory care. Design: Systematic review. Data sources: A comprehensive search of PubMed, EMBASE, PROSPERO, CINAHL, the Cochrane Library and SCOPUS was conducted in May 2015, using combinations of keywords and medical subject headings (MeSH) terms. Study selection and data extraction: Studies deriving and validating, validating or assessing the impact of a CPR for predicting melanoma diagnosis in ambulatory care were included. Data extraction and methodological quality assessment were guided by the CHARMS checklist. Results: From 16 334 studies reviewed, 51 were included, validating the performance of 24 unique CPRs. Three impact analysis studies were identified. Five studies were set in primary care. The most commonly evaluated CPRs were the ABCD (Asymmetry, irregular Borders, more than one or uneven distribution of Colour, or a large (greater than 6 mm) Diameter) dermoscopy rule (at a cut-point of >4.75; 8 studies; pooled sensitivity 0.85, 95% CI 0.73 to 0.93, specificity 0.72, 95% CI 0.65 to 0.78) and the 7-point dermoscopy checklist (at a cut-point of ≥1 recommending ruling in melanoma; 11 studies; pooled sensitivity 0.77, 95% CI 0.61 to 0.88, specificity 0.80, 95% CI 0.59 to 0.92). The methodological quality of studies varied. Conclusions: At their recommended cut-points, the ABCD dermoscopy rule is more useful for ruling out melanoma than the 7-point dermoscopy checklist. A focus on impact analysis will help translate melanoma risk prediction rules into useful tools for clinical practice. PMID:28264830
Spatio-Temporal Pattern Mining on Trajectory Data Using ARM
NASA Astrophysics Data System (ADS)
Khoshahval, S.; Farnaghi, M.; Taleai, M.
2017-09-01
The mobile phone was initially conceived as a device to make human connection easier, but it has since evolved into a platform for gaming, web surfing and GPS-enabled applications. Embedding GPS in handheld devices has turned them into significant trajectory data gathering facilities. Raw GPS trajectory data is a series of points containing hidden information, and revealing it requires trajectory data analysis. One of the most valuable kinds of information concealed in trajectory data is the user activity pattern; within each pattern, multiple stops and moves identify the places a user visited and the tasks performed there. This paper proposes an approach to discover daily user activity patterns from GPS trajectories using association rules. Finding user patterns requires extracting the user's visited places from the stops and moves of the GPS trajectories, so we implemented a place recognition algorithm to locate them. After extraction of the visited points, an association rule mining algorithm, Apriori, was used to extract user activity patterns. This study shows that useful patterns can be extracted from raw GPS data using association rule mining techniques, in order to learn about the behaviour of multiple users in a system, and that these patterns can be utilized in various location-based applications.
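As a sketch of the mining step, here is a minimal Apriori pass over visited-place "transactions" (one per day); the place names and support threshold are invented for illustration.

```python
# Minimal Apriori: frequent itemsets of visited places, one transaction per day.
from itertools import combinations

days = [{"home", "office", "gym"}, {"home", "office"},
        {"home", "gym"}, {"home", "office", "cafe"}]
min_support = 0.5

def frequent(candidates):
    return {c for c in candidates
            if sum(c <= d for d in days) / len(days) >= min_support}

items = {frozenset({i}) for d in days for i in d}
level, k = frequent(items), 1
while level:
    print(k, sorted(tuple(sorted(s)) for s in level))
    k += 1
    # candidate generation: unions of frequent sets that are one item larger
    level = frequent({a | b for a, b in combinations(level, 2) if len(a | b) == k})
```

Association rules (e.g., "office implies home") would then be read off the frequent itemsets by checking confidence, exactly as in the standard Apriori formulation.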
Automated anatomical labeling method for abdominal arteries extracted from 3D abdominal CT images
NASA Astrophysics Data System (ADS)
Oda, Masahiro; Hoang, Bui Huy; Kitasaka, Takayuki; Misawa, Kazunari; Fujiwara, Michitaka; Mori, Kensaku
2012-02-01
This paper presents an automated anatomical labeling method for abdominal arteries. In abdominal surgery, understanding the blood vessel structure related to a target organ is very important. The branching pattern of blood vessels differs among individuals, so a system is required that can assist understanding of a patient's blood vessel structure and the anatomical names of the blood vessels. Previous anatomical labeling methods for abdominal arteries deal with either the upper or the lower abdominal arteries. In this paper, we present an automated anatomical labeling method covering both the upper and lower abdominal arteries extracted from CT images. We obtain a tree structure of the artery regions and calculate feature values for each branch, including the diameter, curvature, direction, and running vectors of a branch. The target arteries of this method are grouped based on branching conditions, and the following processes are applied separately to each group. We compute candidate artery names by using classifiers trained to output artery names. A correction process based on majority rule is then applied to the candidate anatomical names to determine the final names. We applied the proposed method to 23 cases of 3D abdominal CT images. Experimental results showed that the proposed method is able to name the entire set of major abdominal arteries, with recall and precision rates of 79.01% and 80.41%, respectively.
NASA Astrophysics Data System (ADS)
Jiao, Q. S.; Luo, Y.; Shen, W. H.; Li, Q.; Wang, X.
2018-04-01
The Jiuzhaigou earthquake caused mountain collapses and triggered numerous landslides in the Jiuzhaigou scenic area and along surrounding roads, blocking roads and causing serious ecological damage. Given the urgency of the rescue, the authors carried an unmanned aerial vehicle (UAV) into the disaster area as early as August 9 to obtain aerial images near the epicenter. After summarizing the characteristics of earthquake landslides in aerial images, landslide image objects were obtained by multi-scale segmentation using an object-oriented analysis method, and the feature rule set of each level was built automatically by the SEaTH (Separability and Thresholds) algorithm to realize rapid landslide extraction. Compared with visual interpretation, the object-oriented automatic landslide extraction method achieved an accuracy of 94.3%. The spatial distribution of the earthquake landslides had a significant positive correlation with slope and relief, a negative correlation with roughness, and no obvious correlation with aspect; the probable reason for the latter is that the study area was too far from the seismogenic fault. This work provided technical support for earthquake field emergency response, earthquake landslide prediction and disaster loss assessment.
The expert explorer: a tool for hospital data visualization and adverse drug event rules validation.
Băceanu, Adrian; Atasiei, Ionuţ; Chazard, Emmanuel; Leroy, Nicolas
2009-01-01
An important part of adverse drug event (ADE) detection is the validation of clinical cases and the assessment of the decision rules used to detect ADEs. For that purpose, a software tool called "Expert Explorer" has been designed by Ideea Advertising. Anonymized datasets have been extracted from hospitals into a common repository. The tool has three main features. (1) It can display hospital stays in a visual and comprehensive way (diagnoses, drugs, lab results, etc.) using tables and charts. (2) It allows designing and executing dashboards in order to generate knowledge about ADEs. (3) It allows uploading decision rules obtained from data mining. Experts can then review the rules and the hospital stays that match them, and give their advice through specialized forms. The rules can then be validated, invalidated, or improved (knowledge elicitation phase).
Connecting clinical and actuarial prediction with rule-based methods.
Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H
2015-06-01
Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.
NASA Astrophysics Data System (ADS)
Hashemi, H.; Tax, D. M. J.; Duin, R. P. W.; Javaherian, A.; de Groot, P.
2008-11-01
Seismic object detection is a relatively new field in which 3-D bodies are visualized and spatial relationships between objects of different origins are studied in order to extract geologic information. In this paper, we propose a method for finding an optimal classifier with the help of a statistical feature ranking technique and by combining different classifiers. The method, which has general applicability, is demonstrated here on a gas chimney detection problem. First, we evaluate a set of input seismic attributes, extracted at locations labeled by a human expert, using regularized discriminant analysis (RDA). In order to find the RDA score for each seismic attribute, forward and backward search strategies are used. Subsequently, two non-linear classifiers, a multilayer perceptron (MLP) and a support vector classifier (SVC), are run on the ranked seismic attributes. Finally, to capitalize on the intrinsic differences between the two classifiers, the MLP and SVC results are combined using the logical rules of maximum, minimum and mean. The proposed method optimizes the size of the ranked feature space and yields the lowest classification error in the final combined result. We show that the logical minimum reveals gas chimneys that exhibit both the softness of the MLP and the resolution of the SVC classifier.
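The final combination step (logical max/min/mean over the two classifiers' outputs) is easy to state in code; the posterior values below are invented stand-ins for MLP and SVC class probabilities.

```python
# Combine two classifiers' posterior probabilities with max / min / mean rules.
import numpy as np

p_mlp = np.array([0.80, 0.30, 0.55])   # hypothetical P(chimney) from the MLP
p_svc = np.array([0.60, 0.20, 0.70])   # hypothetical P(chimney) from the SVC

combined = {
    "max":  np.maximum(p_mlp, p_svc),  # optimistic: either detector suffices
    "min":  np.minimum(p_mlp, p_svc),  # conservative: both must agree
    "mean": (p_mlp + p_svc) / 2,       # averaged evidence
}
for rule, p in combined.items():
    print(rule, (p > 0.5).astype(int)) # threshold into chimney / non-chimney
```

The conservative minimum rule corresponds to the paper's observation that requiring agreement of both classifiers combines the MLP's softness with the SVC's resolution.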
Dieltjes, Patrick; Mieremet, René; Zuniga, Sofia; Kraaijenbrink, Thirsa; Pijpe, Jeroen; de Knijff, Peter
2011-07-01
Exploring technological limits is a common practice in forensic DNA research. Reliable genetic profiling based on only a few cells isolated from trace material retrieved from a crime scene is nowadays more and more the rule rather than the exception. On many crime scenes, cartridges, bullets, and casings (jointly abbreviated as CBCs) are regularly found, and even after firing, these potentially carry trace amounts of biological material. Since 2003, the Forensic Laboratory for DNA Research is routinely involved in the forensic investigation of CBCs in the Netherlands. Reliable DNA profiles were frequently obtained from CBCs and used to match suspects, victims, or other crime scene-related DNA traces. In this paper, we describe the sensitive method developed by us to extract DNA from CBCs. Using PCR-based genotyping of autosomal short tandem repeats, we were able to obtain reliable and reproducible DNA profiles in 163 out of 616 criminal cases (26.5%) and in 283 out of 4,085 individual CBC items (6.9%) during the period January 2003-December 2009. We discuss practical aspects of the method and the sometimes unexpected effects of using cell lysis buffer on the subsequent investigation of striation patterns on CBCs.
Palmprint authentication using multiple classifiers
NASA Astrophysics Data System (ADS)
Kumar, Ajay; Zhang, David
2004-08-01
This paper investigates performance improvement for palmprint authentication using multiple classifiers. Methods proposed for personal authentication using palmprints can be divided into three categories: appearance-, line-, and texture-based. A combination of these approaches can be used to achieve higher performance. We propose to simultaneously extract palmprint features from PCA, line detectors and Gabor filters and to combine their corresponding matching scores. This paper also investigates the comparative performance of simple combination rules and a hybrid fusion strategy for achieving performance improvement. Our experimental results on a database of 100 users demonstrate the usefulness of this approach over those based on individual classifiers.
Graph transformation method for calculating waiting times in Markov chains.
Trygubenko, Semen A; Wales, David J
2006-06-21
We describe an exact approach for calculating transition probabilities and waiting times in finite-state discrete-time Markov processes. All the states and the rules for transitions between them must be known in advance. We can then calculate averages over a given ensemble of paths for both additive and multiplicative properties in a nonstochastic and noniterative fashion. In particular, we can calculate the mean first-passage time between arbitrary groups of stationary points for discrete path sampling databases, and hence extract phenomenological rate constants. We present a number of examples to demonstrate the efficiency and robustness of this approach.
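The paper's graph transformation method is exact and noniterative; as a point of reference, the quantity it targets, the mean first-passage time, can also be obtained from the standard linear system (I − Q)t = 1 over the transient states, as in this small sketch with an invented three-state chain.

```python
# Mean first-passage time to an absorbing state of a discrete-time Markov chain.
import numpy as np

# Transition matrix: states 0 and 1 transient, state 2 absorbing (invented numbers).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

Q = P[:2, :2]                                    # transitions among transient states
t = np.linalg.solve(np.eye(2) - Q, np.ones(2))   # solve (I - Q) t = 1
print(t)  # expected number of steps to absorption from states 0 and 1
```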
Automatic indexing of compound words based on mutual information for Korean text retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pan Koo Kim; Yoo Kun Cho
In this paper, we present an automatic indexing technique for compound words suitable for an agglutinative language, specifically Korean. Firstly, we present the construction conditions for composing compound words as indexing terms. We also present the decomposition rules applicable to consecutive nouns to extract the full contents of a text. Finally, we propose a measure to estimate the usefulness of a term, mutual information, to calculate the degree of word association of compound words, based on information-theoretic notions. By applying this method, our system has raised the precision rate for compound words from 72% to 87%.
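A minimal computation of the pointwise mutual information commonly used as a word-association score might look like the following; the corpus counts and word pair are invented, and the exact formula in the paper may differ.

```python
# Pointwise mutual information as a word-association score for a compound word.
import math

N = 10000          # total word (unigram) count in a hypothetical corpus
count = {"jung": 40, "ang": 50, ("jung", "ang"): 25}   # invented counts

p_x = count["jung"] / N
p_y = count["ang"] / N
p_xy = count[("jung", "ang")] / N      # adjacent co-occurrence frequency

pmi = math.log2(p_xy / (p_x * p_y))
print(f"PMI = {pmi:.2f}")              # high value -> treat as one compound index term
```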
Robertson, Sam; Gupta, Ritu; McIntosh, Sam
2016-10-01
This study developed a method to determine whether the distribution of individual player performances can be modelled to explain match outcome in team sports, using Australian Rules football as an example. Player-recorded values (converted to a percentage of team total) in 11 commonly reported performance indicators were obtained for all regular season matches played during the 2014 Australian Football League season, with team totals also recorded. Multiple features relating to heuristically determined percentiles for each performance indicator were then extracted for each team and match, along with the outcome (win/loss). A generalised estimating equation model comprising eight key features was developed, explaining match outcome at a median accuracy of 63.9% under 10-fold cross-validation. Lower 75th, 90th and 95th percentile values for team goals and higher 25th and 50th percentile values for disposals were linked with winning. Lower 95th and higher 25th percentile values for Inside 50s and Marks, respectively, were also important contributors. These results provide evidence supporting team strategies which aim to obtain an even spread of goal scorers in Australian Rules football. The method developed in this investigation could be used to quantify the importance of individual contributions to overall team performance in team sports.
Gill, P; Bleka, Ø; Egeland, T
2014-11-01
Likelihood ratio (LR) methods to interpret multi-contributor, low template, complex DNA mixtures are becoming standard practice. The next major development will be to introduce search engines based on the new methods to interrogate very large national DNA databases, such as those held by China, the USA and the UK. Here we describe a rapid method that was used to assign a LR to each individual member of a database of 5 million genotypes, which can then be ranked in order. Previous authors have only considered database trawls in the context of binary match or non-match criteria. However, the concept of match/non-match no longer applies within the new paradigm introduced, since the distribution of resultant LRs is continuous for practical purposes. An English appeal court decision allows scientists to routinely report complex DNA profiles using nothing more than their subjective personal 'experience of casework' and 'observations' in order to apply an expression of the rarity of an evidential sample. This ruling must be considered in the context of a recent high-profile English case, where an individual was extracted from a database and wrongly accused of a serious crime. In this case the DNA evidence was used to negate the overwhelming exculpatory (non-DNA) evidence. Demonstrable confirmation bias, also known as the 'CSI effect', seriously affected the investigation. The case demonstrated that in practice, databases could be used to select and prosecute an individual, simply because he ranked high in the list of possible matches. We have identified this phenomenon as a cognitive error which we term 'the naïve investigator effect'. We take the opportunity to test the performance of database extraction strategies based on either a simple matching allele count (MAC) method or the LR. The example heard by the appeal court is used as the exemplar case. It is demonstrated that the LR search method offers substantial benefits compared to searches based on simple matching allele count (MAC) methods. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
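The matching allele count (MAC) baseline the authors compare against is straightforward to compute; the two-locus profiles below are invented examples, not casework data.

```python
# Matching allele count (MAC): shared alleles between a trace and database entries.
trace = {"D3": {14, 16}, "vWA": {17, 18}}            # invented crime-scene profile

database = {
    "person_A": {"D3": {14, 15}, "vWA": {17, 18}},
    "person_B": {"D3": {12, 13}, "vWA": {15, 16}},
}

def mac(profile):
    """Total number of alleles shared with the trace, summed over loci."""
    return sum(len(trace[locus] & alleles) for locus, alleles in profile.items())

for name, profile in sorted(database.items(), key=lambda kv: -mac(kv[1])):
    print(name, mac(profile))   # person_A 3, person_B 0 -> ranked candidate list
```

An LR-based search would replace the integer count with a continuous likelihood ratio per candidate, which is precisely why the binary match/non-match framing breaks down.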
Slant rectification in Russian passport OCR system using fast Hough transform
NASA Astrophysics Data System (ADS)
Limonova, Elena; Bezmaternykh, Pavel; Nikolaev, Dmitry; Arlazarov, Vladimir
2017-03-01
In this paper, we introduce a slant detection method based on Fast Hough Transform calculation and demonstrate its application in an industrial system for Russian passport recognition. About 1.5% of these documents appear to have slanted or italic text. This reduces the recognition rate, because optical character recognition systems are normally designed to process upright fonts. Our method uses the Fast Hough Transform to analyse vertical strokes of characters, extracted with the help of the x-derivative of the text line image. To improve the quality of the detector we also introduce field grouping rules. The resulting algorithm reaches high detection quality; almost all errors of the considered approach happen on passports with nonstandard fonts, while the slant detector works appropriately otherwise.
NASA Technical Reports Server (NTRS)
Kettig, R. L.
1975-01-01
A method of classification of digitized multispectral images is developed and experimentally evaluated on actual earth resources data collected by aircraft and satellite. The method is designed to exploit the characteristic dependence between adjacent states of nature that is neglected by the more conventional simple-symmetric decision rule, so contextual information is incorporated into the classification scheme. The principal reason for doing this is to improve the accuracy of the classification. For general types of dependence this would require more computation per resolution element than the simple-symmetric classifier. But when the dependence occurs in the form of redundancy, the elements can be classified collectively, in groups, thereby reducing the number of classifications required.
Demonstration of the spin solar cell and spin photodiode effect
Endres, B.; Ciorga, M.; Schmid, M.; Utz, M.; Bougeard, D.; Weiss, D.; Bayreuther, G.; Back, C.H.
2013-01-01
Spin injection and extraction are at the core of semiconductor spintronics. Electrical injection is one method of choice for the creation of a sizeable spin polarization in a semiconductor, requiring especially tailored tunnel or Schottky barriers. Alternatively, optical orientation can be used to generate spins in semiconductors with significant spin-orbit interaction, if optical selection rules are obeyed, typically by using circularly polarized light at a well-defined wavelength. Here we introduce a novel concept for spin injection/extraction that combines the principle of a solar cell with the creation of spin accumulation. We demonstrate that efficient optical spin injection can be achieved with unpolarized light by illuminating a p-n junction where the p-type region consists of a ferromagnet. The discovered mechanism opens the window for the optical generation of a sizeable spin accumulation also in semiconductors without direct band gap such as Si or Ge. PMID:23820766
Chen, Chuyun; Hong, Jiaming; Zhou, Weilin; Lin, Guohua; Wang, Zhengfei; Zhang, Qufei; Lu, Cuina; Lu, Lihong
2017-07-12
To construct a knowledge platform of acupuncture ancient books based on data mining technology, and to provide a retrieval service for users. The Oracle 10g database was applied and JAVA was selected as the development language. Based on the standard library and the ancient books database established by manual entry, a variety of data mining technologies, including word segmentation, part-of-speech tagging, dependency analysis, rule extraction, similarity calculation, ambiguity analysis and supervised classification, were applied to achieve automatic text extraction from the ancient books. Finally, through association mining and decision analysis, comprehensive and intelligent analysis of diseases and symptoms, meridians, acupoints, and rules of acupuncture and moxibustion in the acupuncture ancient books was realized, and a retrieval service was provided to users through a browser/server (B/S) structure. The platform realized full-text retrieval, word frequency analysis and association analysis; when diseases or acupoints are searched, the frequencies of meridians, acupoints (diseases) and techniques are presented from high to low, along with the support degree and confidence coefficient between disease and acupoints (special acupoints), between acupoints within a prescription, and between disease or acupoints and technique. The platform for exploring acupuncture ancient books based on data mining technology can be used as a reference for the selection of disease, meridian and acupoint in clinical treatment and in the education of acupuncture and moxibustion.
Golbamaki, Azadi; Benfenati, Emilio; Golbamaki, Nazanin; Manganaro, Alberto; Merdivan, Erinc; Roncaglioni, Alessandra; Gini, Giuseppina
2016-04-02
In this study, new molecular fragments associated with genotoxic and nongenotoxic carcinogens are introduced to estimate the carcinogenic potential of compounds. Two rule-based carcinogenesis models were developed with the aid of SARpy: model R (from rodent experimental data) and model H (from human carcinogenicity data). SARpy extracts structural alerts in a completely automated and unbiased manner with statistical significance. The carcinogenicity models developed in this study are collections of fragments with carcinogenic potential that were extracted from two carcinogenicity databases: the ANTARES carcinogenicity dataset, with information from bioassays on rats, and the combination of the ISSCAN and CGX datasets, which take into account human-based assessment. The performance of these two models was evaluated in terms of cross-validation and external validation using a 258-compound case study dataset. Combining the R and H predictions and scoring a positive or negative result only when both models concur on a prediction increased accuracy to 72% and specificity to 79% on the external test set. The carcinogenic fragments present in the two models were compared and analyzed from the point of view of chemical class. The results of this study show that the developed rule sets will be a useful tool to identify new structural alerts of carcinogenicity and provide effective information on the molecular structures of carcinogenic chemicals.
Ontology-based data integration between clinical and research systems.
Mate, Sebastian; Köpcke, Felix; Toddenroth, Dennis; Martin, Marcus; Prokosch, Hans-Ulrich; Bürkle, Thomas; Ganslandt, Thomas
2015-01-01
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has recently gained in importance. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: with current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reuse of methods for unlocking it.
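To illustrate the idea of declarative transformation rules compiled into SQL, here is a toy rule-to-SQL generator; the rule representation, table names and column names are hypothetical, not the authors' ontology constructs.

```python
# Toy 'declarative rule -> SQL' generator for an ETL extraction step.
# Each rule maps a source code to a target research-warehouse concept.
rules = [
    {"concept": "SmokingStatus", "source_table": "obs",
     "source_col": "code", "source_val": "LOCAL:SMOKE", "value_col": "val_text"},
    {"concept": "BodyWeightKg", "source_table": "obs",
     "source_col": "code", "source_val": "LOCAL:WT", "value_col": "val_num"},
]

def to_sql(rule):
    """Compile one declarative mapping rule into an INSERT...SELECT statement."""
    return (f"INSERT INTO research_facts (patient_id, concept, value)\n"
            f"SELECT patient_id, '{rule['concept']}', {rule['value_col']}\n"
            f"FROM {rule['source_table']} "
            f"WHERE {rule['source_col']} = '{rule['source_val']}';")

for r in rules:
    print(to_sql(r), "\n")
```

The point of the design is that only the rule table needs maintenance; the SQL is regenerated, never hand-edited.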
Chen, Hong-Ping; Pan, Huan-Huan; Zhang, Xin; Liu, Fei; Chen, Mei-Jun; Luo, Guan-Hua; Liu, You-Ping
2016-07-01
To investigate the dynamic change rules of the volatile components of Atractylodis Macrocephalae Rhizoma at different stir-baking degrees (from slight stir-baking, stir-baking to yellow and stir-baking to brown, to stir-baking to scorch). In this experiment, Atractylodis Macrocephalae Rhizoma samples at different stir-baking degrees were collected at different processing time points. The contents of volatile oil in the various samples were determined by the steam distillation method, and the volatile compounds were extracted using the static headspace sampling method. Gas chromatography-mass spectrometry (GC-MS) and the automated mass spectral deconvolution and identification system (AMDIS), combined with the Kováts retention index, were used to analyze the chemical constituents of the volatile compounds. The results showed that with the deepening of the stir-baking degree, the content of volatile oil decreased step by step across the 4 phases, and both the composition and the contents of the volatile components showed significant changes. The dynamic change rules of the volatile components during stir-baking were closely related to the processing degree; in addition, Atractylodis Macrocephalae Rhizoma and honey bran adsorbed onto each other. These results provide a scientific basis for elucidating the stir-baking (with bran) mechanism of Atractylodis Macrocephalae Rhizoma. Copyright© by the Chinese Pharmaceutical Association.
Das, Saptarshi; Pan, Indranil; Das, Shantanu; Gupta, Amitava
2012-03-01
A genetic algorithm (GA) has been used in this study for a new approach to suboptimal model reduction in the Nyquist plane and optimal time domain tuning of proportional-integral-derivative (PID) and fractional-order (FO) PI(λ)D(μ) controllers. Simulation studies show that the new Nyquist-based model reduction technique outperforms the conventional H(2)-norm-based reduced parameter modeling technique. With the tuned controller parameters and the reduced-order model parameter dataset, optimum tuning rules have been developed on a test-bench of higher-order processes via genetic programming (GP). The GP performs a symbolic regression on the reduced process parameters to evolve a tuning rule which provides the best analytical expression to map the data. The tuning rules are developed for a minimum time domain integral performance index described by a weighted sum of an error index and the controller effort. From the reported Pareto optimal front of the GP-based optimal rule extraction technique, a trade-off can be made between the complexity of the tuning formulae and the control performance. The efficacy of the single-gene and multi-gene GP-based tuning rules has been compared with the original GA-based control performance for the PID and PI(λ)D(μ) controllers, handling four different classes of representative higher-order processes. These rules are very useful for process control engineers, as they inherit the power of the GA-based tuning methodology but can be easily calculated without the need to run the computationally intensive GA every time. Three-dimensional plots of the required variation in PID/fractional-order PID (FOPID) controller parameters with reduced process parameters are shown as a guideline for the operator. Parametric robustness of the reported GP-based tuning rules has also been shown with credible simulation examples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.
Yang, Jin; Hlavacek, William S.
2011-01-01
Rule-based models, which are typically formulated to represent cell signaling systems, can now be simulated via various network-free simulation methods. In a network-free method, reaction rates are calculated for rules that characterize molecular interactions, and these rule rates, which each correspond to the cumulative rate of all reactions implied by a rule, are used to perform a stochastic simulation of reaction kinetics. Network-free methods, which can be viewed as generalizations of Gillespie’s method, are so named because these methods do not require that a list of individual reactions implied by a set of rules be explicitly generated, which is a requirement of other methods for simulating rule-based models. This requirement is impractical for rule sets that imply large reaction networks (i.e., long lists of individual reactions), as reaction network generation is expensive. Here, we compare the network-free simulation methods implemented in RuleMonkey and NFsim, general-purpose software tools for simulating rule-based models encoded in the BioNetGen language. The method implemented in NFsim uses rejection sampling to correct overestimates of rule rates, which introduces null events (i.e., time steps that do not change the state of the system being simulated). The method implemented in RuleMonkey uses iterative updates to track rule rates exactly, which avoids null events. To ensure a fair comparison of the two methods, we developed implementations of the rejection and rejection-free methods specific to a particular class of kinetic models for multivalent ligand-receptor interactions. These implementations were written with the intention of making them as much alike as possible, minimizing the contribution of irrelevant coding differences to efficiency differences. Simulation results show that performance of the rejection method is equal to or better than that of the rejection-free method over wide parameter ranges. However, when parameter values are such that ligand-induced aggregation of receptors yields a large connected receptor cluster, the rejection-free method is more efficient. PMID:21832806
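A skeletal version of the rejection step compared above (sample a rule by an overestimated rate, then accept or reject a concrete reaction instance) might look like the following; the rates and the acceptance test are schematic placeholders, not NFsim's or RuleMonkey's internals.

```python
# Schematic network-free step with rejection sampling (Gillespie-style).
import math
import random

rules = {"bind": 4.0, "unbind": 1.0}       # overestimated cumulative rule rates

def try_instantiate(rule):
    """Pick a concrete reactant combination; may fail (None) when the rate
    overestimate counted combinations that cannot react -> a null event."""
    return rule if random.random() < 0.7 else None   # placeholder acceptance test

t = 0.0
for _ in range(5):
    total = sum(rules.values())
    t += -math.log(random.random()) / total          # exponential waiting time
    r = random.choices(list(rules), weights=list(rules.values()))[0]
    event = try_instantiate(r)
    print(f"t={t:.3f}", r, "fired" if event else "null event (rejected)")
```

A rejection-free variant would instead update the exact rule rates after every event, trading per-step bookkeeping for the absence of null events, which is the efficiency trade-off the paper measures.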
Research on PM2.5 time series characteristics based on data mining technology
NASA Astrophysics Data System (ADS)
Zhao, Lifang; Jia, Jin
2018-02-01
With the development of data mining technology and the establishment of environmental air quality databases, it is necessary to discover potential correlations and rules by mining the massive environmental air quality information and analyzing air pollution processes. In this paper, we present a sequential pattern mining method based on air quality data and pattern association technology to analyze the time series characteristics of PM2.5. Utilizing real-time monitoring data of urban air quality in China, the time series rules and variation properties of PM2.5 under different pollution levels are extracted and analyzed. The results show that the time sequence features of the PM2.5 concentration are directly affected by changes in the pollution degree. The longest time that PM2.5 remained stable is about 24 hours. As the pollution becomes more severe, the instability time and step ascending time gradually shorten from 12-24 hours to 3 hours. The presented method is helpful for controlling and forecasting air quality while saving measurement costs, which is of great significance for government regulation and public prevention of air pollution.
NASA Astrophysics Data System (ADS)
Kumar, Ashok; Thakkar, Ajit J.
2017-03-01
Dipole oscillator strength distributions for Br2 and BrCN are constructed from photoabsorption cross-sections combined with constraints provided by the Kuhn-Reiche-Thomas sum rule, the high-energy behavior of the dipole-oscillator-strength density, and molar refractivity data when available. The distributions are used to predict dipole sum rules S(k), mean excitation energies I(k), and van der Waals C6 coefficients. Coupled-cluster calculations of the static dipole polarizabilities of Br2 and BrCN are reported for comparison with the values of S(-2) extracted from the distributions.
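For reference, the dipole sum rules S(k) are moments of the dipole oscillator strength density; a crude numerical evaluation from tabulated photoabsorption cross-sections might look like this, using the commonly quoted conversion df/dE [eV⁻¹] = σ[Mb]/109.75. The grid values here are invented, and a real distribution would cover the full spectrum with the constraints the paper describes.

```python
# Numerical dipole sum rules S(k) = integral of (df/dE) * E**k over E.
import numpy as np

E = np.array([10.0, 15.0, 20.0, 30.0, 50.0])     # photon energies (eV), invented
sigma = np.array([60.0, 45.0, 30.0, 12.0, 3.0])  # cross-sections (Mb), invented

dfdE = sigma / 109.75      # Mb -> oscillator strength per eV (standard conversion)

for k in (0, -1, -2):      # for a complete distribution, S(0) equals the electron
    S_k = np.trapz(dfdE * E**k, E)               # count (Kuhn-Reiche-Thomas rule)
    print(f"S({k}) = {S_k:.4f}")                 # S(-2) relates to the static polarizability
```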
NASA Technical Reports Server (NTRS)
Glick, B. J.
1985-01-01
Techniques for classifying objects into groups or classes go under many different names including, most commonly, cluster analysis. Mathematically, the general problem is to find a best mapping of objects into an index set consisting of class identifiers. When an a priori grouping of objects exists, the process of deriving the classification rules from samples of classified objects is known as discrimination. When such rules are applied to objects of unknown class, the process is denoted classification. The specific problem addressed involves the group classification of a set of objects that are each associated with a series of measurements (ratio, interval, ordinal, or nominal levels of measurement). Each measurement produces one variable in a multidimensional variable space. Cluster analysis techniques are reviewed and methods for including geographic location, distance measures, and spatial pattern (distribution) as parameters in clustering are examined. For the case of patterning, measures of spatial autocorrelation are discussed in terms of the kind of data (nominal, ordinal, or interval scaled) to which they may be applied.
Due Date Assignment in a Dynamic Job Shop with the Orthogonal Kernel Least Squares Algorithm
NASA Astrophysics Data System (ADS)
Yang, D. H.; Hu, L.; Qian, Y.
2017-06-01
Meeting due dates is a key goal in the manufacturing industries. This paper proposes a method for due date assignment (DDA) using the Orthogonal Kernel Least Squares Algorithm (OKLSA). A simulation model is built to imitate the production process of a highly dynamic job shop. Several factors describing job characteristics and system state are extracted as attributes to predict job flow-times. A number of experiments under conditions of varying dispatching rules and a 90% shop utilization level have been carried out to evaluate the effectiveness of OKLSA applied to DDA. The prediction performance of OKLSA is compared with those of five conventional DDA models and a back-propagation neural network (BPNN). The experimental results indicate that OKLSA is statistically superior to the other DDA models in terms of mean absolute lateness and root mean square lateness in most cases. The only exception occurs when the shortest processing time rule is used for dispatching jobs; in that case, the difference between OKLSA and BPNN is not statistically significant.
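OKLSA is not available in standard libraries; as a hedged stand-in, the related kernel ridge regressor in scikit-learn can illustrate the flow-time-prediction step of DDA on synthetic data (all attribute values below are synthetic):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic job/shop-state attributes and realized flow-times; OKLSA is
# replaced here by the related kernel ridge regressor for illustration.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X @ np.array([3.0, 1.5, 0.5, 2.0, 1.0]) + rng.normal(0.0, 0.1, 200)

model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(X, y)
release_time = 0.0
due_date = release_time + model.predict(X[:1])[0]  # DDA: release + predicted flow-time
```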
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on a combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine, applied in combination with filter or wrapper feature selection methods, develops a robust model with higher accuracy than conventional microarray classification models such as the support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. The fuzzy support vector machine, as a new classification model with high generalization power, robustness, and good interpretability, seems to be a promising tool for gene expression microarray classification.
Integration of Genetic Algorithms and Fuzzy Logic for Urban Growth Modeling
NASA Astrophysics Data System (ADS)
Foroutan, E.; Delavar, M. R.; Araabi, B. N.
2012-07-01
The urban growth phenomenon, as a spatio-temporally continuous process, is subject to spatial uncertainty. This inherent uncertainty cannot be fully addressed by conventional methods based on Boolean algebra. Fuzzy logic can be employed to overcome this limitation. Fuzzy logic preserves the spatial continuity of dynamic urban growth through the choice of fuzzy membership functions, fuzzy rules and the fuzzification-defuzzification process. Fuzzy membership functions and fuzzy rule sets, as the heart of fuzzy logic, are rather subjective and dependent on the expert. However, due to the lack of a definite method for determining the membership function parameters, optimization is needed to tune the parameters and improve the performance of the model. This paper integrates genetic algorithms and fuzzy logic as a genetic fuzzy system (GFS) for modeling dynamic urban growth. The proposed approach is applied to modeling urban growth in the Tehran Metropolitan Area in Iran. Historical land use/cover data of the Tehran Metropolitan Area extracted from the 1988 and 1999 Landsat ETM+ images are employed to simulate the urban growth. The land use classes extracted for 1988 include urban areas, streets and vegetated areas; together with slope and elevation, these serve as the physical driving forces of urban growth. The Relative Operating Characteristic (ROC) curve is used as the fitness function to evaluate the performance of the GFS algorithm. The optimum membership function parameters are applied to generate a suitability map for urban growth. Comparing the suitability map with the real land use map of 1999 gives the threshold value for the best suitability map that can simulate the land use map of 1999. The simulation outcomes, with a kappa of 89.13% and an overall map accuracy of 95.58%, demonstrate the efficiency and reliability of the proposed model.
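A toy Python sketch of the GFS idea follows, tuning triangular membership-function parameters with a simple genetic algorithm; the fitness function here is an illustrative stand-in for the paper's ROC-based fitness:

```python
import random

def tri(x, a, b, c):
    """Triangular membership function with breakpoints a <= b <= c."""
    if x <= a or x >= c or a == c:
        return 0.0
    if x < b:
        return (x - a) / (b - a) if b > a else 1.0
    return (c - x) / (c - b) if c > b else 1.0

def fitness(params, urban_samples):
    # Illustrative stand-in for an ROC-based fitness: mean membership
    # assigned to suitability values of known-urban cells.
    a, b, c = sorted(params)
    return sum(tri(s, a, b, c) for s in urban_samples) / len(urban_samples)

def ga_tune(urban_samples, pop_size=30, generations=50):
    """Evolve membership-function parameters by truncation selection
    plus Gaussian mutation (a minimal GA, not the paper's exact scheme)."""
    pop = [[random.random() for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, urban_samples), reverse=True)
        parents = pop[: pop_size // 2]
        children = [[g + random.gauss(0.0, 0.05) for g in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return sorted(max(pop, key=lambda p: fitness(p, urban_samples)))
```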
Analyzing Divisia Rules Extracted from a Feedforward Neural Network
2006-03-01
assumptions. (Barnett and Serletis give a detailed treatment of the theory of monetary aggregation [1].) [The remainder of this record is garbled; the recoverable reference fragments cite Barnett, W.A. and Serletis, A. (Eds.), The Theory of Monetary Aggregation, North-Holland, Amsterdam, 2000; Vincent A. Schmidt and Jane M. Binner, Las Vegas, Nevada, 2002; and Macroeconomic Dynamics, 1:485-512, 1997.]
Experimental determination of the effective strong coupling constant
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alexandre Deur; Volker Burkert; Jian-Ping Chen
2007-07-01
We extract an effective strong coupling constant from low Q{sup 2} data on the Bjorken sum. Using sum rules, we establish its Q{sup 2}-behavior over the complete Q{sup 2}-range. The result is compared to effective coupling constants extracted from different processes and to calculations based on Schwinger-Dyson equations, hadron spectroscopy or lattice QCD. Although the connection between the experimentally extracted effective coupling constant and the calculations is not clear, the results agree surprisingly well.
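For reference, the extraction above rests on the Bjorken sum rule; a schematic leading-order form consistent with the approach described (with the effective coupling defined to absorb higher-order and higher-twist contributions) is:

```latex
\Gamma_1^{p-n}(Q^2) \equiv \int_0^1 \left[ g_1^p(x,Q^2) - g_1^n(x,Q^2) \right] \mathrm{d}x
  = \frac{g_A}{6}\left[ 1 - \frac{\alpha_{g_1}(Q^2)}{\pi} \right]
\;\Longrightarrow\;
\alpha_{g_1}(Q^2) = \pi\left[ 1 - \frac{6\,\Gamma_1^{p-n}(Q^2)}{g_A} \right]
```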
On the problem of zinc extraction from the slags of lead heat
NASA Astrophysics Data System (ADS)
Kozyrev, V. V.; Besser, A. D.; Paretskii, V. M.
2013-12-01
The possibilities of zinc extraction from the slags of lead heat are studied as applied to the ZAO Karat-TsM lead plant to be built for processing ore lead concentrates. The process of zinc extraction into commercial fumes using the technology of slag fuming by natural gas developed in Gintsvetmet is recommended for this purpose. Technological rules are developed for designing a commercial fuming plant, as applied to the conditions of the ZAO Karat-TsM plant.
Qu, Jing; Hu, You-cai; Li, Jian-bei; Wang, Ying-hong; Zhang, Jin-lan; Abliz, Zeper; Yu, Shi-shan; Liu, Yun-bao
2008-01-01
A combination of electrospray ionization tandem mass spectrometry with high-performance liquid chromatography (HPLC/ESI-MSn) and the hyphenation of liquid chromatography with nuclear magnetic resonance spectroscopy (HPLC/NMR) have been extensively utilized for the on-line analysis of natural products, metabolites and drug impurities. In our last paper, we reported an on-line analytical method for the structural identification of trace alkaloids of the same class. However, the structural types of plant constituents are diverse, including flavonoids, terpenoids and steroids. It is therefore important to establish an effective analytical method for the on-line structural identification of constituents with molecular diversity in plant extracts. In the present study, the fragmentation patterns of some isolated stilbenes, phloroglucinols and flavonoids from Lysidice rhodostegia were investigated by ESI-MSn. Their fragmentation rules and UV characteristics are summarized, and the relationship between these spectral characteristics, the rules and the structures is described. According to the fragmentation rules and the NMR and UV spectral characteristics, 24 constituents of different types in fractions from L. brevicalyx of the same genus were rapidly characterized structurally on the basis of HPLC/HRMS, HPLC-UV/ESI-MSn, HPLC/1H NMR and HPLC/1H-1H COSY. Of these, six (10, 13, 14, 16, 17 and 23) are new compounds, and all of them are reported from L. brevicalyx for the first time. The aim is to develop an effective analytical method for the on-line structural identification of natural products with molecular diversity in plants, and to guide the rapid and direct isolation of novel compounds by chemical screening.
Naghibi, Fereydoun; Delavar, Mahmoud Reza; Pijanowski, Bryan
2016-12-14
Cellular Automata (CA) is one of the most common techniques used to simulate the urbanization process. CA-based urban models use transition rules to deliver spatial patterns of urban growth and urban dynamics over time. Determining the optimum transition rules of the CA is a critical step because of the heterogeneity and nonlinearities existing among urban growth driving forces. Recently, new CA models integrated with optimization methods based on swarm intelligence algorithms were proposed to overcome this drawback. The Artificial Bee Colony (ABC) algorithm is an advanced meta-heuristic swarm intelligence-based algorithm. Here, we propose a novel CA-based urban change model that uses the ABC algorithm to extract optimum transition rules. We applied the proposed ABC-CA model to simulate future urban growth in Urmia (Iran) with multi-temporal Landsat images from 1997, 2006 and 2015. Validation of the simulation results was made through statistical methods such as overall accuracy, the figure of merit and total operating characteristics (TOC). Additionally, we calibrated the CA model by ant colony optimization (ACO) to assess the performance of our proposed model versus similar swarm intelligence algorithm methods. We showed that the overall accuracy and the figure of merit of the ABC-CA model are 90.1% and 51.7%, which are 2.9% and 8.8% higher than those of the ACO-CA model, respectively. Moreover, the allocation disagreement of the simulation results for the ABC-CA model is 9.9%, which is 2.9% less than that of the ACO-CA model. Finally, the ABC-CA model also outperforms the ACO-CA model with fewer quantity and allocation errors and slightly more hits.
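The ABC search itself is omitted here; the following Python sketch shows the kind of CA transition rule whose parameters (a suitability threshold and a neighbor count, both hypothetical) such an optimizer would tune:

```python
import numpy as np

def ca_step(urban, suitability, theta, k):
    """One CA transition: a non-urban cell urbanizes when its suitability
    exceeds theta and at least k of its 8 neighbors are already urban.
    (theta, k) are the transition-rule parameters that an optimizer such
    as ABC or ACO would search over."""
    n, m = urban.shape
    padded = np.pad(urban, 1)
    neighbors = sum(padded[i:i + n, j:j + m]
                    for i in range(3) for j in range(3)) - urban
    grow = (suitability > theta) & (neighbors >= k) & (urban == 0)
    return urban | grow.astype(urban.dtype)
```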
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-20
... DEPARTMENT OF THE TREASURY Alcohol and Tobacco Tax and Trade Bureau 27 CFR Parts 5 [Docket No. TTB-2010-0008; Notice No. 111] RIN 1513-AB79 Disclosure of Cochineal Extract and Carmine in the Labeling of Wines, Distilled Spirits, and Malt Beverages Correction In proposed rule document 2010-27733 beginning...
Automatic classification of diseases from free-text death certificates for real-time surveillance.
Koopman, Bevan; Karimi, Sarvnaz; Nguyen, Anthony; McGuire, Rhydwyn; Muscatello, David; Kemp, Madonna; Truran, Donna; Zhang, Ming; Thackway, Sarah
2015-07-15
Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
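A hedged sketch of the two classification styles follows; the scikit-learn pipeline stands in for the richer feature set described above, and the keyword lists are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# (i) Supervised model over term n-gram features (a stand-in for the
# richer term/SNOMED CT feature set described above).
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
# clf.fit(train_texts, train_labels)   # labels: disease present/absent

# (ii) Keyword-matching rules; the keyword lists are illustrative only.
KEYWORDS = {
    "influenza": ["influenza", "flu"],
    "pneumonia": ["pneumonia", "pneumonitis"],
}

def rule_classify(certificate_text):
    text = certificate_text.lower()
    return {disease: any(kw in text for kw in kws)
            for disease, kws in KEYWORDS.items()}
```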
System and method for embedding emotion in logic systems
NASA Technical Reports Server (NTRS)
Curtis, Steven A. (Inventor)
2012-01-01
A system, method, and computer-readable media for creating a stable synthetic neural system. The method includes training an intellectual choice-driven synthetic neural system (SNS), training an emotional rule-driven SNS by generating emotions from rules, incorporating the rule-driven SNS into the choice-driven SNS through an evolvable interface, and balancing the emotional SNS and the intellectual SNS to achieve stability in a nontrivial autonomous environment with a Stability Algorithm for Neural Entities (SANE). Generating emotions from rules can include coding the rules into the rule-driven SNS in a self-consistent way. Training the emotional rule-driven SNS can occur during a training stage in parallel with training the choice-driven SNS. The training stage can include a self-assessment loop which measures performance characteristics of the rule-driven SNS against core genetic code. The method uses a stability threshold to measure the stability of the incorporated rule-driven SNS and choice-driven SNS using SANE.
Event extraction of bacteria biotopes: a knowledge-intensive NLP-based approach.
Ratkovic, Zorana; Golik, Wiktoria; Warnier, Pierre
2012-06-26
Bacteria biotopes cover a wide range of diverse habitats including animal and plant hosts, natural, medical and industrial environments. The high volume of publications in the microbiology domain provides a rich source of up-to-date information on bacteria biotopes. This information, as found in scientific articles, is expressed in natural language and is rarely available in a structured format, such as a database. This information is of great importance for fundamental research and microbiology applications (e.g., medicine, agronomy, food, bioenergy). The automatic extraction of this information from texts will provide a great benefit to the field. We present a new method for extracting relationships between bacteria and their locations using the Alvis framework. Recognition of bacteria and their locations was achieved using a pattern-based approach and domain lexical resources. For the detection of environment locations, we propose a new approach that combines lexical information and the syntactic-semantic analysis of corpus terms to overcome the incompleteness of lexical resources. Bacteria location relations extend over sentence borders, and we developed domain-specific rules for dealing with bacteria anaphors. We participated in the BioNLP 2011 Bacteria Biotope (BB) task with the Alvis system. Official evaluation results show that it achieves the best performance of participating systems. New developments since then have increased the F-score by 4.1 points. We have shown that the combination of semantic analysis and domain-adapted resources is both effective and efficient for event information extraction in the bacteria biotope domain. We plan to adapt the method to deal with a larger set of location types and a large-scale scientific article corpus to enable microbiologists to integrate and use the extracted knowledge in combination with experimental data.
Misra, Dharitri; Chen, Siyuan; Thoma, George R
2009-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
Patterson, Olga V; Freiberg, Matthew S; Skanderson, Melissa; J Fodeh, Samah; Brandt, Cynthia A; DuVall, Scott L
2017-06-12
In order to investigate the mechanisms of cardiovascular disease in HIV infected and uninfected patients, an analysis of echocardiogram reports is required for a large longitudinal multi-center study. A natural language processing system using a dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies based on terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built utilizing a scalable framework, making it available for processing large datasets. The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. The system achieved F-scores of 0.872, 0.844, and 0.877 with precision of 0.936, 0.982, and 0.969 for each dataset respectively averaged across all extracted values. Left ventricular ejection fraction (LVEF) is the most frequently extracted measurement. The precision of extraction of the LVEF measure ranged from 0.968 to 1.0 across different document types. This system illustrates the feasibility and effectiveness of a large-scale information extraction on clinical data. New clinical questions can be addressed in the domain of heart failure using retrospective clinical data analysis because key heart function measurements can be successfully extracted using natural language processing.
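A minimal sketch of measurement-value pair extraction follows; the regular expression is a hypothetical stand-in for the study's dictionary lookup and disambiguation:

```python
import re

# Hypothetical pattern for measurement-value pairs such as "LVEF 55%"
# or "ejection fraction: 55-60%"; real reports need the dictionary
# lookup and semantic disambiguation described above.
MEASURE = re.compile(
    r"(?P<name>LVEF|ejection fraction)\s*[:=]?\s*"
    r"(?P<value>\d{1,2}(?:\.\d+)?)\s*(?:-\s*\d{1,2}(?:\.\d+)?)?\s*%",
    re.IGNORECASE)

def extract_measurements(report):
    """Return (measurement, value) pairs found in a report string."""
    return [(m.group("name").upper(), float(m.group("value")))
            for m in MEASURE.finditer(report)]

extract_measurements("Normal LV size. LVEF 55-60%.")  # [('LVEF', 55.0)]
```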
Analyzing privacy requirements: A case study of healthcare in Saudi Arabia.
Ebad, Shouki A; Jaha, Emad S; Al-Qadhi, Mohammed A
2016-01-01
Developing legally compliant systems is a challenging software engineering problem, especially in systems that are governed by law, such as healthcare information systems. This challenge comes from the ambiguities and domain-specific definitions that are found in governmental rules. Therefore, there is a significant business need to automatically analyze privacy texts, extract rules and subsequently enforce them throughout the supply chain. The existing works that analyze health regulations use the U.S. Health Insurance Portability and Accountability Act as a case study. In this article, we applied the Breaux and Antón approach to the text of the Saudi Arabian healthcare privacy regulations; in Saudi Arabia, privacy is among the top dilemmas for public and private healthcare practitioners. As a result, we extracted and analyzed 2 rights, 4 obligations, 22 constraints, and 6 rules. Our analysis can assist requirements engineers, standards organizations, compliance officers and stakeholders by ensuring that their systems conform to Saudi policy. In addition, this article discusses the threats to the study validity and suggests open problems for future research.
Algorithms and semantic infrastructure for mutation impact extraction and grounding.
Laurila, Jonas B; Naderi, Nona; Witte, René; Riazanov, Alexandre; Kouznetsov, Alexandre; Baker, Christopher J O
2010-12-02
Mutation impact extraction is a hitherto unaccomplished task in state of the art mutation extraction systems. Protein mutations and their impacts on protein properties are hidden in scientific literature, making them poorly accessible for protein engineers and inaccessible for phenotype-prediction systems that currently depend on manually curated genomic variation databases. We present the first rule-based approach for the extraction of mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore protein and mutation mentions are grounded to their respective UniProtKB IDs and selected protein properties, namely protein functions to concepts found in the Gene Ontology. The extracted entities are populated to an OWL-DL Mutation Impact ontology facilitating complex querying for mutation impacts using SPARQL. We illustrate retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover we provide programmatic access to the data through semantic web services using the SADI (Semantic Automated Discovery and Integration) framework. We address the problem of access to legacy mutation data in unstructured form through the creation of novel mutation impact extraction methods which are evaluated on a corpus of full-text articles on haloalkane dehalogenases, tagged by domain experts. Our approaches show state of the art levels of precision and recall for Mutation Grounding and respectable level of precision but lower recall for the task of Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies with the goal of publishing to a broad spectrum of consumers.
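A toy sketch of the two subtasks, mutation mention detection and impact directionality, follows; the cue-word lists are hypothetical and far simpler than the rule set described above:

```python
import re

# Wild-type residue, position, mutant residue (e.g. "D260N").
MUTATION = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")
POSITIVE = {"increased", "enhanced", "improved"}
NEGATIVE = {"decreased", "reduced", "abolished", "impaired"}

def mutation_impacts(sentence):
    """Pair each mutation mention with a crude impact direction taken
    from cue words in the same sentence (illustrative word lists)."""
    words = set(sentence.lower().split())
    if words & POSITIVE:
        direction = "positive"
    elif words & NEGATIVE:
        direction = "negative"
    else:
        direction = "neutral"
    return [(m.group(0), direction) for m in MUTATION.finditer(sentence)]

mutation_impacts("The D260N mutant showed decreased activity.")
# [('D260N', 'negative')]
```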
Simulation-Based Rule Generation Considering Readability
Yahagi, H.; Shimizu, S.; Ogata, T.; Hara, T.; Ota, J.
2015-01-01
A rule generation method is proposed for an aircraft control problem in an airport. Designing appropriate rules for the motion coordination of taxiing aircraft in the airport, which is conducted by ground control, is important. However, previous studies did not consider the readability of rules, which matters because rules must be operated and maintained by humans. Therefore, in this study, using an indicator of readability, we propose a rule generation method based on parallel algorithm discovery and orchestration (PADO). Applying the proposed method to the aircraft control problem, the algorithm generates more readable and more robust rules and is found to be superior to previous methods. PMID:27347501
A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents.
Segura-Bedmar, Isabel; Martínez, Paloma; de Pablo-Sánchez, César
2011-03-29
A drug-drug interaction (DDI) occurs when one drug influences the level or activity of another drug. The increasing volume of the scientific literature overwhelms health care professionals trying to keep up-to-date with all published studies on DDIs. This paper describes a hybrid linguistic approach to DDI extraction that combines shallow parsing and syntactic simplification with pattern matching. Appositions and coordinate structures are interpreted based on shallow syntactic parsing provided by the UMLS MetaMap tool (MMTx). Subsequently, complex and compound sentences are broken down into clauses from which simple sentences are generated by a set of simplification rules. A pharmacist defined a set of domain-specific lexical patterns to capture the most common expressions of DDIs in texts. These lexical patterns are matched with the generated sentences in order to extract DDIs. We have performed different experiments to analyze the performance of the different processes. The lexical patterns achieve a reasonable precision (67.30%) but very low recall (14.07%). The inclusion of appositions and coordinate structures helps to improve the recall (25.70%); however, precision is lower (48.69%). The detection of clauses does not improve the performance. Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no previous approach has addressed the extraction of DDIs from texts; to the best of our knowledge, this work proposes the first integral solution for the automatic extraction of DDIs from biomedical texts.
Hemanth Kumar, A K; Polisetty, Arun Kumar; Sudha, V; Vijayakumar, A; Ramachandran, Geetha
2018-04-01
Cycloserine (CYC) is a second-line antitubercular drug that is used for the treatment of multidrug-resistant tuberculosis (MDR-TB) along with other antitubercular agents and is often used in developing countries. Monitoring CYC levels in plasma could be useful in the clinical management of patients with MDR-TB. A high performance liquid chromatography method for the determination of CYC in human plasma was developed. The method involved extraction of the sample using solid phase extraction cartridges and analysis of the extracted sample using a reverse phase T3 column (150 mm) and detection at 240 nm with a Photo Diode Array (PDA) detector. The chromatogram was run for 15 min at a flow rate of 0.4 ml/min at 30°C. The assay was specific for CYC and linear from 5.0 to 50.0 μg/ml. The relative standard deviations of within- and between-day assays were less than 10%. Recovery of CYC ranged from 102% to 109%. Interference of other second-line anti-TB drugs in the assay of CYC was ruled out. The assay spans the concentration range of clinical interest. The specificity and sensitivity of this assay make it highly suitable for pharmacokinetic studies.
Zhang, Haitao; Wu, Chenxue; Chen, Zewei; Liu, Zhao; Zhu, Yunhong
2017-01-01
Analyzing large-scale spatial-temporal k-anonymity datasets recorded in location-based service (LBS) application servers can benefit some LBS applications. However, such analyses can allow adversaries to make inference attacks that cannot be handled by spatial-temporal k-anonymity methods or other methods for protecting sensitive knowledge. In response to this challenge, first we defined a destination location prediction attack model based on privacy-sensitive sequence rules mined from large scale anonymity datasets. Then we proposed a novel on-line spatial-temporal k-anonymity method that can resist such inference attacks. Our anti-attack technique generates new anonymity datasets with awareness of privacy-sensitive sequence rules. The new datasets extend the original sequence database of anonymity datasets to hide the privacy-sensitive rules progressively. The process includes two phases: off-line analysis and on-line application. In the off-line phase, sequence rules are mined from an original sequence database of anonymity datasets, and privacy-sensitive sequence rules are developed by correlating privacy-sensitive spatial regions with spatial grid cells among the sequence rules. In the on-line phase, new anonymity datasets are generated upon LBS requests by adopting specific generalization and avoidance principles to hide the privacy-sensitive sequence rules progressively from the extended sequence anonymity datasets database. We conducted extensive experiments to test the performance of the proposed method, and to explore the influence of the parameter K value. The results demonstrated that our proposed approach is faster and more effective for hiding privacy-sensitive sequence rules in terms of hiding sensitive rules ratios to eliminate inference attacks. Our method also had fewer side effects in terms of generating new sensitive rules ratios than the traditional spatial-temporal k-anonymity method, and had basically the same side effects in terms of non-sensitive rules variation ratios with the traditional spatial-temporal k-anonymity method. Furthermore, we also found the performance variation tendency from the parameter K value, which can help achieve the goal of hiding the maximum number of original sensitive rules while generating a minimum of new sensitive rules and affecting a minimum number of non-sensitive rules.
Lecker, Stewart H.; Solomon, Vered; Price, S. Russ; Kwon, Yong Tae; Mitch, William E.; Goldberg, Alfred L.
1999-01-01
Insulin deficiency (e.g., in acute diabetes or fasting) is associated with enhanced protein breakdown in skeletal muscle leading to muscle wasting. Because recent studies have suggested that this increased proteolysis is due to activation of the ubiquitin-proteasome (Ub-proteasome) pathway, we investigated whether diabetes is associated with an increased rate of Ub conjugation to muscle protein. Muscle extracts from streptozotocin-induced insulin-deficient rats contained greater amounts of Ub-conjugated proteins than extracts from control animals and also 40–50% greater rates of conjugation of (125)I-Ub to endogenous muscle proteins. This enhanced Ub-conjugation occurred mainly through the N-end rule pathway that involves E2(14k) and E3α. A specific substrate of this pathway, α-lactalbumin, was ubiquitinated faster in the diabetic extracts, and a dominant negative form of E2(14k) inhibited this increase in ubiquitination rates. Both E2(14k) and E3α were shown to be rate-limiting for Ub conjugation because adding small amounts of either to extracts stimulated Ub conjugation. Furthermore, mRNA for E2(14k) and E3α (but not E1) were elevated 2-fold in muscles from diabetic rats, although no significant increase in E2(14k) and E3α content could be detected by immunoblot or activity assays. The simplest interpretation of these results is that small increases in both E2(14k) and E3α in muscles of insulin-deficient animals together accelerate Ub conjugation and protein degradation by the N-end rule pathway, the same pathway activated in cancer cachexia, sepsis, and hyperthyroidism. J. Clin. Invest. 104:1411–1420 (1999). PMID:10562303
Fuzzylot: a novel self-organising fuzzy-neural rule-based pilot system for automated vehicles.
Pasquier, M; Quek, C; Toh, M
2001-10-01
This paper presents part of our research work concerned with the realisation of an Intelligent Vehicle and the technologies required for its routing, navigation, and control. An automated driver prototype has been developed using a self-organising fuzzy rule-based system (POPFNN-CRI(S)) to model and subsequently emulate human driving expertise. The ability of fuzzy logic to represent vague information using linguistic variables makes it a powerful tool for developing rule-based control systems when an exact working model is not available, as is the case for any vehicle-driving task. Designing a fuzzy system, however, is a complex endeavour, due to the need to define the variables and their associated fuzzy sets and to determine a suitable rule base. Many efforts have thus been devoted to automating this process, yielding the development of learning and optimisation techniques. One of them is the family of POP-FNNs, or Pseudo-Outer Product Fuzzy Neural Networks (TVR, AARS(S), AARS(NS), CRI, Yager). These generic self-organising neural networks developed at the Intelligent Systems Laboratory (ISL/NTU) are based on formal fuzzy mathematical theory and are able to objectively extract a fuzzy rule base from training data. In this application, a driving simulator has been developed that integrates a detailed model of the car dynamics, complete with engine characteristics and environmental parameters, and an OpenGL-based 3D simulation interface coupled with a driving wheel and accelerator/brake pedals. The simulator has been used on various road scenarios to record, from a human pilot, driving data consisting of steering and speed control actions associated with road features. Specifically, the POPFNN-CRI(S) system is used to cluster the data and extract a fuzzy rule base modelling the human driving behaviour. Finally, the effectiveness of the generated rule base has been validated using the simulator in autopilot mode.
Java implementation of Class Association Rule algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tamura, Makio
2007-08-30
Java implementation of three Class Association Rule mining algorithms: NETCAR, CARapriori, and clustering-based rule mining. NETCAR is a novel algorithm developed by Makio Tamura; it is discussed in a paper (UCRL-JRNL-232466-DRAFT) to be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and the phenotype profile by a binary vector. The present application of this software is in genome analysis; however, it could be applied more generally.
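NETCAR itself is described only in the cited draft; as a generic illustration (in Python, for consistency with the other sketches in this compilation), a brute-force class association rule miner over binary gene profiles might look like this, with all thresholds illustrative:

```python
from itertools import combinations

def mine_cars(rows, classes, min_sup=2, min_conf=0.8):
    """Mine class association rules {genes} -> phenotype.

    rows: list of sets of genes present in each genome profile;
    classes: parallel list of 0/1 phenotype labels.
    Antecedents are limited to size 1 and 2 for brevity.
    """
    rules = []
    genes = set().union(*rows)
    for r in (1, 2):
        for items in combinations(sorted(genes), r):
            cover = [c for row, c in zip(rows, classes)
                     if set(items) <= row]
            if len(cover) >= min_sup:
                conf = sum(cover) / len(cover)
                if conf >= min_conf:
                    rules.append((items, len(cover), conf))
    return rules
```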
Building a common pipeline for rule-based document classification.
Patterson, Olga V; Ginter, Thomas; DuVall, Scott L
2013-01-01
Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.
Statistical Properties of Cell Topology and Geometry in a Tissue-Growth Model
NASA Astrophysics Data System (ADS)
Sahlin, Patrik; Hamant, Olivier; Jönsson, Henrik
Statistical properties of cell topologies in two-dimensional tissues have recently been suggested to be a consequence of cell divisions. Different rules for the positioning of new walls in plants have been proposed; e.g., Errera's rule states that new walls are added along the shortest possible path dividing the mother cell's volume into two equal parts. Here, we show that for an isotropically growing tissue Errera's rule results in the correct distributions of the number of cell neighbors as well as cellular geometries, in contrast to a random division rule. Further, we show that wall mechanics constrain the isotropic growth such that the resulting cell shape distributions agree more closely with experimental data extracted from the shoot apex of Arabidopsis thaliana.
Concurrence of rule- and similarity-based mechanisms in artificial grammar learning.
Opitz, Bertram; Hofmann, Juliane
2015-03-01
A current theoretical debate regards whether rule-based or similarity-based learning prevails during artificial grammar learning (AGL). Although the majority of findings are consistent with a similarity-based account of AGL it has been argued that these results were obtained only after limited exposure to study exemplars, and performance on subsequent grammaticality judgment tests has often been barely above chance level. In three experiments the conditions were investigated under which rule- and similarity-based learning could be applied. Participants were exposed to exemplars of an artificial grammar under different (implicit and explicit) learning instructions. The analysis of receiver operating characteristics (ROC) during a final grammaticality judgment test revealed that explicit but not implicit learning led to rule knowledge. It also demonstrated that this knowledge base is built up gradually while similarity knowledge governed the initial state of learning. Together these results indicate that rule- and similarity-based mechanisms concur during AGL. Moreover, it could be speculated that two different rule processes might operate in parallel; bottom-up learning via gradual rule extraction and top-down learning via rule testing. Crucially, the latter is facilitated by performance feedback that encourages explicit hypothesis testing.
Identification of transplanting stage of rice using Sentinel-1 data
NASA Astrophysics Data System (ADS)
Hongo, C.; Tosa, T.; Tamura, E.; Sigit, G.; Barus, B.
2017-12-01
As an adaptation to climate change, the Government of Indonesia has launched an agricultural insurance program covering rice damage from drought, flood, and pests and diseases. For assessing the damage ratio and calculating indemnity, extraction of paddy fields and identification of the transplanting stage are key issues. In this research, we identified the rice transplanting stage in the 2015 dry season using Sentinel-1 data for paddy in Cianjur, West Java, Indonesia. As the first step, the time series of backscattering coefficients was analyzed for paddy, forest, villages and fish-farming ponds using Sentinel-1 data acquired on April 1, April 13, April 25, May 7, May 19, June 24, July 18 and August 11. The results show that the backscattering coefficient of paddy decreased substantially from May 7, reached its minimum value, and then increased toward June. The paddy area showing this change was almost the same area where rice was at the harvesting stage during our field investigation from August 11 to 13. Considering that the growth period of rice at our research site is about 110 days, this supports the conclusion that transplantation was done around May 7. In contrast, the backscattering coefficients of forest, villages and fish-farming ponds were constant and clearly differed from those of paddy. As the next step, the minimum and maximum values of the backscattering coefficient were extracted from the data of May 7, May 19 and June 24, and the increase was calculated by subtracting the minimum from the maximum. Finally, using the minimum backscattering coefficient and the increase, the image was classified to identify the transplanting stage by the maximum likelihood method, the decision tree method and a threshold-setting method (regression analysis by the 3σ-rule). The maximum likelihood method discriminated the transplanting stage most accurately, while the decision tree method tended to underestimate the already-planted paddy area. The threshold-setting method (regression analysis by the 3σ-rule) discriminated better than the other methods in paddy areas adjacent to forest and villages, where the backscattering coefficient is influenced by neighboring targets.
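A hedged Python sketch of the threshold-setting (3σ-rule) idea follows, assuming per-pixel backscatter minima and rise amplitudes have already been computed from the Sentinel-1 stack; this is a simplified reading of the method, not the paper's exact regression:

```python
import numpy as np

def planted_mask(sigma0_min, sigma0_rise, background):
    """Flag pixels as newly transplanted paddy when the temporal minimum
    of backscatter is low (flooded fields scatter weakly) and the
    subsequent rise is large (crop growth raises backscatter), both
    judged against a 3-sigma band fitted on non-paddy background samples."""
    mu, sd = background.mean(), background.std()
    low = sigma0_min < mu - 3.0 * sd
    rise = sigma0_rise > 3.0 * sd
    return low & rise
```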
Automatic diagnosis of malaria based on complete circle-ellipse fitting search algorithm.
Sheikhhosseini, M; Rabbani, H; Zekri, M; Talebi, A
2013-12-01
Diagnosis of malaria parasitemia from blood smears is a subjective and time-consuming task for pathologists. An automatic diagnostic process would reduce the diagnostic time; it can also serve as a second opinion for pathologists and may be useful in malaria screening. This study presents an automatic method for malaria diagnosis from thin blood smears. Because the malaria life cycle begins with the formation of a ring around the parasite nucleus, the proposed approach is mainly based on curve fitting to detect the parasite ring in the blood smear. The method is composed of six main phases. The first is a stain-object extraction step, which extracts candidate objects that may be infected by malaria parasites; it includes stained-pixel extraction based on intensity and colour, and stained-object segmentation by defining stained-circle matching. The second step is a preprocessing phase which makes use of nonlinear diffusion filtering. The process continues with detection of the parasite nucleus from the resulting image according to image intensity. The fourth step introduces a complete search process in which a circle search identifies the direction and initial points for the direct least-squares ellipse fitting algorithm. Furthermore, in the ellipse-searching process the parasite shape is completed while undesired regions with high error values are removed and the ellipse parameters are modified. In the fifth step, features are extracted from the parasite candidate region instead of the whole candidate object. This feature extraction strategy, enabled by the special search process, removes the need for clump-splitting methods; defining the stained-circle matching process in the first step also speeds up the whole procedure. Finally, a series of decision rules is applied to the extracted features to decide on the presence or absence of malaria parasites. The algorithm was applied to 26 digital images of thin blood smear films containing 1274 objects, each either infected by a parasite or healthy. The automatic identification of malaria on this database showed a sensitivity of 82.28% and a specificity of 98.02%.
A New Self-Constrained Inversion Method of Potential Fields Based on Probability Tomography
NASA Astrophysics Data System (ADS)
Sun, S.; Chen, C.; WANG, H.; Wang, Q.
2014-12-01
The self-constrained inversion method of potential fields uses a priori information self-extracted from potential field data. Differing from external a priori information, the self-extracted information consists of parameters derived exclusively from the analysis of the gravity and magnetic data (Paoletti et al., 2013). Here we develop a new self-constrained inversion method based on probability tomography. Probability tomography needs neither a priori information nor large inversion matrix operations. Moreover, its result can describe the sources entirely and clearly, especially when their distribution is complex and irregular. Therefore, we attempt to use the a priori information extracted from the probability tomography results to constrain the inversion for physical properties. Magnetic anomaly data are taken as an example in this work. The probability tomography result of the magnetic total field anomaly (ΔΤ) shows a smoother distribution than the anomalous source and cannot display the source edges exactly. However, the gradients of ΔΤ have higher resolution than ΔΤ in their respective directions, and this characteristic is also present in their probability tomography results. We therefore use a set of rules to combine the probability tomography results of ∂ΔΤ⁄∂x, ∂ΔΤ⁄∂y and ∂ΔΤ⁄∂z into a new result from which a priori information is extracted, and then incorporate this information into the model objective function as spatial weighting functions to invert for the final magnetic susceptibility. Synthetic magnetic examples inverted with and without the a priori information extracted from the probability tomography results were compared; the results show that the former are more concentrated and resolve the source body edges better. The method is finally applied to field-measured ΔΤ data from an iron mine in China and performs well. References: Paoletti, V., Ialongo, S., Florio, G., Fedi, M. & Cella, F., 2013. Self-constrained inversion of potential fields, Geophys. J. Int. This research is supported by the Fundamental Research Funds for the Institute for Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences (Grant Nos. WHS201210 and WHS201211).
Panas, Robert M.
2016-06-23
This paper presents a new analytical method for predicting the large displacement behavior of flexural double parallelogram (DP) bearings with underconstraint eliminator (UE) linkages. This closed-form perturbative Euler analysis method is able to, for the first time, directly incorporate the elastomechanics of a discrete UE linkage, which is a hybrid flexure element that is linked to ground as well as both stages of the bearing. The models are used to understand a nested linkage UE design; however, the method is extensible to other UE linkages. Design rules and figures-of-merit are extracted from the analysis models, which provide powerful tools for accelerating the design process. The models, rules and figures-of-merit enable the rapid design of a UE for a desired large displacement behavior, as well as providing a means for determining the limits of UE and DP structures. This will aid in the adoption of UE linkages into DP bearings for precision mechanisms. Models are generated for a nested linkage UE design, and the performance of this DP with UE structure is compared to a DP-only bearing. As a result, the perturbative Euler analysis is shown to match existing theories for DP-only bearings with distributed compliance within ≈2%, and Finite Element Analysis for the DP with UE bearings within an average of 10%.
39 CFR 3001.4 - Method of citing rules.
Code of Federal Regulations, 2010 CFR
2010-07-01
This part shall be referred to as the “rules of practice.” Each section, paragraph, or subparagraph shall include only the numbers and letters to the right of the decimal...
Knowledge mining from clinical datasets using rough sets and backpropagation neural network.
Nahato, Kindie Biredagn; Harichandran, Khanna Nehemiah; Arputharaj, Kannan
2015-01-01
The availability of clinical datasets and knowledge mining methodologies encourages the researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work rough set indiscernibility relation method with backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by indiscernibility relation method. The second stage is classification using backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
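A minimal Python sketch of the indiscernibility relation underlying the attribute-selection stage follows; the consistency check is a simplification, not the full reduct search used in RS-BPNN:

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs).

    rows: list of dicts mapping attribute name -> value;
    attrs: iterable of attribute names.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def preserves_classification(rows, labels, attrs):
    """attrs suffice for classification if every equivalence class is
    label-pure (a simplified consistency check)."""
    return all(len({labels[i] for i in block}) == 1
               for block in partition(rows, attrs))
```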
A novel Bayesian framework for discriminative feature extraction in Brain-Computer Interfaces.
Suk, Heung-Il; Lee, Seong-Whan
2013-02-01
As there has been a paradigm shift in the learning load from a human subject to a computer, machine learning has been considered as a useful tool for Brain-Computer Interfaces (BCIs). In this paper, we propose a novel Bayesian framework for discriminative feature extraction for motor imagery classification in an EEG-based BCI in which the class-discriminative frequency bands and the corresponding spatial filters are optimized by means of the probabilistic and information-theoretic approaches. In our framework, the problem of simultaneous spatiospectral filter optimization is formulated as the estimation of an unknown posterior probability density function (pdf) that represents the probability that a single-trial EEG of predefined mental tasks can be discriminated in a state. In order to estimate the posterior pdf, we propose a particle-based approximation method by extending a factored-sampling technique with a diffusion process. An information-theoretic observation model is also devised to measure discriminative power of features between classes. From the viewpoint of classifier design, the proposed method naturally allows us to construct a spectrally weighted label decision rule by linearly combining the outputs from multiple classifiers. We demonstrate the feasibility and effectiveness of the proposed method by analyzing the results and its success on three public databases.
An infrared-visible image fusion scheme based on NSCT and compressed sensing
NASA Astrophysics Data System (ADS)
Zhang, Qiong; Maldague, Xavier
2015-05-01
Image fusion, a current research hotspot in the field of infrared computer vision, has been developed using a wide variety of methods. Traditional image fusion algorithms tend to introduce problems such as data storage shortages and increased computational complexity. Compressed sensing (CS) uses sparse sampling, without requiring a priori knowledge, to reconstruct the image well, which reduces the cost and complexity of image processing. In this paper, an advanced compressed sensing image fusion algorithm based on the non-subsampled contourlet transform (NSCT) is proposed. NSCT provides better sparsity than the wavelet transform in image representation. Through the NSCT decomposition, the low-frequency and high-frequency coefficients can be obtained respectively. For the fusion of the low-frequency coefficients of the infrared and visible images, the adaptive regional energy weighting rule is utilized, so only the high-frequency coefficients need to be compressively measured. Here we use sparse representation and random projection to obtain the required values of the high-frequency coefficients; afterwards, the coefficients of each image block can be fused via the absolute maximum selection rule and/or the regional standard deviation rule. In the reconstruction of the compressive sampling results, a gradient-based iterative algorithm and the total variation (TV) method are employed to recover the high-frequency coefficients. Eventually, the fused image is recovered by the inverse NSCT. Both the visual effects and the numerical results of the experiments indicate that the presented approach achieves much higher fusion quality, accelerates the calculations, enhances various targets and extracts more useful information.
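The two coefficient fusion rules named above can be sketched in a few lines; the sketch below assumes already-decomposed sub-band arrays and omits the NSCT and compressive sampling machinery entirely.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_high_absmax(c_ir, c_vis):
    # absolute-maximum selection rule for high-frequency sub-bands
    return np.where(np.abs(c_ir) >= np.abs(c_vis), c_ir, c_vis)

def fuse_low_energy_weighted(c_ir, c_vis, win=3):
    # adaptive regional-energy weighting for the low-frequency sub-band
    e_ir = uniform_filter(c_ir ** 2, size=win)   # local energy maps
    e_vis = uniform_filter(c_vis ** 2, size=win)
    w = e_ir / (e_ir + e_vis + 1e-12)            # weight toward the stronger region
    return w * c_ir + (1 - w) * c_vis
```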
3D palmprint data fast acquisition and recognition
NASA Astrophysics Data System (ADS)
Wang, Xiaoxu; Huang, Shujun; Gao, Nan; Zhang, Zonghua
2014-11-01
This paper presents a fast 3D (three-dimensional) palmprint capturing system and develops an efficient 3D palmprint feature extraction and recognition method. In order to rapidly acquire the accurate 3D shape and texture of a palmprint, a DLP projector triggers a CCD camera to realize synchronization. By generating and projecting green fringe pattern images onto the measured palm surface, 3D palmprint data are calculated from the fringe pattern images. The periodic feature vector can be derived from the calculated 3D palmprint data, so undistorted 3D biometrics are obtained. Using the obtained 3D palmprint data, feature matching tests have been carried out with Gabor filters, competition rules and the mean curvature. Experimental results on capturing 3D palmprints show that the proposed acquisition method can quickly obtain the 3D shape information of a palmprint. Initial experiments on recognition show the proposed method is efficient when using 3D palmprint data.
An effective method on pornographic images realtime recognition
NASA Astrophysics Data System (ADS)
Wang, Baosong; Lv, Xueqiang; Wang, Tao; Wang, Chengrui
2013-03-01
In this paper, skin detection, texture filtering and face detection are used to extract features from an image library; a decision tree algorithm is trained on these features to create rules that form a decision tree classifier for distinguishing unknown images. In an experiment based on more than twenty thousand images, the precision rate reaches 76.21% when testing on 13,025 pornographic images, with an elapsed time of less than 0.2 s, demonstrating good practical applicability. Among the steps mentioned above, a new skin detection model, called the irregular polygon region skin detection model and based on the YCbCr color space, is proposed; it lowers the false detection rate of skin detection. A new method, called sequence region labeling on binary connected areas, computes features of connected areas; it is faster and needs less memory than other, recursive methods.
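For orientation, a conventional rectangular YCbCr skin rule is sketched below; note that the paper's contribution is an irregular polygon region in the Cb-Cr plane, which this textbook-style baseline only approximates.

```python
import numpy as np

def skin_mask(rgb):
    """rgb: HxWx3 uint8 array; returns a boolean skin mask (illustrative)."""
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    # BT.601 full-range RGB -> Cb, Cr conversion
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # common rectangular Cb/Cr skin range; the paper replaces this box
    # with an irregular polygon to cut false detections
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)
```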
Hassanpour, Saeed; Bay, Graham; Langlotz, Curtis P
2017-06-01
We built a natural language processing (NLP) method to automatically extract clinical findings in radiology reports and characterize their level of change and significance according to a radiology-specific information model. We utilized a combination of machine learning and rule-based approaches for this purpose. Our method is unique in capturing different features and levels of abstractions at surface, entity, and discourse levels in text analysis. This combination has enabled us to recognize the underlying semantics of radiology report narratives for this task. We evaluated our method on radiology reports from four major healthcare organizations. Our evaluation showed the efficacy of our method in highlighting important changes (accuracy 99.2%, precision 96.3%, recall 93.5%, and F1 score 94.7%) and identifying significant observations (accuracy 75.8%, precision 75.2%, recall 75.7%, and F1 score 75.3%) to characterize radiology reports. This method can help clinicians quickly understand the key observations in radiology reports and facilitate clinical decision support, review prioritization, and disease surveillance.
27 CFR 19.302 - Treatment during production.
Code of Federal Regulations, 2011 CFR
2011-04-01
... remain in the spirits, so as to preclude the extraction of potable spirits. (26 U.S.C. 5201) ...
An improved cellular automaton method to model multispecies biofilms.
Tang, Youneng; Valocchi, Albert J
2013-10-01
Biomass-spreading rules used in previous cellular automaton methods to simulate multispecies biofilm introduced extensive mixing between different biomass species or resulted in spatially discontinuous biomass concentration and distribution; this caused results based on the cellular automaton methods to deviate from experimental results and those from the more computationally intensive continuous method. To overcome the problems, we propose new biomass-spreading rules in this work: Excess biomass spreads by pushing a line of grid cells that are on the shortest path from the source grid cell to the destination grid cell, and the fractions of different biomass species in the grid cells on the path change due to the spreading. To evaluate the new rules, three two-dimensional simulation examples are used to compare the biomass distribution computed using the continuous method and three cellular automaton methods, one based on the new rules and the other two based on rules presented in two previous studies. The relationship between the biomass species is syntrophic in one example and competitive in the other two examples. Simulation results generated using the cellular automaton method based on the new rules agree much better with the continuous method than do results using the other two cellular automaton methods. The new biomass-spreading rules are no more complex to implement than the existing rules.
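A minimal single-species sketch of the path-pushing idea (species fractions and biofilm kinetics omitted) might look like the following, with invented grid and capacity values:

```python
# BFS finds the nearest cell with spare capacity; one unit of excess biomass
# is then shifted along the shortest path back to the overfull source cell.
from collections import deque

def push_excess(grid, src, cap):
    """grid: dict {(x, y): biomass}; src: overfull cell; cap: max per cell."""
    prev, seen, q = {}, {src}, deque([src])
    while q:
        cell = q.popleft()
        if grid.get(cell, 0) < cap and cell != src:
            path = [cell]
            while path[-1] != src:              # walk back to the source
                path.append(prev[path[-1]])
            for dst, s in zip(path, path[1:]):  # shift one unit along the path
                grid[dst] = grid.get(dst, 0) + 1
                grid[s] -= 1
            return
        x, y = cell
        for nb in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if nb not in seen:
                seen.add(nb)
                prev[nb] = cell
                q.append(nb)

grid = {(0, 0): 5, (0, 1): 4, (1, 0): 4}        # cap 4; (0, 0) is overfull
push_excess(grid, (0, 0), cap=4)
print(grid)                                      # one unit pushed outward
```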
Utilizing Chinese Admission Records for MACE Prediction of Acute Coronary Syndrome
Hu, Danqing; Huang, Zhengxing; Chan, Tak-Ming; Dong, Wei; Lu, Xudong; Duan, Huilong
2016-01-01
Background: Clinical major adverse cardiovascular event (MACE) prediction of acute coronary syndrome (ACS) is important for a number of applications including physician decision support, quality of care assessment, and efficient healthcare service delivery on ACS patients. Admission records, as typical media to contain clinical information of patients at the early stage of their hospitalizations, provide significant potential to be explored for MACE prediction in a proactive manner. Methods: We propose a hybrid approach for MACE prediction by utilizing a large volume of admission records. Firstly, both a rule-based medical language processing method and a machine learning method (i.e., Conditional Random Fields (CRFs)) are developed to extract essential patient features from unstructured admission records. After that, state-of-the-art supervised machine learning algorithms are applied to construct MACE prediction models from data. Results: We comparatively evaluate the performance of the proposed approach on a real clinical dataset consisting of 2930 ACS patient samples collected from a Chinese hospital. Our best model achieved 72% AUC in MACE prediction. In comparison of the performance between our models and two well-known ACS risk score tools, i.e., GRACE and TIMI, our learned models obtain better performances with a significant margin. Conclusions: Experimental results reveal that our approach can obtain competitive performance in MACE prediction. The comparison of classifiers indicates the proposed approach has a competitive generality with datasets extracted by different feature extraction methods. Furthermore, our MACE prediction model obtained a significant improvement by comparison with both GRACE and TIMI. It indicates that using admission records can effectively provide MACE prediction service for ACS patients at the early stage of their hospitalizations. PMID:27649220
A hybrid learning method for constructing compact rule-based fuzzy models.
Zhao, Wanqing; Niu, Qun; Li, Kang; Irwin, George W
2013-12-01
The Takagi–Sugeno–Kang-type rule-based fuzzy model has found many applications in different fields; a major challenge is, however, to build a compact model with optimized model parameters which leads to satisfactory model performance. To produce a compact model, most existing approaches mainly focus on selecting an appropriate number of fuzzy rules. In contrast, this paper considers not only the selection of fuzzy rules but also the structure of each rule premise and consequent, leading to the development of a novel compact rule-based fuzzy model. Here, each fuzzy rule is associated with two sets of input attributes, in which the first is used for constructing the rule premise and the other is employed in the rule consequent. A new hybrid learning method combining the modified harmony search method with a fast recursive algorithm is hereby proposed to determine the structure and the parameters for the rule premises and consequents. This is a hard mixed-integer nonlinear optimization problem, and the proposed hybrid method solves the problem by employing an embedded framework, leading to a significantly reduced number of model parameters and a small number of fuzzy rules with each being as simple as possible. Results from three examples are presented to demonstrate the compactness (in terms of the number of model parameters and the number of rules) and the performance of the fuzzy models obtained by the proposed hybrid learning method, in comparison with other techniques from the literature.
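For readers unfamiliar with the rule structure, the following toy evaluation shows a TSK model in which each rule owns separate premise and consequent attribute sets, echoing the paper's two-attribute-set design; the Gaussian premises and all numbers are illustrative.

```python
import numpy as np

def tsk_predict(x, rules):
    """Weighted average of local linear models, gated by premise firing strengths."""
    num = den = 0.0
    for prem_idx, centers, widths, cons_idx, coeffs, bias in rules:
        xp = x[prem_idx]
        w = np.exp(-0.5 * np.sum(((xp - centers) / widths) ** 2))  # firing strength
        num += w * (np.dot(coeffs, x[cons_idx]) + bias)            # local consequent
        den += w
    return num / den

x = np.array([0.2, 1.5, -0.3])
rules = [
    # (premise attrs, centers, widths, consequent attrs, coeffs, bias)
    (np.array([0]), np.array([0.0]), np.array([1.0]), np.array([1, 2]), np.array([2.0, 1.0]), 0.5),
    (np.array([1]), np.array([2.0]), np.array([0.5]), np.array([0]),    np.array([-1.0]),     0.0),
]
print(tsk_predict(x, rules))
```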
Significance testing of rules in rule-based models of human problem solving
NASA Technical Reports Server (NTRS)
Lewis, C. M.; Hammer, J. M.
1986-01-01
Rule-based models of human problem solving have typically not been tested for statistical significance. Three methods of testing rules - analysis of variance, randomization, and contingency tables - are presented. Advantages and disadvantages of the methods are also described.
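The contingency-table route, for instance, reduces to crossing "rule fired" against "behavior observed" and applying a chi-square test; a sketch with made-up counts:

```python
from scipy.stats import chi2_contingency

#                  behavior  no behavior
table = [[30, 5],    # rule fired
         [10, 55]]   # rule did not fire
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")   # small p: the rule predicts the behavior
```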
Research on key technology of the verification system of steel rule based on vision measurement
NASA Astrophysics Data System (ADS)
Jia, Siyuan; Wang, Zhong; Liu, Changjie; Fu, Luhua; Li, Yiming; Lu, Ruijun
2018-01-01
The steel rule plays an important role in quantity transmission. However, the traditional verification method for steel rules, based on manual operation and reading, suffers from low precision and low efficiency. A machine vision based verification system for steel rules is designed with reference to JJG1-1999, the Verification Regulation of Steel Rule [1]. What differentiates this system is that it uses a new calibration method for the pixel equivalent and decontaminates the surface of the steel rule. Experiments show that these two methods fully meet the requirements of the verification system. Measuring results strongly prove that these methods not only meet the precision of the verification regulation, but also improve the reliability and efficiency of the verification system.
Fuzzy association rule mining and classification for the prediction of malaria in South Korea.
Buczak, Anna L; Baugher, Benjamin; Guven, Erhan; Ramac-Thomas, Liane C; Elbert, Yevgeniy; Babin, Steven M; Lewis, Sheri H
2015-06-18
Malaria is the world's most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality. We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining (FARM) to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as Low, Medium or High, where these classes are defined as a total of 0-2, 3-16, and 17 or more cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, High is considered an outbreak. Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7-8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the High class. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the High class. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the High class. For the Medium class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3. A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict Low, Medium or High case counts 7-8 weeks in the future. This paper demonstrates that our data-driven approach can be used for the prediction of different diseases.
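The F0.5 and F3 figures are instances of the F-beta score, which weights recall beta times as heavily as precision; with scikit-learn (our choice of library, not the paper's) the computation is one call per beta:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]   # made-up outbreak labels
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(fbeta_score(y_true, y_pred, beta=0.5))  # precision-leaning, like F0.5
print(fbeta_score(y_true, y_pred, beta=3.0))  # recall-leaning, like F3
```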
Ontology-Based Data Integration between Clinical and Research Systems
Mate, Sebastian; Köpcke, Felix; Toddenroth, Dennis; Martin, Marcus; Prokosch, Hans-Ulrich
2015-01-01
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: With current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: Instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it. PMID:25588043
NASA Astrophysics Data System (ADS)
Wang, Yongbo; Sheng, Yehua; Lu, Guonian; Tian, Peng; Zhang, Kai
2008-04-01
Surface reconstruction is an important task in the fields of 3D GIS, computer aided design and computer graphics (CAD & CG), virtual simulation, and so on. Building on available incremental surface reconstruction methods, a feature-constrained surface reconstruction approach for point clouds is presented. First, features are extracted from the point cloud using rules based on curvature extremes and a minimum spanning tree. By projecting local sample points onto fitted tangent planes and using the extracted features to guide and constrain the local triangulation and surface propagation, the topological relationships among sample points can be recovered. For the constructed models, a process named consistent normal adjustment and regularization is adopted to adjust the normal of each face so that a correct surface model is achieved. Experiments show that the presented approach inherits the convenient implementation and high efficiency of traditional incremental surface reconstruction; meanwhile, it avoids improper propagation of normals across sharp edges, which greatly improves the applicability of incremental surface reconstruction. Moreover, an appropriate k-neighborhood can help to recognize insufficiently sampled areas and boundary parts, so the presented approach can be used to reconstruct both open and closed surfaces without additional interference.
Optical 3D watermark based digital image watermarking for telemedicine
NASA Astrophysics Data System (ADS)
Li, Xiao Wei; Kim, Seok Tae
2013-12-01
A region of interest (ROI) of a medical image is an area containing important diagnostic information that must be stored without any distortion. This paper presents a 3D watermark based medical image watermarking scheme that applies the watermark to the non-ROI of the medical image while preserving the ROI. A 3D watermark object is first decomposed into a 2D elemental image array (EIA) by a lenslet array, and then the 2D elemental image array data are embedded into the host image. The watermark extraction process is the inverse of embedding. From the extracted EIA, the 3D watermark can be reconstructed through the computational integral imaging reconstruction (CIIR) technique. Because the EIA is composed of a number of elemental images, each possessing its own perspective of the 3D watermark object, the 3D virtual watermark can be successfully reconstructed even if the embedded watermark data are badly damaged. Furthermore, using CAT with various rule number parameters, it is possible to obtain many channels for embedding, so our method overcomes the weakness of having only one transform plane found in traditional watermarking methods. The effectiveness of the proposed watermarking scheme is demonstrated with experimental results.
Detection and categorization of bacteria habitats using shallow linguistic analysis
2015-01-01
Background: Information regarding bacteria biotopes is important for several research areas including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas. Methods: We introduce a linguistically motivated rule-based approach for recognizing and normalizing names of bacteria habitats in biomedical text by using an ontology. Our approach is based on the shallow syntactic analysis of the text that includes sentence segmentation, part-of-speech (POS) tagging, partial parsing, and lemmatization. In addition, we propose two methods for identifying bacteria habitat localization relations. The underlying assumption for the first method is that discourse changes with a new paragraph. Therefore, it operates on a paragraph basis. The second method performs a more fine-grained analysis of the text and operates on a sentence basis. We also develop a novel anaphora resolution method for bacteria coreferences and incorporate it with the sentence-based relation extraction approach. Results: We participated in the Bacteria Biotope (BB) Task of the BioNLP Shared Task 2013. Our system (Boun) achieved the second best performance with 68% Slot Error Rate (SER) in Sub-task 1 (Entity Detection and Categorization), and ranked third with an F-score of 27% in Sub-task 2 (Localization Event Extraction). This paper reports the system that is implemented for the shared task, including the novel methods developed and the improvements obtained after the official evaluation. The extensions include the expansion of the OntoBiotope ontology using the training set for Sub-task 1, and the novel sentence-based relation extraction method incorporated with anaphora resolution for Sub-task 2. These extensions resulted in promising results for Sub-task 1 with a SER of 68%, and state-of-the-art performance for Sub-task 2 with an F-score of 53%. Conclusions: Our results show that a linguistically-oriented approach based on the shallow syntactic analysis of the text is as effective as machine learning approaches for the detection and ontology-based normalization of habitat entities. Furthermore, the newly developed sentence-based relation extraction system with the anaphora resolution module significantly outperforms the paragraph-based one, as well as the other systems that participated in the BB Shared Task 2013. PMID:26201262
Automatic rule generation for high-level vision
NASA Technical Reports Server (NTRS)
Rhee, Frank Chung-Hoon; Krishnapuram, Raghu
1992-01-01
Many high-level vision systems use rule-based approaches to solving problems such as autonomous navigation and image understanding. The rules are usually elaborated by experts. However, this procedure may be rather tedious. In this paper, we propose a method to generate such rules automatically from training data. The proposed method is also capable of filtering out irrelevant features and criteria from the rules.
Methods, systems, and computer program products for network firewall policy optimization
Fulp, Errin W [Winston-Salem, NC; Tarsa, Stephen J [Duxbury, MA
2011-10-18
Methods, systems, and computer program products for firewall policy optimization are disclosed. According to one method, a firewall policy including an ordered list of firewall rules is defined. For each rule, a probability indicating a likelihood of receiving a packet matching the rule is determined. The rules are sorted in order of non-increasing probability in a manner that preserves the firewall policy.
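A greedy sketch of this idea, under the simplifying assumption that rule overlap can be reduced to intersecting port ranges: adjacent rules swap toward non-increasing match probability only when they cannot match the same packet, so precedence among overlapping rules (and hence the policy) is preserved. This is illustrative and not necessarily optimal.

```python
def overlaps(r1, r2):
    # two rules can match the same packet iff their port ranges intersect
    return max(r1["lo"], r2["lo"]) <= min(r1["hi"], r2["hi"])

def sort_rules(rules):
    rules = list(rules)
    changed = True
    while changed:                      # bubble passes with a policy-safety check
        changed = False
        for i in range(len(rules) - 1):
            a, b = rules[i], rules[i + 1]
            if b["p"] > a["p"] and not overlaps(a, b):
                rules[i], rules[i + 1] = b, a
                changed = True
    return rules

rules = [{"lo": 0, "hi": 99, "p": 0.1},
         {"lo": 200, "hi": 299, "p": 0.7},   # hot rule, disjoint: moves up
         {"lo": 50, "hi": 150, "p": 0.9}]    # hot but overlaps rule 1: stays put
print(sort_rules(rules))
```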
Automatic rule generation for high-level vision
NASA Technical Reports Server (NTRS)
Rhee, Frank Chung-Hoon; Krishnapuram, Raghu
1992-01-01
A new fuzzy set based technique that was developed for decision making is discussed: a method to generate fuzzy decision rules for image analysis automatically from training data. Such rule-based approaches are used to solve problems such as autonomous navigation and image understanding. The proposed method is also capable of filtering out irrelevant features and criteria from the rules.
An Integrated Children Disease Prediction Tool within a Special Social Network.
Apostolova Trpkovska, Marika; Yildirim Yayilgan, Sule; Besimi, Adrian
2016-01-01
This paper proposes a social network with an integrated children disease prediction system developed using the specially designed Children General Disease Ontology (CGDO). This ontology consists of children diseases and their relationships with symptoms, together with Semantic Web Rule Language (SWRL) rules that are specially designed for predicting diseases. The prediction process starts with the user entering data about observed signs and symptoms, which are then mapped to the CGDO ontology. Once the data are mapped, the prediction results are presented: the prediction phase executes the rules, which extract the predicted disease details based on the specified SWRL rules. The motivation behind the development of this system is to spread knowledge about children's diseases and their symptoms in a very simple way using the specialized social networking website www.emama.mk.
Computer-aided detection of prostate cancer in T2-weighted MRI within the peripheral zone
NASA Astrophysics Data System (ADS)
Rampun, Andrik; Zheng, Ling; Malcolm, Paul; Tiddeman, Bernie; Zwiggelaar, Reyer
2016-07-01
In this paper we propose a prostate cancer computer-aided diagnosis (CAD) system and suggest a set of discriminant texture descriptors extracted from T2-weighted MRI data which can serve as a good basis for a multimodality system. For this purpose, 215 texture descriptors were extracted and eleven different classifiers were employed to achieve the best possible results. The proposed method was tested on 418 T2-weighted MR images taken from 45 patients and evaluated using 9-fold cross validation with five patients in each fold. The results were comparable to existing CAD systems using multimodality MRI. We achieved area under the receiver operating characteristic curve (Az) values of 90.0% ± 7.6%, 89.5% ± 8.9%, 87.9% ± 9.3% and 87.4% ± 9.2% for Bayesian networks, ADTree, random forest and multilayer perceptron classifiers, respectively, while a meta-voting classifier using average probability as a combination rule achieved 92.7% ± 7.4%.
NASA Astrophysics Data System (ADS)
Jung, Chinte; Sun, Chih-Hong
2006-10-01
Motivated by the increasing accessibility of technology, more and more spatial data are being made digitally available. How to extract the valuable knowledge from these large (spatial) databases is becoming increasingly important to businesses, as well. It is essential to be able to analyze and utilize these large datasets, convert them into useful knowledge, and transmit them through GIS-enabled instruments and the Internet, conveying the key information to business decision-makers effectively and benefiting business entities. In this research, we combine the techniques of GIS, spatial decision support system (SDSS), spatial data mining (SDM), and ArcGIS Server to achieve the following goals: (1) integrate databases from spatial and non-spatial datasets about the locations of businesses in Taipei, Taiwan; (2) use the association rules, one of the SDM methods, to extract the knowledge from the integrated databases; and (3) develop a Web-based SDSS GIService as a location-selection tool for business by the product of ArcGIS Server.
Misra, Dharitri; Chen, Siyuan; Thoma, George R.
2010-01-01
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
Assessment of spatial information for hyperspectral imaging of lesion
NASA Astrophysics Data System (ADS)
Yang, Xue; Li, Gang; Lin, Ling
2016-10-01
Diseases such as breast tumors pose a great threat to women's health and lives, while traditional detection methods are complex, costly and unsuitable for frequent self-examination; therefore, an inexpensive, convenient and efficient method for tumor self-inspection is urgently needed, and lesion localization is an important step. This paper proposes a self-examination method for locating a lesion. The method adopts transillumination to acquire hyperspectral images and to assess the spatial information of the lesion. Firstly, multi-wavelength sources are modulated with frequency division, which makes it easy to separate images of different wavelengths; meanwhile, the sources serve as fill light for each other to improve sensitivity in low-light-level imaging. Secondly, the signal-to-noise ratio of the demodulated transmitted images is improved by frame accumulation. Next, the gray distributions of the transmitted images are analyzed. Gray-level differences are formed between the actual transmitted images and fitted transmitted images of tissue without a lesion, which rules out individual differences. Due to scattering, there are transition zones between tissue and lesion, and these zones change with wavelength, which helps to identify the structural details of the lesion. Finally, image segmentation is adopted to extract the lesion and the transition zones, and the spatial features of the lesion are confirmed according to the transition zones and the differences in the transmitted light intensity distributions. An experiment using flat-shaped tissue as an example shows that the proposed method can extract the spatial information of a lesion.
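The frame-accumulation step rests on the fact that averaging N aligned frames leaves the signal unchanged while shrinking zero-mean noise by roughly 1/sqrt(N); a quick synthetic check:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.uniform(0, 1, size=(64, 64))
frames = [signal + rng.normal(0, 0.1, signal.shape) for _ in range(25)]
accum = np.mean(frames, axis=0)         # accumulate (average) 25 frames
print(np.std(frames[0] - signal))       # ~0.10 single-frame noise
print(np.std(accum - signal))           # ~0.02, i.e. 0.10 / sqrt(25)
```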
Assembly of objects with not fully predefined shapes
NASA Technical Reports Server (NTRS)
Arlotti, M. A.; Dimartino, V.
1989-01-01
An assembly problem in a non-deterministic environment, i.e., where the parts to be assembled have unknown shape, size and location, is described. The only knowledge used by the robot to perform the assembly operation is a connectivity rule and geometrical constraints concerning the parts. Once a set of geometrical features of the parts has been extracted by a vision system, applying this rule allows the determination of the composition sequence. A suitable sensory apparatus allows control of the whole operation.
2015-01-01
Background: Modern methods for mining biomolecular interactions from literature typically make predictions based solely on the immediate textual context, in effect a single sentence. No prior work has been published on extending this context to the information automatically gathered from the whole biomedical literature. Thus, our motivation for this study is to explore whether mutually supporting evidence, aggregated across several documents, can be utilized to improve the performance of state-of-the-art event extraction systems. In this paper, we describe our participation in the latest BioNLP Shared Task using the large-scale text mining resource EVEX. We participated in the Genia Event Extraction (GE) and Gene Regulation Network (GRN) tasks with two separate systems. In the GE task, we implemented a re-ranking approach to improve the precision of an existing event extraction system, incorporating features from the EVEX resource. In the GRN task, our system relied solely on the EVEX resource and utilized a rule-based conversion algorithm between the EVEX and GRN formats. Results: In the GE task, our re-ranking approach led to a modest performance increase and resulted in the first rank of the official Shared Task results with a 50.97% F-score. Additionally, in this paper we explore and evaluate the usage of distributed vector representations for this challenge. In the GRN task, we ranked fifth in the official results with a strict/relaxed SER score of 0.92/0.81, respectively. To try to improve upon these results, we implemented a novel machine learning based conversion system and benchmarked its performance against the original rule-based system. Conclusions: For the GRN task, we were able to produce a gene regulatory network from the EVEX data, warranting the use of such generic large-scale text mining data in network biology settings. A detailed performance and error analysis provides more insight into the relatively low recall rates. In the GE task we demonstrate that both the re-ranking approach and the word vectors can provide slight performance improvements. A manual evaluation of the re-ranking results pinpoints some of the challenges faced in applying large-scale text mining knowledge to event extraction. PMID:26551766
Method, systems, and computer program products for implementing function-parallel network firewall
Fulp, Errin W [Winston-Salem, NC; Farley, Ryan J [Winston-Salem, NC
2011-10-11
Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
Applying the Rule Space Model to Develop a Learning Progression for Thermochemistry
NASA Astrophysics Data System (ADS)
Chen, Fu; Zhang, Shanshan; Guo, Yanfang; Xin, Tao
2017-12-01
We used the Rule Space Model, a cognitive diagnostic model, to measure a learning progression in thermochemistry for senior high school students. We extracted five attributes and proposed their hierarchical relationships to model the construct of thermochemistry at four levels using a hypothesized learning progression. For this study, we developed 24 test items addressing the attributes of exothermic and endothermic reactions, chemical bonds and heat quantity change, reaction heat and enthalpy, thermochemical equations, and Hess's law. The test was administered to a sample of 694 senior high school students taught in 3 schools across 2 cities. Results based on the Rule Space Model analysis indicated that (1) the test items developed with the Rule Space Model were of high psychometric quality, with good difficulty, discrimination, reliability, and validity; (2) the Rule Space Model analysis classified the students into seven different attribute mastery patterns; and (3) the initial hypothesized learning progression was modified by the attribute mastery patterns and the learning paths to be more precise and detailed.
Verification and Validation of KBS with Neural Network Components
NASA Technical Reports Server (NTRS)
Wen, Wu; Callahan, John
1996-01-01
Artificial Neural Networks (ANNs) play an important role in developing robust Knowledge Based Systems (KBS). The ANN based components used in these systems learn to give appropriate predictions through training with correct input-output data patterns. Unlike a traditional KBS that depends on a rule database and a production engine, an ANN based system mimics the decisions of an expert without explicitly formulating if-then type rules. In fact, ANNs demonstrate their superiority when such if-then rules are hard for human experts to generate. Verification of a traditional knowledge based system is based on proving the consistency and completeness of the rule knowledge base and the correctness of the production engine. These techniques, however, cannot be directly applied to ANN based components. In this position paper, we propose a verification and validation procedure for KBS with ANN based components. The essence of the procedure is to obtain an accurate system specification through incremental modification of the specifications using an ANN rule extraction algorithm.
Change Detection of Remote Sensing Images by Dt-Cwt and Mrf
NASA Astrophysics Data System (ADS)
Ouyang, S.; Fan, K.; Wang, H.; Wang, Z.
2017-05-01
To address the significant loss of high-frequency information during noise reduction and the assumption of pixel independence in change detection of multi-scale remote sensing images, an unsupervised algorithm is proposed based on the combination of the Dual-tree Complex Wavelet Transform (DT-CWT) and a Markov Random Field (MRF) model. This method first performs a multi-scale decomposition of the difference image by the DT-CWT and extracts the change characteristics in high-frequency regions using an MRF-based segmentation algorithm. Then our method estimates the final maximum a posteriori (MAP) solution according to the iterated conditional modes (ICM) segmentation algorithm based on fuzzy c-means (FCM), after reconstructing the high-frequency and low-frequency sub-bands of each layer respectively. Finally, the method fuses the segmentation results of each layer using the proposed fusion rule to obtain the mask of the final change detection result. Experimental results show that the proposed method achieves higher precision and strong robustness.
D'Avolio, Leonard W; Nguyen, Thien M; Goryachev, Sergey; Fiore, Louis D
2011-01-01
Despite at least 40 years of promising empirical performance, very few clinical natural language processing (NLP) or information extraction systems currently contribute to medical science or care. The authors address this gap by reducing the need for custom software and rules development with a graphical user interface-driven, highly generalizable approach to concept-level retrieval. A 'learn by example' approach combines features derived from open-source NLP pipelines with open-source machine learning classifiers to automatically and iteratively evaluate top-performing configurations. The Fourth i2b2/VA Shared Task Challenge's concept extraction task provided the data sets and metrics used to evaluate performance. Top F-measure scores for each of the tasks were medical problems (0.83), treatments (0.82), and tests (0.83). Recall lagged precision in all experiments; precision was near or above 0.90 in all tasks. With no customization for the tasks and less than 5 min of end-user time to configure and launch each experiment, the average F-measure was 0.83, one point behind the mean F-measure of the 22 entrants in the competition. Strong precision scores indicate the potential of applying the approach to more specific clinical information extraction tasks. There was not one best configuration, supporting an iterative approach to model creation. Acceptable levels of performance can be achieved using fully automated and generalizable approaches to concept-level information extraction. The described implementation and related documentation are available for download.
Treating Zc(3900 ) and Z (4430 ) as the ground state and first radially excited tetraquarks
NASA Astrophysics Data System (ADS)
Agaev, S. S.; Azizi, K.; Sundu, H.
2017-08-01
Explorations of the resonances Zc(3900) and Z(4430) are performed by assuming that they are the ground state and first radial excitation of the same tetraquark with J^P = 1^+. The mass and current coupling of the Zc(3900) and Z(4430) states are calculated using the QCD two-point sum rule method, taking into account vacuum condensates up to dimension eight. We investigate the vertices ZcMhMl and ZMhMl, with Mh and Ml being the heavy and light mesons, and evaluate the strong couplings gZcMhMl and gZMhMl using QCD sum rules on the light cone. The extracted couplings allow us to find the partial widths of the decays Zc(3900) → J/ψπ, ψ'π, ηcρ and Z(4430) → ψ'π, J/ψπ, ηc'ρ, ηcρ, which may help in a comprehensive investigation of these resonances. We compare the widths of the decays of the Zc(3900) and Z(4430) resonances with available experimental data as well as existing theoretical predictions.
Modelling of Indoor Environments Using Lindenmayer Systems
NASA Astrophysics Data System (ADS)
Peter, M.
2017-09-01
Documentation of the "as-built" state of building interiors has gained a lot of interest in recent years. Various data acquisition methods exist, e.g., extraction from photographed evacuation plans using image processing or, most prominently, indoor mobile laser scanning. Due to clutter, data gaps, and errors during data acquisition and processing, automatic reconstruction of CAD/BIM-like models from these data sources is not a trivial task. Thus reconstruction is often supported by general rules for the perpendicularity and parallelism which are predominant in man-made structures. Indoor environments of large, public buildings, however, often also follow higher-level rules such as symmetry and the repetition of, e.g., room sizes and corridor widths. In the context of the reconstruction of city elements (e.g., street networks) or building elements (e.g., façade layouts), formal grammars have been put to use. In this paper, we describe the use of Lindenmayer systems, which were originally developed for the computer-based modelling of plant growth, to model and reproduce the layout of indoor environments in 2D.
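At its core an L-system is parallel string rewriting; the toy grammar below (C for corridor, R for room) is invented for illustration and is not the paper's indoor grammar:

```python
def expand(axiom, rules, iterations):
    """Apply all production rules in parallel, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)   # parallel rewriting
    return s

rules = {"C": "CRC"}          # a corridor grows rooms along its length
print(expand("C", rules, 3))  # -> CRCRCRCRCRCRCRC, a repetitive layout
```

Repetition and symmetry, the higher-level regularities noted above, emerge naturally from such productions, which is what makes the formalism attractive for indoor layouts.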
Method and system for analyzing and classifying electronic information
McGaffey, Robert W.; Bell, Michael Allen; Kortman, Peter J.; Wilson, Charles H.
2003-04-29
A data analysis and classification system that reads the electronic information, analyzes the electronic information according to a user-defined set of logical rules, and returns a classification result. The data analysis and classification system may accept any form of computer-readable electronic information. The system creates a hash table wherein each entry of the hash table contains a concept corresponding to a word or phrase which the system has previously encountered. The system creates an object model based on the user-defined logical associations, used for reviewing each concept contained in the electronic information in order to determine whether the electronic information is classified. The data analysis and classification system extracts each concept in turn from the electronic information, locates it in the hash table, and propagates it through the object model. In the event that the system can not find the electronic information token in the hash table, that token is added to a missing terms list. If any rule is satisfied during propagation of the concept through the object model, the electronic information is classified.
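Condensed to its essentials, the described flow is a hash-table lookup of concepts, a missing-terms list for unknown tokens, and rule evaluation over the found concepts; all names in this sketch are invented:

```python
concepts = {"invoice": "FINANCE_DOC", "diagnosis": "MEDICAL_DOC"}  # hash table
rules = [("classified-medical", lambda cs: "MEDICAL_DOC" in cs)]   # object model stand-in

def classify(tokens):
    missing = [t for t in tokens if t not in concepts]     # missing-terms list
    found = {concepts[t] for t in tokens if t in concepts}
    # first satisfied rule determines the classification
    label = next((name for name, test in rules if test(found)), "unclassified")
    return label, missing

print(classify(["diagnosis", "zzz-token"]))  # ('classified-medical', ['zzz-token'])
```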
Classification of hadith into positive suggestion, negative suggestion, and information
NASA Astrophysics Data System (ADS)
Faraby, Said Al; Riviera Rachmawati Jasin, Eliza; Kusumaningrum, Andina; Adiwijaya
2018-03-01
As one of the Muslim life guidelines, based on the meaning of its sentence(s), a hadith can be viewed as a suggestion to do something, a suggestion not to do something, or just information without any suggestion. In this paper, we classify the Bahasa translations of hadiths into the three categories using a machine learning approach. We tried stemming and stopword removal in preprocessing, and TF-IDF of unigrams, bigrams, and trigrams as the extracted features. As classifiers, we compared SVM and a Neural Network. Since the categories are new, in order to compare the results of the above pipelines, we created a baseline classifier using a simple rule-based string matching technique. The rule-based algorithm conditions on the occurrence of words such as "janganlah, sholatlah, and so on" to determine the category. The baseline method achieved an F1-score of 0.69, while the best F1-score from the machine learning approach was 0.88, produced by the SVM model with a linear kernel.
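A sketch of the two compared pipelines, with toy texts and keywords standing in for the real data; scikit-learn's LinearSVC approximates the linear-kernel SVM reported as best:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["sholatlah sebelum ...", "janganlah engkau ...", "informasi tentang ..."]
labels = ["positive", "negative", "information"]

def rule_baseline(text):
    # crude keyword matching, in the spirit of the paper's baseline
    if "janganlah" in text: return "negative"
    if "sholatlah" in text: return "positive"
    return "information"

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), LinearSVC())
clf.fit(texts, labels)
print(rule_baseline(texts[0]), clf.predict([texts[0]])[0])
```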
NASA Astrophysics Data System (ADS)
Bruckner, B.; Roth, D.; Goebl, D.; Bauer, P.; Primetzhofer, D.
2018-05-01
Electronic stopping measurements in chemically reactive targets, e.g., transition and rare earth metals, are challenging. These metals often contain low-Z impurities, which contribute to electronic stopping. In this article, we present two ways to correct for the presence of impurities in the evaluation of proton and He stopping in Ni for primary energies between 1 and 100 keV, either considering or ignoring the contribution of the low-Z impurities to multiple scattering. We find that, for protons, either method leads to concordant results, but for heavier projectiles, e.g., He ions, the influence on multiple scattering must not be neglected.
Selective Extraction of Metals from Pacific Sea Nodules with Dissolved Sulfur Dioxide
NASA Astrophysics Data System (ADS)
Khalafalla, Sanaa E.; Pahlman, John E.
1981-08-01
How to titrate a rock? … The following article illustrates the possibility of titrating a metallic constituent in a mineral with a selective reagent to an endpoint of near complete metal extraction. A very rapid and efficient (almost instantaneous and quantitative) method has been devised to differentially leach manganese, nickel, and cobalt, to the exclusion of copper and iron, from deep-sea nodules.1 In this method, a given weight of raw sea nodules ground to -200 mesh in an aqueous slurry is contacted for 10 min at room temperature and ambient pressure with a specified quantity of SO2. An independent leaching parameter R is defined as the ratio of the number of moles of SO2 in the leaching solution to the weight of sea nodules. Variation of metal extraction with R generates sigmoidal curves characteristic of the metals extracted. A threshold value of R is required to initiate the leaching of a given metal from the mixed oxides. Once this threshold is reached, the metal recovery can rise above 95% in less than 10 minutes. For increasing values of R, the extractability of various metals from Pacific sea nodules by SO2 follows the order Mn > Ni > Co ≫ Fe, Al, Cu. Disparity in the R values permits a variety of selective leaching systems and metal separations simply by changing the quantity of SO2 in the contacting solution. Success in this leaching system depends on comminuting the nodules to less than 100 mesh. Above this critical size, leaching is slowed due to the inaccessibility of the inner-particle reacting groups to the SO2 leaching agent, resulting in lower and nonselective extraction of the preferred metal values. Leaching with HCl solutions at the same pH level as dissolved SO2 yielded mixed, slow, and incomplete metal extractions. This finding rules out any interpretation based on hydrogen ions from the ionization of sulfurous acid as the leaching agent. The leaching curves observed in the new system resemble the complexometric titration curves of heavy metals with specific coordination species.
Mallik, Saurav; Bhadra, Tapas; Mukherji, Ayan
2018-04-01
Association rule mining is an important technique for identifying interesting relationships between gene pairs in a biological data set. Earlier methods basically work on a single biological data set, and, in most cases, a single minimum support cutoff is applied globally, i.e., across all genesets/itemsets. To overcome this limitation, in this paper we propose a dynamic threshold-based FP-growth rule mining algorithm that integrates gene expression, methylation and protein-protein interaction profiles based on weighted shortest distance to find novel associations among different pairs of genes in multi-view data sets. For this purpose, we introduce three new thresholds, namely Distance-based Variable/Dynamic Supports (DVS), Distance-based Variable Confidences (DVC), and Distance-based Variable Lifts (DVL) for each rule, by integrating the co-expression, co-methylation, and protein-protein interactions present in the multi-omics data set. We develop the proposed algorithm utilizing these three novel multiple-threshold measures. In the proposed algorithm, the values of DVS, DVC, and DVL are computed for each rule separately, and subsequently it is verified whether the support, confidence, and lift of each evolved rule are greater than or equal to the corresponding individual DVS, DVC, and DVL values, respectively. If all three conditions hold for a rule, the rule is treated as a resultant rule. One of the major advantages of the proposed method compared with other related state-of-the-art methods is that it considers both the quantitative and interactive significance among all pairwise genes belonging to each rule. Moreover, the proposed method generates fewer rules, takes less running time, and provides greater biological significance for the resultant top-ranking rules compared to previous methods.
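The per-rule filtering step can be stated in a few lines; the Rule fields and numeric values below are hypothetical placeholders for the distance-derived DVS/DVC/DVL thresholds:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    support: float
    confidence: float
    lift: float
    dvs: float   # Distance-based Variable Support for this rule
    dvc: float   # Distance-based Variable Confidence
    dvl: float   # Distance-based Variable Lift

def resultant(rules):
    # a rule survives only if it clears all three of its own thresholds
    return [r for r in rules
            if r.support >= r.dvs and r.confidence >= r.dvc and r.lift >= r.dvl]

rules = [Rule("g1->g2", 0.30, 0.80, 1.4, 0.25, 0.70, 1.2),
         Rule("g3->g4", 0.10, 0.90, 1.1, 0.25, 0.70, 1.2)]
print([r.name for r in resultant(rules)])   # ['g1->g2']
```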
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-24
... http://www.epa.gov/dockets. Abstract: The sources subject to this rule (i.e., extraction plants, ceramic plants, foundries, incinerators, propellant plants, and machine shops which process beryllium and...
Hotz, Christine S; Templeton, Steven J; Christopher, Mary M
2005-03-01
A rule-based expert system using the CLIPS programming language was created to classify body cavity effusions as transudates, modified transudates, exudates, chylous effusions, and hemorrhagic effusions. The diagnostic accuracy of the rule-based system was compared with that produced by 2 machine-learning methods: Rosetta, a rough-sets algorithm, and RIPPER, a rule-induction method. Results of 508 body cavity fluid analyses (canine, feline, equine) obtained from the University of California-Davis Veterinary Medical Teaching Hospital computerized patient database were used to test CLIPS and to test and train RIPPER and Rosetta. The CLIPS system, using 17 rules, achieved an accuracy of 93.5% compared with pathologist consensus diagnoses. Rosetta accurately classified 91% of effusions using 5,479 rules. RIPPER achieved the greatest accuracy (95.5%) using only 10 rules. When the original rules of the CLIPS application were replaced with those of RIPPER, the accuracy rates were identical. These results suggest that both rule-based expert systems and machine-learning methods hold promise for the preliminary classification of body fluids in the clinical laboratory.
The effects of cumulative practice on mathematics problem solving.
Mayfield, Kristin H; Chase, Philip N
2002-01-01
This study compared three different methods of teaching five basic algebra rules to college students. All methods used the same procedures to teach the rules and included four 50-question review sessions interspersed among the training of the individual rules. The differences among methods involved the kinds of practice provided during the four review sessions. Participants who received cumulative practice answered 50 questions covering a mix of the rules learned prior to each review session. Participants who received a simple review answered 50 questions on one previously trained rule. Participants who received extra practice answered 50 extra questions on the rule they had just learned. Tests administered after each review included new questions for applying each rule (application items) and problems that required novel combinations of the rules (problem-solving items). On the final test, the cumulative group outscored the other groups on application and problem-solving items. In addition, the cumulative group solved the problem-solving items significantly faster than the other groups. These results suggest that cumulative practice of component skills is an effective method of training problem solving. PMID:12102132
DesAutels, Spencer J; Fox, Zachary E; Giuse, Dario A; Williams, Annette M; Kou, Qing-Hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems.
Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network.
Yang, Zhongliang; Huang, Yongfeng; Jiang, Yiran; Sun, Yuxi; Zhang, Yu-Jin; Luo, Pengcheng
2018-04-20
Automatically extracting useful information from electronic medical records and conducting disease diagnosis is a promising task for both clinical decision support (CDS) and natural language processing (NLP). Most existing systems are based on artificially constructed knowledge bases, with auxiliary diagnosis done by rule matching. In this study, we present a clinical intelligent decision approach based on Convolutional Neural Networks (CNN), which can automatically extract high-level semantic information from electronic medical records and then perform automatic diagnosis without the artificial construction of rules or knowledge bases. We use 18,590 collected real-world clinical electronic medical records to train and test the proposed model. Experimental results show that the proposed model can achieve 98.67% accuracy and 96.02% recall, which strongly supports that using a convolutional neural network to automatically learn high-level semantic features of electronic medical records and then conduct assisted diagnosis is feasible and effective.
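A compact text-CNN of the kind described can be sketched in PyTorch (the framework choice is our assumption; the abstract does not name one): embed tokens, convolve over time, max-pool, classify.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_filters=32, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, dim, seq_len)
        x = torch.relu(self.conv(x))            # local n-gram features
        x = x.max(dim=2).values                 # max-over-time pooling
        return self.fc(x)                       # diagnosis logits

model = TextCNN()
logits = model(torch.randint(0, 5000, (4, 120)))  # 4 records, 120 tokens each
print(logits.shape)                               # torch.Size([4, 10])
```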
Intelligent bandwidth compression
NASA Astrophysics Data System (ADS)
Tseng, D. Y.; Bullock, B. L.; Olin, K. E.; Kandt, R. K.; Olsen, J. D.
1980-02-01
The feasibility of a 1000:1 bandwidth compression ratio for image transmission has been demonstrated using image-analysis algorithms and a rule-based controller. Such a high compression ratio was achieved by first analyzing scene content using auto-cueing and feature-extraction algorithms, and then transmitting only the pertinent information consistent with mission requirements. A rule-based controller directs the flow of analysis and performs priority allocations on the extracted scene content. The reconstructed bandwidth-compressed image consists of an edge map of the scene background, with primary and secondary target windows embedded in the edge map. The bandwidth-compressed images are updated at a basic rate of 1 frame per second, with the high-priority target window updated at 7.5 frames per second. The scene-analysis algorithms used in this system together with the adaptive priority controller are described. Results of simulated 1000:1 bandwidth-compressed images are presented.
Symbolic rule-based classification of lung cancer stages from free-text pathology reports.
Nguyen, Anthony N; Lawley, Michael J; Hansen, David P; Bowman, Rayleen V; Clarke, Belinda E; Duhig, Edwina E; Colquist, Shoni
2010-01-01
To automatically classify lung tumor-node-metastases (TNM) cancer stages from free-text pathology reports using symbolic rule-based classification. By exploiting report substructure and the symbolic manipulation of systematized nomenclature of medicine-clinical terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors relating to the staging guidelines. Post-coordinated SNOMED CT expressions based on templates were defined, populated by concepts in reports, and tested for subsumption by staging factors. The subsumption results were used to build logic according to the staging guidelines to calculate the TNM stage. The accuracy measure and confusion matrices were used to evaluate the TNM stages classified by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and a machine learning-based text classification system using support vector machines. Overall accuracy on a corpus of pathology reports for 718 lung cancer patients, measured against a database of pathological TNM staging decisions, was 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to support vector machine classification approaches. A system to classify lung TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used for the extraction of key lung cancer characteristics from free-text reports. Future work will investigate the applicability of the proposed methodology to extracting other cancer characteristics and types.
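To give a feel for the final stage-combination step, here is a toy sketch of logic that maps per-factor results to an overall stage; the groupings are deliberately simplified and are neither the clinical staging table nor the paper's SNOMED CT subsumption rules:

```python
def overall_stage(t, n, m):
    """Toy TNM grouping: simplified stand-in for the guideline staging table."""
    if m == "M1":
        return "IV"
    if n in ("N2", "N3") or t == "T4":
        return "III"
    if n == "N1" or t in ("T2", "T3"):
        return "II"
    return "I"

# Each factor would come from testing post-coordinated SNOMED CT expressions
# extracted from the report for subsumption by staging-factor concepts.
print(overall_stage("T2", "N0", "M0"))  # -> "II"
```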
Wild Birds Use an Ordering Rule to Decode Novel Call Sequences.
Suzuki, Toshitaka N; Wheatcroft, David; Griesser, Michael
2017-08-07
The generative power of human language depends on grammatical rules, such as word ordering, that allow us to produce and comprehend even novel combinations of words [1-3]. Several species of birds and mammals produce sequences of calls [4-6], and, like words in human sentences, their order may influence receiver responses [7]. However, it is unknown whether animals use call ordering to extract meaning from truly novel sequences. Here, we use a novel experimental approach to test this in a wild bird species, the Japanese tit (Parus minor). Japanese tits are attracted to mobbing a predator when they hear conspecific alert and recruitment calls ordered as alert-recruitment sequences [7]. They also approach in response to recruitment calls of heterospecific individuals in mixed-species flocks [8, 9]. Using experimental playbacks, we assess their responses to artificial sequences in which their own alert calls are combined into different orderings with heterospecific recruitment calls. We find that Japanese tits respond similarly to mixed-species alert-recruitment call sequences and to their own alert-recruitment sequences. Importantly, however, tits rarely respond to mixed-species sequences in which the call order is reversed. Thus, Japanese tits extract a compound meaning from novel call sequences using an ordering rule. These results demonstrate a new parallel between animal communication systems and human language, opening new avenues for exploring the evolution of ordering rules and compositionality in animal vocal sequences.
NASA Astrophysics Data System (ADS)
Shauly, Eitan N.; Levi, Shimon; Schwarzband, Ishai; Adan, Ofer; Latinsky, Sergey
2015-04-01
A fully automated silicon-based methodology for systematic analysis of electrical features is shown. The system was developed for process monitoring and electrical variability reduction. A mapping step was created from dedicated structures such as a static-random-access-memory (SRAM) array or a standard cell library, or by using a simple design-rule-checking run-set. The resulting database was then used as an input for choosing locations for critical-dimension scanning electron microscope images and for extracting specific layout parameters, which were then input to SPICE compact-model simulation. Based on the experimental data, we identified two items that must be checked and monitored using the method described here: a transistor's sensitivity to the distance between the poly end cap and the edge of the active area (AA) due to AA rounding, and SRAM leakage due to an N-well placed too close to a P-well. Building on this example, we used this method extensively for process monitoring and variability analyses of transistor gates having different shapes. In addition, a large area of a high-density standard cell library was analyzed, and another set of monitoring focused on a high-density SRAM array is also presented. These examples provided information on the poly and AA layers, using transistor parameters such as leakage current and drive current. We successfully defined "robust" and "less-robust" transistor configurations included in the library and identified asymmetrical transistors in the SRAM bit-cells. These data were compared to data extracted from the same devices at the end of the line. Another set of analyses was performed on samples after Cu M1 etch. Process monitoring information on M1-enclosed contacts was extracted using contact resistance as feedback, and guidelines for the optimal M1 space for different layout configurations were also derived. All these data demonstrate the successful in-field implementation of our methodology as a useful process monitoring method.
Discovering Fine-grained Sentiment in Suicide Notes
Wang, Wenbo; Chen, Lu; Tan, Ming; Wang, Shaojun; Sheth, Amit P.
2012-01-01
This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams. PMID:22879770
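The micro-averaged F-measure used for ranking can be computed directly from pooled counts; a small sketch, with hypothetical sentiment labels standing in for the i2b2 annotation scheme:

```python
def micro_f1(gold, pred):
    """Micro-averaged F-measure over per-document label sets: pool true and
    false positives across all documents before computing precision/recall."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical gold vs. predicted label sets for two text fragments.
print(micro_f1([{"hopelessness"}, {"love", "guilt"}],
               [{"hopelessness"}, {"love"}]))  # 0.8
```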
Generating Concise Rules for Human Motion Retrieval
NASA Astrophysics Data System (ADS)
Mukai, Tomohiko; Wakisaka, Ken-Ichi; Kuriyama, Shigeru
This paper proposes a method for retrieving human motion data with concise retrieval rules based on the spatio-temporal features of motion appearance. Our method first converts a motion clip into a clausal-language form that represents geometrical relations between body parts and their temporal relationships. A retrieval rule is then learned from a set of manually classified examples using inductive logic programming (ILP). ILP automatically discovers the essential rule in the same clausal form using a user-defined hypothesis-testing procedure. All motions are indexed using this clausal language, and the desired clips are retrieved by subsequence matching using the rule. Such rule-based retrieval offers reasonable performance, and the rule can be intuitively edited in the same language form. Consequently, our method enables efficient and flexible search over a large dataset with a simple query language.
Code of Federal Regulations, 2010 CFR
2010-04-01
... to reasonable funding methods. § 1.412(c)(3)-2 (Internal Revenue): effective dates for rules relating to reasonable funding methods. (a) Introduction. This section prescribes effective dates for rules relating to reasonable funding methods, under section 412(c)(3) and § 1.412(c)(3)-1. Also, this section sets forth rules ...
Effectiveness of feature and classifier algorithms in character recognition systems
NASA Astrophysics Data System (ADS)
Wilson, Charles L.
1993-04-01
At the first Census Optical Character Recognition Systems Conference, NIST generated accuracy data for the character recognition systems submitted by the 26 conference participants. Most systems were tested on the recognition of isolated digits and upper- and lower-case alphabetic characters. The recognition experiments were performed on sample sizes of 58,000 digits and 12,000 upper- and lower-case alphabetic characters. The algorithms used by the participants included rule-based methods, image-based methods, statistical methods, and neural networks. The neural network methods included Multi-Layer Perceptrons, Learning Vector Quantization, Neocognitrons, and cascaded neural networks. In this paper, 11 different systems are compared using correlations between the answers of different systems, comparing the decrease in error rate as a function of recognition confidence, and comparing the writer dependence of recognition. This comparison shows that methods that used different algorithms for feature extraction and recognition performed with very high levels of correlation. This is true for neural network systems, hybrid systems, and statistically based systems, and leads to the conclusion that neural networks have not yet demonstrated a clear superiority to more conventional statistical methods. Comparison of these results with the models of Vapnik (for estimation problems), MacKay (for Bayesian statistical models), Moody (for effective parameterization), and Boltzmann models (for information content) demonstrates that as the limits of training data variance are approached, all classifier systems have similar statistical properties. The limiting condition can only be approached for sufficiently rich feature sets because the accuracy limit is controlled by the available information content of the training set, which must pass through the feature extraction process prior to classification.
Discovering body site and severity modifiers in clinical texts
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K
2014-01-01
Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES). PMID:24091648
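A rough scikit-learn sketch of casting modifier attachment as supervised pair classification follows; the feature set and training pairs are invented stand-ins for the paper's rich linguistic features:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def pair_features(mention, modifier, tokens_between):
    """Illustrative lexical features for one candidate (mention, modifier) pair."""
    return {
        "mention": mention.lower(),
        "modifier": modifier.lower(),
        "distance": len(tokens_between),
        "between=" + " ".join(tokens_between).lower(): 1,
    }

# Hypothetical training pairs: 1 = modifier attaches to the mention, 0 = it does not.
X = [pair_features("lesion", "severe", ["is"]),
     pair_features("pain", "left", ["in", "the"])]
y = [1, 0]

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)
print(model.predict([pair_features("fracture", "mild", ["is"])]))
```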
Garcia, Ernest V.; Taylor, Andrew; Manatunga, Daya; Folks, Russell
2013-01-01
The purposes of this study were to describe and evaluate a software engine to justify the conclusions reached by a renal expert system (RENEX) for assessing patients with suspected renal obstruction and to obtain from this evaluation new knowledge that can be incorporated into RENEX to attempt to improve diagnostic performance. Methods: RENEX consists of 60 heuristic rules extracted from the rules used by a domain expert to generate the knowledge base and a forward-chaining inference engine to determine obstruction. The justification engine keeps track of the sequence of the rules that are instantiated to reach a conclusion. The interpreter can then request justification by clicking on the specific conclusion. The justification process then reports the English translation of all concatenated rules instantiated to reach that conclusion. The justification engine was evaluated with a prospective group of 60 patients (117 kidneys). After reviewing the standard renal mercaptoacetyltriglycine (MAG3) scans obtained before and after the administration of furosemide, a masked expert determined whether each kidney was obstructed, whether the results were equivocal, or whether the kidney was not obstructed and identified and ranked the main variables associated with each interpretation. Two parameters were then tabulated: the frequency with which the main variables associated with obstruction by the expert were also justified by RENEX and the frequency with which the justification rules provided by RENEX were deemed to be correct by the expert. Only when RENEX and the domain expert agreed on the diagnosis (87 kidneys) were the results used to test the justification. Results: RENEX agreed with 91% (184/203) of the rules supplied by the expert for justifying the diagnosis. RENEX provided 103 additional rules justifying the diagnosis; the expert agreed that 102 (99%) were correct, although the rules were considered to be of secondary importance. Conclusion: We have described and evaluated a software engine to justify the conclusions of RENEX for detecting renal obstruction with MAG3 renal scans obtained before and after the administration of furosemide. This tool is expected to increase physician confidence in the interpretations provided by RENEX and to assist physicians and trainees in gaining a higher level of expertise. PMID:17332625
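A minimal sketch of forward chaining with a justification trace, using invented toy rules rather than RENEX's actual 60-rule knowledge base:

```python
def forward_chain(facts, rules):
    """Forward-chaining over if-all-premises-then-conclusion rules, recording
    which rule fired for each derived fact so conclusions can be justified."""
    facts, trace = set(facts), {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace[conclusion] = (name, premises)
                changed = True
    return facts, trace

def justify(fact, trace):
    """Report the concatenated chain of rules behind a conclusion."""
    if fact not in trace:
        return [f"'{fact}' was given as input"]
    name, premises = trace[fact]
    lines = [f"'{fact}' follows by rule {name} from {sorted(premises)}"]
    for p in premises:
        lines.extend(justify(p, trace))
    return lines

# Invented toy rules, not the expert system's knowledge base.
rules = [
    ("R1", frozenset({"delayed washout"}), "suspected obstruction"),
    ("R2", frozenset({"suspected obstruction", "poor furosemide response"}),
     "obstructed"),
]
facts, trace = forward_chain({"delayed washout", "poor furosemide response"}, rules)
print("\n".join(justify("obstructed", trace)))
```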
NASA Astrophysics Data System (ADS)
Nehm, Ross H.; Haertig, Hendrik
2012-02-01
Our study examines the efficacy of Computer Assisted Scoring (CAS) of open-response text relative to expert human scoring within the complex domain of evolutionary biology. Specifically, we explored whether CAS can diagnose the explanatory elements (or Key Concepts) that comprise undergraduate students' explanatory models of natural selection with equal fidelity as expert human scorers in a sample of >1,000 essays. We used SPSS Text Analysis 3.0 to perform our CAS and measure Kappa values (inter-rater reliability) of KC detection (i.e., computer-human rating correspondence). Our first analysis indicated that the text analysis functions (or extraction rules) developed and deployed in SPSSTA to extract individual Key Concepts (KCs) from three different items differing in several surface features (e.g., taxon, trait, type of evolutionary change) produced "substantial" (Kappa 0.61-0.80) or "almost perfect" (0.81-1.00) agreement. The second analysis explored the measurement of human-computer correspondence for KC diversity (the number of different accurate knowledge elements) in the combined sample of all 827 essays. Here we found outstanding correspondence; extraction rules generated using one prompt type are broadly applicable to other evolutionary scenarios (e.g., bacterial resistance, cheetah running speed, etc.). This result is encouraging, as it suggests that the development of new item sets may not necessitate the development of new text analysis rules. Overall, our findings suggest that CAS tools such as SPSS Text Analysis may compensate for some of the intrinsic limitations of currently used multiple-choice Concept Inventories designed to measure student knowledge of natural selection.
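The computer-human correspondence statistic referred to here is Cohen's kappa, which can be computed as follows; the ten example ratings are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical human vs. computer detection of one Key Concept in ten essays.
human    = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
computer = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(round(cohens_kappa(human, computer), 2))  # ≈ 0.58, "moderate" agreement
```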
Using uncertainty to link and rank evidence from biomedical literature for model curation
Zerva, Chrysoula; Batista-Navarro, Riza; Day, Philip; Ananiadou, Sophia
2017-01-01
Motivation: In recent years, there has been great progress in the field of automated curation of biomedical networks and models, aided by text mining methods that provide evidence from literature. Such methods must not only extract snippets of text that relate to model interactions, but also be able to contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work on exploring the textual uncertainty conveyed by the author. Despite textual uncertainty being acknowledged in biomedical text mining as an attribute of text mined interactions (events), it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models. Results: We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 on the BioNLP-ST and Genia-MK corpora, respectively, making considerable improvements over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research. Availability and implementation: The leukemia pathway model used is available in Pathway Studio, while the Ras model is available via PathwayCommons. An online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test; the related code is available at https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material. Contact: sophia.ananiadou@manchester.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29036627
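The combination of uncertainty values can be illustrated with subjective logic's cumulative fusion operator over (belief, disbelief, uncertainty) opinions; this is one standard operator, and the paper's exact formulation may differ:

```python
def cumulative_fusion(op_a, op_b):
    """Cumulative fusion of two subjective-logic opinions (b, d, u).
    Each opinion satisfies b + d + u = 1."""
    bA, dA, uA = op_a
    bB, dB, uB = op_b
    k = uA + uB - uA * uB
    if k == 0:  # both opinions dogmatic (u = 0); average as a simple fallback
        return tuple((x + y) / 2 for x, y in zip(op_a, op_b))
    return ((bA * uB + bB * uA) / k,
            (dA * uB + dB * uA) / k,
            (uA * uB) / k)

# Two hypothetical uncertainty assessments of the same interaction.
print(cumulative_fusion((0.6, 0.2, 0.2), (0.4, 0.1, 0.5)))
```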
Event Recognition Based on Deep Learning in Chinese Texts
Zhang, Yajun; Liu, Zongtian; Zhou, Wen
2016-01-01
Event recognition is the most fundamental and critical task in event-based natural language processing systems. Existing event recognition methods based on rules and shallow neural networks have certain limitations. For example, extracting features using methods based on rules is difficult; methods based on shallow neural networks converge too quickly to a local minimum, resulting in low recognition precision. To address these problems, we propose the Chinese emergency event recognition model based on deep learning (CEERM). Firstly, we use a word segmentation system to segment sentences. According to event elements labeled in the CEC 2.0 corpus, we classify words into five categories: trigger words, participants, objects, time and location. Each word is vectorized according to the following six feature layers: part of speech, dependency grammar, length, location, distance between trigger word and core word and trigger word frequency. We obtain deep semantic features of words by training a feature vector set using a deep belief network (DBN), then analyze those features in order to identify trigger words by means of a back propagation neural network. Extensive testing shows that the CEERM achieves excellent recognition performance, with a maximum F-measure value of 85.17%. Moreover, we propose the dynamic-supervised DBN, which adds supervised fine-tuning to a restricted Boltzmann machine layer by monitoring its training performance. Test analysis reveals that the new DBN improves recognition performance and effectively controls the training time. Although the F-measure increases to 88.11%, the training time increases by only 25.35%. PMID:27501231
NASA Astrophysics Data System (ADS)
Takahashi, Noriyuki; Kinoshita, Toshibumi; Ohmura, Tomomi; Matsuyama, Eri; Toyoshima, Hideto
2018-02-01
The rapid increase in the incidence of Alzheimer's disease (AD) has become a critical issue in low- and middle-income countries. MR imaging is generally suitable in clinical situations, while CT is rarely used in the diagnosis of AD because of its low contrast between brain tissues. However, in those countries CT, which is less costly and readily available, would be a desirable tool for the diagnosis of AD. On CT, enlargement of the temporal horn of the lateral ventricle (THLV) is one of the few findings useful for the diagnosis of AD. In this paper, we present an automated volumetry of the THLV with segmentation based on Bayes' rule on CT images. In our method, first, all CT data sets are normalized into an atlas by using linear affine transformation and non-linear warping techniques. Next, a probability map of the THLV is constructed in the normalized data. Then, THLV regions are extracted based on Bayes' rule. Finally, the volume of the THLV is evaluated. This scheme was applied to CT scans from 20 AD patients and 20 controls to evaluate the performance of the method for detecting AD. The estimated THLV volume was markedly increased in the AD group compared with the controls (P < .0001), and the area under the receiver operating characteristic curve (AUC) was 0.921. Therefore, this computerized method may have the potential to accurately detect AD on CT images.
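A rough sketch of the per-voxel Bayes step on spatially normalized CT, combining an intensity likelihood with the probability-map prior; the Hounsfield-unit parameters, 0.5 cut-off, and voxel size are assumptions, not the paper's values:

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def thlv_volume(ct, prior, voxel_ml=0.001, mu_csf=5.0, sd_csf=4.0,
                mu_brain=35.0, sd_brain=8.0):
    """Per-voxel Bayes' rule: posterior that a voxel is THLV given a CSF-like
    intensity likelihood and the THLV probability-map prior. All numeric
    parameters here are illustrative assumptions."""
    p_thlv = gaussian(ct, mu_csf, sd_csf) * prior
    p_other = gaussian(ct, mu_brain, sd_brain) * (1.0 - prior)
    posterior = p_thlv / (p_thlv + p_other + 1e-12)
    mask = posterior > 0.5
    return mask, mask.sum() * voxel_ml  # segmented mask and volume in mL

# Tiny synthetic example: CSF-like, ambiguous, and brain-like voxels.
ct = np.array([4.0, 20.0, 36.0])
prior = np.array([0.9, 0.5, 0.1])
print(thlv_volume(ct, prior))
```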
Decision rules for unbiased inventory estimates
NASA Technical Reports Server (NTRS)
Argentiero, P. D.; Koch, D.
1979-01-01
An efficient and accurate procedure for estimating inventories from remote sensing scenes is presented. In place of the conventional and expensive full-dimensional Bayes decision rule, a one-dimensional feature extraction and classification technique was employed. It is shown that this efficient decision rule can be used to develop unbiased inventory estimates and that, for the large sample sizes typical of satellite-derived remote sensing scenes, the resulting accuracies are comparable or superior to more expensive alternative procedures. Mathematical details of the procedure are provided in the body of the report and in the appendix. Results of a numerical simulation of the technique using statistics obtained from an observed LANDSAT scene are included; the simulation demonstrates the effectiveness of the technique in computing accurate inventory estimates.
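The report's exact one-dimensional transformation is not given here, so as a sketch of the general idea — project multiband pixels onto one axis and threshold — the following uses Fisher's linear discriminant, a common choice; the class statistics are synthetic:

```python
import numpy as np

def fisher_projection(class0, class1):
    """Fisher's linear discriminant: a one-dimensional projection that
    separates two classes, with a midpoint decision threshold."""
    m0, m1 = class0.mean(axis=0), class1.mean(axis=0)
    Sw = np.cov(class0, rowvar=False) + np.cov(class1, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (m0 @ w + m1 @ w)
    return w, threshold

rng = np.random.default_rng(1)
crop = rng.normal([0, 0, 0, 0], 1.0, size=(100, 4))    # hypothetical 4-band pixels
other = rng.normal([2, 1, 0, 1], 1.0, size=(100, 4))
w, t = fisher_projection(crop, other)

new_pixels = rng.normal([0, 0, 0, 0], 1.0, size=(5, 4))
print((new_pixels @ w) > t)   # True = classified as "other", False = "crop"
```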
Prediction of High Incidence of Dengue in the Philippines
Buczak, Anna L.; Baugher, Benjamin; Babin, Steven M.; Ramac-Thomas, Liane C.; Guven, Erhan; Elbert, Yevgeniy; Koshute, Phillip T.; Velasco, John Mark S.; Roque, Vito G.; Tayag, Enrique A.; Yoon, In-Kyu; Lewis, Sheri H.
2014-01-01
Background: Accurate prediction of dengue incidence levels weeks in advance of an outbreak may reduce the morbidity and mortality associated with this neglected disease. Therefore, models were developed to predict high and low dengue incidence in order to provide timely forewarnings in the Philippines. Methods: Model inputs were chosen based on studies indicating variables that may impact dengue incidence. The method first uses Fuzzy Association Rule Mining techniques to extract association rules from historical epidemiological, environmental, and socio-economic data, as well as climate data indicating future weather patterns. Selection criteria were used to choose a subset of these rules for a classifier, thereby generating a Prediction Model. The models predicted high or low incidence of dengue in a Philippine province four weeks in advance. The threshold between high and low was determined relative to historical incidence data. Principal Findings: Model accuracy is described by Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity, and Specificity computed on test data not previously used to develop the model. Selecting a model using the F0.5 measure, which gives PPV more importance than Sensitivity, gave these results: PPV = 0.780, NPV = 0.938, Sensitivity = 0.547, Specificity = 0.978. Using the F3 measure, which gives Sensitivity more importance than PPV, the selected model had PPV = 0.778, NPV = 0.948, Sensitivity = 0.627, Specificity = 0.974. The decision as to which model has greater utility depends on how the predictions will be used in a particular situation. Conclusions: This method builds prediction models for future dengue incidence in the Philippines and is capable of being modified for use in different situations, for diseases other than dengue, and for regions beyond the Philippines. The Philippine dengue prediction models predicted high or low incidence of dengue four weeks in advance of an outbreak with high accuracy, as measured by PPV, NPV, Sensitivity, and Specificity. PMID:24722434
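Both selection criteria are instances of the F-beta measure, which can be checked directly against the reported PPV and Sensitivity values:

```python
def f_beta(precision, recall, beta):
    """F-beta: beta < 1 weights precision (PPV) more heavily;
    beta > 1 weights recall (sensitivity) more heavily."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(0.780, 0.547, 0.5), 3))  # ≈ 0.719 for the F0.5-selected model
print(round(f_beta(0.778, 0.627, 3.0), 3))  # ≈ 0.639 for the F3-selected model
```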
Information Extraction from Multiple Syntactic Sources
2004-05-01
[Extraction fragments from this report: a table caption on the performance of SVM and KNN (k = 3) across different kernel setups, with types ordered by decreasing frequency of occurrence in the ACE corpus; a remark that a name such as "A Real New York Bargain" is not easy to recognize as a company name, particularly in other languages or in transcripts of English speech; and a description of symbolic rules for extracting posted computer jobs that assumed only simple syntactic preprocessing such as tokenization and part-of-speech tagging.]
Automatic Generation of Supervisory Control System Software Using Graph Composition
NASA Astrophysics Data System (ADS)
Nakata, Hideo; Sano, Tatsuro; Kojima, Taizo; Seo, Kazuo; Uchida, Tomoyuki; Nakamura, Yasuaki
This paper describes the automatic generation of system descriptions for SCADA (Supervisory Control And Data Acquisition) systems. The proposed method produces various types of data and programs for SCADA systems from equipment definitions using conversion rules. First, the method builds directed graphs, representing connections between the equipment, from the equipment definitions. System descriptions are then generated with the conversion rules by analyzing these directed graphs and finding groups of equipment that involve similar operations. The method can organize the conversion rules into multiple levels by using graph composition, which reduces the number of rules and lets the developer define and manage the rules efficiently.
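To make the graph-based generation concrete, here is a minimal sketch; the equipment kinds, the connection format, and the single interlock rule are invented for illustration and are not the paper's actual rule set:

```python
from collections import defaultdict

# Hypothetical equipment definitions: (name, kind, downstream equipment names).
EQUIPMENT = [
    ("breaker1", "breaker", ["bus1"]),
    ("bus1", "bus", ["feeder1", "feeder2"]),
    ("feeder1", "feeder", []),
    ("feeder2", "feeder", []),
]

def build_graph(defs):
    """Directed graph of equipment connections plus a kind lookup."""
    graph, kinds = defaultdict(list), {}
    for name, kind, downstream in defs:
        kinds[name] = kind
        graph[name].extend(downstream)
    return graph, kinds

def generate_descriptions(graph, kinds):
    """Apply one toy conversion rule: every breaker->bus edge yields an entry."""
    out = []
    for src, targets in graph.items():
        for dst in targets:
            if kinds[src] == "breaker" and kinds[dst] == "bus":
                out.append(f"interlock({src}, {dst})")
    return out

graph, kinds = build_graph(EQUIPMENT)
print(generate_descriptions(graph, kinds))  # ['interlock(breaker1, bus1)']
```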
Refining Linear Fuzzy Rules by Reinforcement Learning
NASA Technical Reports Server (NTRS)
Berenji, Hamid R.; Khedkar, Pratap S.; Malkani, Anil
1996-01-01
Linear fuzzy rules are increasingly being used in the development of fuzzy logic systems. Radial basis functions have also been used in the antecedents of the rules for clustering in product space, which can automatically generate a set of linear fuzzy rules from an input/output data set. Manual methods are usually used to refine these rules. This paper presents a method for refining the parameters of these rules using reinforcement learning, which can be applied in domains where supervised input-output data are not available and reinforcements are received only after a long sequence of actions. This is shown for a generalization of radial basis functions. The formation of fuzzy rules from data and their automatic refinement is an important step toward applying reinforcement learning methods in domains where only limited input-output data are available.
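As a reference point for what a linear fuzzy rule with an RBF antecedent computes, here is a minimal Takagi-Sugeno-style sketch; the rule parameters are invented, and in the paper's setting they would be tuned by reinforcement learning rather than set by hand:

```python
import numpy as np

def rbf(x, center, width):
    """Radial basis membership for a rule antecedent."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

def linear_fuzzy_output(x, rules):
    """Weighted average of the rules' linear consequents (a @ x + b),
    weighted by each rule's RBF firing strength."""
    strengths = np.array([rbf(x, c, s) for c, s, a, b in rules])
    outputs = np.array([a @ x + b for c, s, a, b in rules])
    return strengths @ outputs / (strengths.sum() + 1e-12)

# Two invented rules: (center, width, linear coefficients, bias).
rules = [(np.array([0.0, 0.0]), 1.0, np.array([1.0, -1.0]), 0.0),
         (np.array([2.0, 2.0]), 1.0, np.array([0.5, 0.5]), 1.0)]
print(linear_fuzzy_output(np.array([1.0, 1.0]), rules))
```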
He3 Spin-Dependent Cross Sections and Sum Rules
NASA Astrophysics Data System (ADS)
Slifer, K.; Amarian, M.; Auerbach, L.; Averett, T.; Berthot, J.; Bertin, P.; Bertozzi, B.; Black, T.; Brash, E.; Brown, D.; Burtin, E.; Calarco, J.; Cates, G.; Chai, Z.; Chen, J.-P.; Choi, Seonho; Chudakov, E.; Ciofi Degli Atti, C.; Cisbani, E.; de Jager, C. W.; Deur, A.; Disalvo, R.; Dieterich, S.; Djawotho, P.; Finn, M.; Fissum, K.; Fonvieille, H.; Frullani, S.; Gao, H.; Gao, J.; Garibaldi, F.; Gasparian, A.; Gilad, S.; Gilman, R.; Glamazdin, A.; Glashausser, C.; Glöckle, W.; Golak, J.; Goldberg, E.; Gomez, J.; Gorbenko, V.; Hansen, J.-O.; Hersman, B.; Holmes, R.; Huber, G. M.; Hughes, E.; Humensky, B.; Incerti, S.; Iodice, M.; Jensen, S.; Jiang, X.; Jones, C.; Jones, G.; Jones, M.; Jutier, C.; Kamada, H.; Ketikyan, A.; Kominis, I.; Korsch, W.; Kramer, K.; Kumar, K.; Kumbartzki, G.; Kuss, M.; Lakuriqi, E.; Laveissiere, G.; Lerose, J. J.; Liang, M.; Liyanage, N.; Lolos, G.; Malov, S.; Marroncle, J.; McCormick, K.; McKeown, R. D.; Meziani, Z.-E.; Michaels, R.; Mitchell, J.; Nogga, A.; Pace, E.; Papandreou, Z.; Pavlin, T.; Petratos, G. G.; Pripstein, D.; Prout, D.; Ransome, R.; Roblin, Y.; Rowntree, D.; Rvachev, M.; Sabatié, F.; Saha, A.; Salmè, G.; Scopetta, S.; Skibiński, R.; Souder, P.; Saito, T.; Strauch, S.; Suleiman, R.; Takahashi, K.; Teijiro, S.; Todor, L.; Tsubota, H.; Ueno, H.; Urciuoli, G.; van der Meer, R.; Vernin, P.; Voskanian, H.; Witała, H.; Wojtsekhowski, B.; Xiong, F.; Xu, W.; Yang, J.-C.; Zhang, B.; Zolnierczuk, P.
2008-07-01
We present a measurement of the spin-dependent cross sections for the ³He(e,e′)X reaction with polarized beam and target in the quasielastic and resonance regions at a four-momentum transfer of 0.1 ≤ Q² ≤ 0.9 GeV². The spin-structure functions have been extracted and used to evaluate the nuclear Burkhardt-Cottingham and extended Gerasimov-Drell-Hearn sum rules for the first time. The data are also compared to an impulse approximation calculation and an exact three-body Faddeev calculation in the quasielastic region.
Software Engineering Laboratory (SEL) relationships, models, and management rules
NASA Technical Reports Server (NTRS)
Decker, William; Hendrick, Robert; Valett, Jon D.
1991-01-01
This document captures over 50 individual Software Engineering Laboratory (SEL) research results, extracted from a review of published SEL documentation, that can be applied directly to managing software development projects. Four basic categories of results are defined and discussed: environment profiles, relationships, models, and management rules. In each category, research results are presented as a single page that summarizes the individual result, lists potential uses of the result by managers, and references the original SEL documentation where the result was found. The document serves as a concise reference summary of applicable research for SEL managers.
Natural-Language Parser for PBEM
NASA Technical Reports Server (NTRS)
James, Mark
2010-01-01
A computer program called "Hunter" accepts, as input, a colloquial-English description of a set of policy-based-management rules, and parses that description into a form useable by policy-based enterprise management (PBEM) software. PBEM is a rules-based approach suitable for automating some management tasks. PBEM simplifies the management of a given enterprise through establishment of policies addressing situations that are likely to occur. Hunter was developed to have a unique capability to extract the intended meaning instead of focusing on parsing the exact ways in which individual words are used.
Golpayegani, Gelayol Nazari; Jafari, Amir Homayoun; Dabanloo, Nader Jafarnia
2017-01-01
According to the World Health Organization, by the end of last year about 37 million people throughout the world were diagnosed with AIDS, and millions of people die each year from this disease. Our objective was to develop a model that depicts the dynamics of the interactions between HIV and the immune system in the peripheral bloodstream of HIV-infected individuals, considering the phenomenon of virus mutation, the role of latently infected cells in the spread of infection, and the effects of antiretroviral drugs and the occurrence of drug resistance, in order to assess the results obtained from applying different therapeutic methods. A two-dimensional cellular automata (CA) model with Moore neighborhoods was developed. Agents representing particles in the peripheral bloodstream of HIV-infected individuals were defined, and biological rules were extracted from both expert knowledge and authoritative articles. The extracted rules were applied to update the states of these agents. The effects of antiretroviral drug treatment were modeled by applying the drug effectiveness of protease and reverse transcriptase inhibitors as two separate inputs of the model. Time-evolution curves of the concentrations of the defined agents are shown as our results. With no treatment, the concentration of healthy CD4+ T cells reached the AIDS threshold after about 250 weeks. With monotherapy, the concentration of these cells remained at the AIDS threshold for a long time, while combined antiretroviral therapy (cART) increased the concentration of these cells to 20% above the AIDS threshold. Also, monotherapy and cART, compared with no treatment, further decreased the concentration of infected CD4+ T cells by 10% and 40%, respectively, and reduced the viral load by almost 55% and 90%, respectively. Belated treatment, compared with early treatment, caused almost a 10% decrease (increase) in the steady-state concentrations of healthy (infected) cells.
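A minimal sketch of a two-dimensional CA with Moore neighborhoods follows; the three states and the infection/death rules are toy stand-ins for the paper's extracted biological rule set and drug-effectiveness inputs:

```python
import numpy as np

HEALTHY, INFECTED, DEAD = 0, 1, 2   # reduced state set, not the full agent list

def moore_neighbors(grid, i, j):
    """The eight surrounding cells, with wrap-around boundaries."""
    n, m = grid.shape
    return [grid[(i + di) % n, (j + dj) % m]
            for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def step(grid, rng, p_infect=0.6):
    """One synchronous update with toy rules: a healthy cell neighboring an
    infected cell becomes infected with probability p_infect (which drug
    effectiveness would lower in the full model); infected cells then die."""
    new = grid.copy()
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            if grid[i, j] == HEALTHY and INFECTED in moore_neighbors(grid, i, j):
                if rng.random() < p_infect:
                    new[i, j] = INFECTED
            elif grid[i, j] == INFECTED:
                new[i, j] = DEAD
    return new

rng = np.random.default_rng(0)
grid = np.zeros((20, 20), dtype=int)
grid[10, 10] = INFECTED
for _ in range(5):
    grid = step(grid, rng)
print((grid == INFECTED).sum(), (grid == DEAD).sum())
```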
Hierarchy-associated semantic-rule inference framework for classifying indoor scenes
NASA Astrophysics Data System (ADS)
Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei
2016-03-01
Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
A shape-based inter-layer contours correspondence method for ICT-based reverse engineering
Duan, Liming; Yang, Shangpeng; Zhang, Gui; Feng, Fei; Gu, Minghui
2017-01-01
The correspondence of a stack of planar contours in ICT (industrial computed tomography)-based reverse engineering, a key step in surface reconstruction, is difficult when the contours or topology of the object are complex. Given the regularity of industrial parts and similarity of the inter-layer contours, a specialized shape-based inter-layer contours correspondence method for ICT-based reverse engineering was presented to solve the above problem based on the vectorized contours. In this paper, the vectorized contours extracted from the slices consist of three graphical primitives: circles, arcs and segments. First, the correspondence of the inter-layer primitives is conducted based on the characteristics of the primitives. Second, based on the corresponded primitives, the inter-layer contours correspond with each other using the proximity rules and exhaustive search. The proposed method can make full use of the shape information to handle industrial parts with complex structures. The feasibility and superiority of this method have been demonstrated via the related experiments. This method can play an instructive role in practice and provide a reference for the related research. PMID:28489867
Carvalho, J J; Jerónimo, P C A; Gonçalves, C; Alpendurada, M F
2008-11-01
European Council Directive 98/83/EC on the quality of water intended for human consumption brought a new challenge for water-quality control routine laboratories, mainly on pesticides analysis. Under the guidelines of ISO/IEC 17025:2005, a multiresidue method was developed, validated, implemented in routine, and studied with real samples during a one-year period. The proposed method enables routine laboratories to handle a large number of samples, since 28 pesticides of 14 different chemical groups can be quantitated in a single procedure. The method comprises a solid-phase extraction step and subsequent analysis by liquid chromatography-mass spectrometry (LC-MS-MS). The accuracy was established on the basis of participation in interlaboratory proficiency tests, with encouraging results (majority |z-score| < 2), and the precision was consistently analysed over one year. The limits of quantitation (below 0.050 μg/L) are in agreement with the enforced threshold value for pesticides of 0.10 μg/L. Overall method performance is suitable for routine use according to accreditation rules, taking into account the data collected over one year.
This document contains the presentations from the Unregulated Contaminant Monitoring Rule 4 Methods Stakeholder Meeting, held on May 15, 2013, about the Contaminant Candidate List (CCL) and the Unregulated Contaminant Monitoring Rule.
Process service quality evaluation based on Dempster-Shafer theory and support vector machine.
Pei, Feng-Que; Li, Dong-Bo; Tong, Yi-Fei; He, Fei
2017-01-01
Human involvement influences traditional service quality evaluations, leading to low accuracy, poor reliability, and weak predictability. This paper proposes a method, called SVMs-DS, that employs a support vector machine (SVM) and Dempster-Shafer evidence theory to evaluate the service quality of a production process while handling a large number of input features with a small sampling data set. Features that can affect production quality are extracted by a large number of sensors, and preprocessing steps such as feature simplification and normalization are reduced. Based on three individual SVM models, basic probability assignments (BPAs) are constructed, which support the evaluation in both a qualitative and a quantitative way. The process service quality evaluation results are validated by Dempster's combination rule; the decision threshold to resolve conflicting results is generated from the three SVM models. A case study is presented to demonstrate the effectiveness of the SVMs-DS method.
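Dempster's combination rule itself is compact; a sketch over basic probability assignments represented as dictionaries of focal sets, with invented quality labels rather than the paper's actual BPAs:

```python
def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses over intersecting focal elements and
    renormalize by 1 - K, where K is the total conflicting mass."""
    combined, conflict = {}, 0.0
    for a, p in m1.items():
        for b, q in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

good, poor = frozenset({"good"}), frozenset({"poor"})
either = good | poor
m_svm1 = {good: 0.7, either: 0.3}             # hypothetical BPAs from two models
m_svm2 = {good: 0.5, poor: 0.2, either: 0.3}
print(dempster_combine(m_svm1, m_svm2))
```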
Automated Mounting Bias Calibration for Airborne LIDAR System
NASA Astrophysics Data System (ADS)
Zhang, J.; Jiang, W.; Jiang, S.
2012-07-01
Mounting bias is the major error source of an airborne LIDAR system. In this paper, an automated calibration method for estimating LIDAR system mounting parameters is introduced. The LIDAR direct geo-referencing model is used to calculate systematic errors. Because LIDAR footprints are discretely sampled, truly corresponding laser points hardly exist across different strips, so the traditional corresponding-point methodology does not apply to LIDAR strip registration. We propose a Virtual Corresponding Point Model (VCPM) to resolve the correspondence problem among discrete laser points. Each VCPM contains a corresponding point and three real laser footprints, and two rules are defined to calculate tie-point coordinates from the real laser footprints. The Scale Invariant Feature Transform (SIFT) is used to extract corresponding points in LIDAR strips, and the automatic flow of LIDAR system calibration based on the VCPM is described in detail. Practical examples illustrate the feasibility and effectiveness of the proposed calibration method.
Gorazda, Katarzyna; Tarko, Barbara; Wzorek, Zbigniew; Kominko, Halyna; Nowak, Anna K; Kulczycka, Joanna; Henclik, Anna; Smol, Marzena
2017-04-01
Sustainable development and circular economy rules force the global fertilizer industry to develop new phosphorus recovery methods from alternative sources. In this paper a phosphorus recovery technology from Polish industrial sewage sludge ashes was investigated (PolFerAsh - Polish Fertilizers from Ash). A wet method using mineral acid extraction followed by neutralization was proposed. Detailed characteristics of SSAs from the largest mono-combustion plants are given and compared to raw materials used on the market, and the technological factors associated with such materials are discussed. The composition of the extracts was compared to typical industrial phosphoric acid and to standard values characterizing suspension fertilizers. The most favorable conditions for selective precipitation of phosphorus compounds were identified. The fertilizers obtained also meet EU regulations with respect to the newly discussed Cd content. The process was scaled up and a mass flow diagram was defined.
Measurement of the generalized form factors near threshold via γ*p→nπ+ at high Q2
NASA Astrophysics Data System (ADS)
Park, K.; Gothe, R. W.; Adhikari, K. P.; Adikaram, D.; Anghinolfi, M.; Baghdasaryan, H.; Ball, J.; Battaglieri, M.; Batourine, V.; Bedlinskiy, I.; Bennett, R. P.; Biselli, A. S.; Bookwalter, C.; Boiarinov, S.; Branford, D.; Briscoe, W. J.; Brooks, W. K.; Burkert, V. D.; Carman, D. S.; Celentano, A.; Chandavar, S.; Charles, G.; Cole, P. L.; Contalbrigo, M.; Crede, V.; D'Angelo, A.; Daniel, A.; Dashyan, N.; De Vita, R.; De Sanctis, E.; Deur, A.; Djalali, C.; Doughty, D.; Dupre, R.; El Alaoui, A.; El Fassi, L.; Eugenio, P.; Fedotov, G.; Fradi, A.; Gabrielyan, M. Y.; Gevorgyan, N.; Gilfoyle, G. P.; Giovanetti, K. L.; Girod, F. X.; Goetz, J. T.; Gohn, W.; Golovatch, E.; Graham, L.; Griffioen, K. A.; Guidal, M.; Guo, L.; Hafidi, K.; Hakobyan, H.; Hanretty, C.; Heddle, D.; Hicks, K.; Holtrop, M.; Hyde, C. E.; Ilieva, Y.; Ireland, D. G.; Ishkhanov, B. S.; Isupov, E. L.; Jenkins, D.; Jo, H. S.; Joo, K.; Kalantarians, N.; Khandaker, M.; Khetarpal, P.; Kim, A.; Kim, W.; Klein, A.; Klein, F. J.; Kubarovsky, A.; Kubarovsky, V.; Kuhn, S. E.; Kuleshov, S. V.; Kvaltine, N. D.; Livingston, K.; Lu, H. Y.; MacGregor, I. J. D.; Markov, N.; Mayer, M.; McKinnon, B.; Mestayer, M. D.; Meyer, C. A.; Mineeva, T.; Mirazita, M.; Mokeev, V.; Moutarde, H.; Munevar, E.; Nadel-Turonski, P.; Nasseripour, R.; Niccolai, S.; Niculescu, G.; Niculescu, I.; Osipenko, M.; Ostrovidov, A. I.; Paolone, M.; Pappalardo, L.; Paremuzyan, R.; Park, S.; Pereira, S. Anefalos; Phelps, E.; Pisano, S.; Pogorelko, O.; Pozdniakov, S.; Price, J. W.; Procureur, S.; Prok, Y.; Ricco, G.; Rimal, D.; Ripani, M.; Ritchie, B. G.; Rosner, G.; Rossi, P.; Sabatié, F.; Saini, M. S.; Salgado, C.; Schott, D.; Schumacher, R. A.; Seraydaryan, H.; Sharabian, Y. G.; Smith, E. S.; Smith, G. D.; Sober, D. I.; Sokhan, D.; Stepanyan, S. S.; Stepanyan, S.; Stoler, P.; Strakovsky, I. I.; Strauch, S.; Taiuti, M.; Tang, W.; Taylor, C. E.; Tian, Y.; Tkachenko, S.; Trivedi, A.; Ungaro, M.; Vernarsky, B.; Vlassov, A. V.; Voutier, E.; Watts, D. P.; Weygand, D. P.; Wood, M. H.; Zachariou, N.; Zhao, B.; Zhao, Z. W.
2012-03-01
We report the first extraction of the pion-nucleon multipoles near the production threshold for the nπ+ channel at relatively high momentum transfer (Q² up to 4.2 GeV²). The dominance of the s-wave transverse multipole (E0+), expected in this region, allowed us to access the generalized form factor G1 within the light-cone sum-rule (LCSR) framework as well as the axial form factor GA. The data analyzed in this work were collected by the nearly 4π CEBAF Large Acceptance Spectrometer (CLAS) using a 5.754-GeV electron beam on a proton target. The differential cross section and the π-N multipole E0+/GD were measured using two different methods, the LCSR and a direct multipole fit. The results from the two methods are found to be consistent and almost Q²-independent.
NASA Astrophysics Data System (ADS)
Xie, Tian; Grossman, Jeffrey C.
2018-04-01
The use of machine learning methods for accelerating the design of crystalline materials usually requires manually constructed feature vectors or complex transformation of atom coordinates to input the crystal structure, which either constrains the model to certain crystal types or makes it difficult to provide chemical insights. Here, we develop a crystal graph convolutional neural networks framework to directly learn material properties from the connection of atoms in the crystal, providing a universal and interpretable representation of crystalline materials. Our method provides a highly accurate prediction of density functional theory calculated properties for eight different properties of crystals with various structure types and compositions after being trained with 10⁴ data points. Further, our framework is interpretable because one can extract the contributions from local chemical environments to global properties. Using an example of perovskites, we show how this information can be utilized to discover empirical rules for materials design.
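To illustrate the kind of update a crystal graph convolution performs, here is a rough NumPy sketch of one gated convolution layer in the spirit of this framework; the feature sizes and random weights are illustrative stand-ins, and the published model's exact layer definition may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

def crystal_conv(v, edges, u, Wf, Ws, bf, bs):
    """One gated graph-convolution update:
    v_i <- v_i + sum_j sigmoid(z_ij Wf + bf) * softplus(z_ij Ws + bs),
    where z_ij concatenates both atoms' features with the bond feature u_ij."""
    out = v.copy()
    for e, (i, j) in enumerate(edges):
        z = np.concatenate([v[i], v[j], u[e]])
        out[i] = out[i] + sigmoid(z @ Wf + bf) * softplus(z @ Ws + bs)
    return out

rng = np.random.default_rng(0)
d, de = 4, 2
v = rng.normal(size=(3, d))                  # 3 atoms with d-dim features
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]     # bonds listed in both directions
u = rng.normal(size=(len(edges), de))        # bond features
Wf = rng.normal(size=(2 * d + de, d))        # untrained stand-in weights
Ws = rng.normal(size=(2 * d + de, d))
print(crystal_conv(v, edges, u, Wf, Ws, np.zeros(d), np.zeros(d)).shape)
```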
Kargarfard, Fatemeh; Sami, Ashkan; Mohammadi-Dehcheshmeh, Manijeh; Ebrahimie, Esmaeil
2016-11-16
Recent (2013 and 2009) zoonotic transmissions of avian or porcine influenza to humans highlight an increase in host range by evasion of species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the newly shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission, and there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts; the dataset is therefore multi-labeled, and we utilized a multi-label learning method to identify discriminative sequence sites. Algorithms such as CBA, Ripper, and decision trees were then applied to extract informative and descriptive association rules for each viral protein segment. We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. Host range identification is facilitated in this study by combined rules with high support. Our major goal was to detect discriminative genomic positions able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
NASA Astrophysics Data System (ADS)
Candare, Rudolph Joshua; Japitana, Michelle; Cubillas, James Earl; Ramirez, Cherry Bryan
2016-06-01
This research describes the methods involved in mapping different high-value crops in Agusan del Norte, Philippines, using LiDAR. This project is part of the Phil-LiDAR 2 Program, which aims to conduct a nationwide resource assessment using LiDAR. Because of the high-resolution data involved, the methodology described here utilizes object-based image analysis and optimal features from LiDAR data and orthophotos. Object-based classification was primarily done by developing rule-sets in eCognition, and several features from the LiDAR data and orthophotos were used in developing those rule-sets. Generally, classes of objects cannot be separated by simple thresholds on different features, which makes it difficult to develop a rule-set. To resolve this problem, the image-objects were subjected to Support Vector Machine learning. SVMs have gained popularity because of their ability to generalize well given a limited number of training samples; however, they also suffer from parameter assignment issues that can significantly affect classification results. More specifically, the regularization parameter C in a linear SVM has to be optimized through cross-validation to increase the overall accuracy. After performing the segmentation in eCognition, the optimization procedure as well as the extraction of the equations of the hyper-planes was done in Matlab. The learned hyper-planes separating one class from another in the multi-dimensional feature space can be thought of as super-features, which were then used in developing the classifier rule-set in eCognition. In this study, we report an overall classification accuracy of greater than 90% in different areas.
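The cross-validated choice of C can be reproduced in a few lines; this sketch uses scikit-learn rather than Matlab, and the synthetic arrays stand in for the per-object features and crop labels, so it illustrates only the procedure described above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-ins for per-object features and crop labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Cross-validate the regularization parameter C of a linear SVM.
search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)

# The fitted hyper-plane (w, b) can be exported as a rule: w.x + b > 0.
w = search.best_estimator_.coef_[0]
b = search.best_estimator_.intercept_[0]
print(search.best_params_, w, b)
```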
Paavilainen, P; Simola, J; Jaramillo, M; Näätänen, R; Winkler, I
2001-03-01
Brain mechanisms extracting invariant information from varying auditory inputs were studied using the mismatch-negativity (MMN) brain response. We wished to determine whether the preattentive sound-analysis mechanisms, reflected by MMN, are capable of extracting invariant relationships based on abstract conjunctions between two sound features. The standard stimuli varied over a large range in frequency and intensity dimensions following the rule that the higher the frequency, the louder the intensity. The occasional deviant stimuli violated this frequency-intensity relationship and elicited an MMN. The results demonstrate that preattentive processing of auditory stimuli extends to unexpectedly complex relationships between the stimulus features.
A Risk Assessment System with Automatic Extraction of Event Types
NASA Astrophysics Data System (ADS)
Capet, Philippe; Delavallade, Thomas; Nakamura, Takuya; Sandor, Agnes; Tarsitano, Cedric; Voyatzi, Stavroula
In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting weak signals of emerging risks as early as possible, ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules that operate on a template graph whose leaves are event types. EventSpotter is based on a general purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.
NASA Astrophysics Data System (ADS)
Wan, Xiaoqing; Zhao, Chunhui; Gao, Bing
2017-11-01
The integration of an edge-preserving filtering technique in the classification of a hyperspectral image (HSI) has been proven effective in enhancing classification performance. This paper proposes an ensemble strategy for HSI classification using an edge-preserving filter along with a deep learning model and edge detection. First, an adaptive guided filter is applied to the original HSI to reduce the noise in degraded images and to extract powerful spectral-spatial features. Second, the extracted features are fed as input to a stacked sparse autoencoder to adaptively exploit more invariant and deep feature representations; then, a random forest classifier is applied to fine-tune the entire pretrained network and determine the classification output. Third, a Prewitt compass operator is further applied to the HSI to extract the edges of the first principal component after dimension reduction. Moreover, a region-growing rule is applied to the resulting edge logical image to determine the local region for each unlabeled pixel. Finally, the categories of the corresponding neighborhood samples are determined in the original classification map; then, a majority voting mechanism is implemented to generate the final output. Extensive experiments show that the proposed method achieves competitive performance compared with several traditional approaches.
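The final voting step can be sketched in a few lines, assuming a per-pixel classification map and a label image of regions grown from the edge map; the array shapes and names here are illustrative, not from the paper.

```python
# Hedged sketch of region-wise majority voting: each pixel takes the
# majority class of the region it belongs to.
import numpy as np

def region_majority_vote(class_map, region_labels):
    out = class_map.copy()
    for r in np.unique(region_labels):
        mask = region_labels == r
        classes, counts = np.unique(class_map[mask], return_counts=True)
        out[mask] = classes[counts.argmax()]   # majority class of the region
    return out

class_map = np.array([[1, 1, 2], [1, 2, 2], [3, 3, 3]])   # toy per-pixel labels
regions = np.array([[0, 0, 0], [0, 1, 1], [1, 1, 1]])     # toy grown regions
print(region_majority_vote(class_map, regions))
```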
Effective Multifocus Image Fusion Based on HVS and BP Neural Network
Yang, Yong
2014-01-01
The aim of multifocus image fusion is to fuse images taken from the same scene with different focuses to obtain a resultant image with all objects in focus. In this paper, a novel multifocus image fusion method based on the human visual system (HVS) and a back propagation (BP) neural network is presented. Three features that reflect the clarity of a pixel are first extracted and used to train a BP neural network to determine which pixel is clearer. The clearer pixels are then used to construct the initial fused image. Third, the focused regions are detected by measuring the similarity between the source images and the initial fused image, followed by morphological opening and closing operations. Finally, the final fused image is obtained by applying a fusion rule to those focused regions. Experimental results show that the proposed method provides better performance and outperforms several existing popular fusion methods in terms of both objective and subjective evaluations. PMID:24683327
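A minimal sketch of the train-then-decide step follows, assuming three commonly used block-clarity features (variance, energy of gradient, and spatial frequency; the paper's HVS-derived features may differ), with scikit-learn's MLPClassifier standing in for the BP network.

```python
# Hedged sketch: three common block-clarity features feed a small network;
# the training blocks are synthetic stand-ins for real image patches.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def clarity_features(block):
    gy, gx = np.gradient(block.astype(float))
    variance = block.var()                               # local contrast
    eog = (gx ** 2 + gy ** 2).mean()                     # energy of gradient
    sf = np.sqrt((np.diff(block, axis=0) ** 2).mean()    # spatial frequency
                 + (np.diff(block, axis=1) ** 2).mean())
    return [variance, eog, sf]

rng = np.random.default_rng(1)
sharp = [clarity_features(rng.normal(128, 40, (8, 8))) for _ in range(50)]
blurred = [clarity_features(rng.normal(128, 5, (8, 8))) for _ in range(50)]
X = np.array(sharp + blurred)
y = np.array([1] * 50 + [0] * 50)                        # 1 = in focus

net = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0)).fit(X, y)
# For each pair of co-located blocks, the one classified as clearer
# contributes its pixels to the initial fused image.
```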
Spectroscopic Detection of Caries Lesions
Ruohonen, Mika; Palo, Katri; Alander, Jarmo
2013-01-01
Background. A caries lesion causes changes in the optical properties of the affected tissue. Currently a caries lesion can be detected only at a relatively late stage of development. Caries diagnosis also suffers from high interobserver variance. Methods. This is a pilot study to test the suitability of optical diffuse reflectance spectroscopy for caries diagnosis. Reflectance visible/near-infrared spectroscopy (VIS/NIRS) was used to measure caries lesions and healthy enamel on extracted human teeth. The results were analysed with a computational algorithm in order to find a rule-based classification method to detect caries lesions. Results. The classification indicated that the measured points of enamel could be assigned to one of three classes: healthy enamel, a caries lesion, and stained healthy enamel. The features that enabled this were consistent with theory. Conclusions. It seems that spectroscopic measurements can help to reduce false positives in an in vitro setting. However, further research is required to evaluate the strength of the evidence for the method's performance. PMID:27006907
Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah
2016-01-01
Breast cancer survival has been analyzed by many standard data mining algorithms, a group of which belongs to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying an algorithm from this category in the current study, which had not been done previously. Classification and regression trees (CART) were applied to a breast cancer database containing information on 569 patients in 2007-2010. Gini impurity, a measure used for categorical target variables, was adopted. The classification error, which is a function of tree size, was measured by 10-fold cross-validation. The performance of the created model was evaluated using accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. They showed, in if-then format, that Stage was the most important variable for predicting breast cancer survival. The scores for accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The current model, the first created with CART for this problem, was able to extract useful hidden rules from a relatively small dataset.
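The modeling steps translate almost directly into scikit-learn; the sketch below uses the library's bundled 569-sample breast cancer dataset as a stand-in for the study's database, with an arbitrarily chosen depth limit.

```python
# Hedged sketch of the described pipeline: Gini-based CART, 10-fold CV,
# and rule extraction from the tree paths.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
cart = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)

scores = cross_val_score(cart, data.data, data.target, cv=10)  # 10-fold CV
print("10-fold accuracy: %.3f (error %.3f)" % (scores.mean(), 1 - scores.mean()))

cart.fit(data.data, data.target)
# Each root-to-leaf path prints as an if-then rule over the predictors.
print(export_text(cart, feature_names=list(data.feature_names)))
```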
Discovering body site and severity modifiers in clinical texts.
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K
2014-01-01
To research computational methods for discovering body site and severity modifiers in clinical texts. We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929. Results indicate that both methods perform well on both in-domain and out-of-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and the inability of the system to discern deeper semantic structures. We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
Emadzadeh, Ehsan; Sarker, Abeed; Nikfarjam, Azadeh; Gonzalez, Graciela
2017-01-01
Social networks, such as Twitter, have become important sources for active monitoring of user-reported adverse drug reactions (ADRs). Automatic extraction of ADR information can be crucial for healthcare providers, drug manufacturers, and consumers. However, because of the non-standard nature of social media language, automatically extracted ADR mentions need to be mapped to standard forms before they can be used by operational pharmacovigilance systems. We propose a modular natural language processing pipeline for mapping (normalizing) colloquial mentions of ADRs to their corresponding standardized identifiers. We seek to accomplish this task and enable customization of the pipeline so that distinct unlabeled free text resources can be incorporated to use the system for other normalization tasks. Our approach, which we call Hybrid Semantic Analysis (HSA), sequentially employs rule-based and semantic matching algorithms for mapping user-generated mentions to concept IDs in the Unified Medical Language System vocabulary. The semantic matching component of HSA is adaptive in nature and uses a regression model to combine various measures of semantic relatedness and resources to optimize normalization performance on the selected data source. On a publicly available corpus, our normalization method achieves 0.502 recall and 0.823 precision (F-measure: 0.624). Our proposed method outperforms a baseline based on latent semantic analysis and another that uses MetaMap.
Measuring Data Quality Through a Source Data Verification Audit in a Clinical Research Setting.
Houston, Lauren; Probst, Yasmine; Humphries, Allison
2015-01-01
Health data has long been scrutinised in relation to data quality and integrity problems. Currently, no internationally accepted or "gold standard" method exists for measuring data quality and error rates within datasets. We conducted a source data verification (SDV) audit on a prospective clinical trial dataset. An audit plan was applied to conduct 100% manual verification checks on a 10% random sample of participant files. A quality assurance rule was developed whereby, if >5% of data variables were incorrect, a second 10% random sample would be extracted from the trial dataset. Errors were coded as: correct, incorrect (valid or invalid), not recorded or not entered. Audit-1 had a total error rate of 33% and audit-2 of 36%. The physiological section was the only audit section to have <5% error. Data not recorded on case report forms had the greatest impact on error calculations. A significant association (p=0.00) was found between audit-1 and audit-2 and whether or not data were deemed correct or incorrect. Our study developed a straightforward method to perform an SDV audit. An audit rule was identified and error coding was implemented. The findings demonstrate that monitoring data quality by an SDV audit can identify data quality and integrity issues within clinical research settings, allowing quality improvements to be made. The authors suggest this approach be implemented in future research.
Establishment of a New Drug Code for Marihuana Extract. Final rule.
2016-12-14
The Drug Enforcement Administration is creating a new Administration Controlled Substances Code Number for "Marihuana Extract." This code number will allow DEA and DEA-registered entities to track quantities of this material separately from quantities of marihuana. This, in turn, will aid in complying with relevant treaty provisions. Under international drug control treaties administered by the United Nations, some differences exist between the regulatory controls pertaining to marihuana extract versus those for marihuana and tetrahydrocannabinols. The DEA has previously established separate code numbers for marihuana and for tetrahydrocannabinols, but not for marihuana extract. To better track these materials and comply with treaty provisions, DEA is creating a separate code number for marihuana extract with the following definition: "Meaning an extract containing one or more cannabinoids that has been derived from any plant of the genus Cannabis, other than the separated resin (whether crude or purified) obtained from the plant." Extracts of marihuana will continue to be treated as Schedule I controlled substances.
Sleep facilitates learning a new linguistic rule
Batterink, Laura J.; Oudiette, Delphine; Reber, Paul J.; Paller, Ken A.
2014-01-01
Natural languages contain countless regularities. Extraction of these patterns is an essential component of language acquisition. Here we examined the hypothesis that memory processing during sleep contributes to this learning. We exposed participants to a hidden linguistic rule by presenting a large number of two-word phrases, each including a noun preceded by one of four novel words that functioned as an article (e.g., gi rhino). These novel words (ul, gi, ro and ne) were presented as obeying an explicit rule: two words signified that the noun referent was relatively near, and two that it was relatively far. Undisclosed to participants was the fact that the novel articles also predicted noun animacy, with two of the articles preceding animate referents and the other two preceding inanimate referents. Rule acquisition was tested implicitly using a task in which participants responded to each phrase according to whether the noun was animate or inanimate. Learning of the hidden rule was evident in slower responses to phrases that violated the rule. Responses were delayed regardless of whether rule-knowledge was consciously accessible. Brain potentials provided additional confirmation of implicit and explicit rule-knowledge. An afternoon nap was interposed between two 20-min learning sessions. Participants who obtained greater amounts of both slow-wave and rapid-eye-movement sleep showed increased sensitivity to the hidden linguistic rule in the second session. We conclude that during sleep, reactivation of linguistic information linked with the rule was instrumental for stabilizing learning. The combination of slow-wave and rapid-eye-movement sleep may synergistically facilitate the abstraction of complex patterns in linguistic input. PMID:25447376
Bass, Anne R; Fields, Kara G; Goto, Rie; Turissini, Gregory; Dey, Shirin; Russell, Linda A
2017-11-01
Background Clinical decision rules (CDRs) for pulmonary embolism (PE) have been validated in outpatients, but their performance in hospitalized patients is not well characterized. Objectives The goal of this systematic literature review was to assess the performance of CDRs for PE in hospitalized patients. Methods We performed a structured literature search using Medline, EMBASE and the Cochrane library for articles published on or before January 18, 2017. Two authors reviewed all titles, abstracts and full texts. We selected prospective studies of symptomatic hospitalized patients in which a CDR was used to estimate the likelihood of PE. The diagnosis of PE had to be confirmed using an accepted reference standard. Data on hospitalized patients were solicited from authors of studies in mixed populations of outpatients and hospitalized patients. Study characteristics, PE prevalence and CDR performance were extracted. The methodological quality of the studies was assessed using the QUADAS instrument. Results Twelve studies encompassing 3,942 hospitalized patients were included. Studies varied in methodology (randomized controlled trials and observational studies) and reference standards used. The pooled sensitivity of the modified Wells rule (cut-off ≤ 4) in hospitalized patients was 72.1% (95% confidence interval [CI], 63.7-79.2) and the pooled specificity was 62.2% (95% CI, 52.6-70.9). The modified Wells rule (cut-off ≤ 4) plus D-dimer testing had a pooled sensitivity of 99.7% (95% CI, 96.7-100) and a pooled specificity of 10.8% (95% CI, 6.7-16.9). The efficiency (proportion of patients stratified into the 'PE unlikely' group) was 8.4% (95% CI, 4.1-16.5), and the failure rate (proportion of low likelihood patients who were diagnosed with PE during follow-up) was 0.1% (95% CI, 0-5.3). Conclusion In symptomatic hospitalized patients, use of the Wells rule plus D-dimer to rule out PE is safe, but allows very few patients to forgo imaging.
A Comparison of different learning models used in Data Mining for Medical Data
NASA Astrophysics Data System (ADS)
Srimani, P. K.; Koti, Manjula Sanjay
2011-12-01
The present study aims at investigating different data mining learning models for different medical data sets and giving practical guidelines for selecting the most appropriate algorithm for a specific medical data set. In practical situations, it is absolutely necessary to take decisions with regard to the appropriate models and parameters for diagnosis and prediction problems. Learning models and algorithms are widely implemented for rule extraction and the prediction of system behavior. In this paper, some of the well-known Machine Learning (ML) systems are investigated and tested on five medical data sets. Practical criteria for evaluating different learning models are presented, and the potential benefits of the proposed methodology for diagnosis and learning are suggested.
76 FR 73885 - Mandatory Reporting of Greenhouse Gases
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-29
Final rule on the mandatory reporting of greenhouse gases; affected source categories include natural gas liquid extraction facilities (NAICS 211112) and underground coal mines (NAICS 212113), with reporting requirements covering suppliers of natural gas liquids in addition to suppliers of petroleum products (Federal Register, Vol. 76, No. 229).
SAMS--a systems architecture for developing intelligent health information systems.
Yılmaz, Özgün; Erdur, Rıza Cenk; Türksever, Mustafa
2013-12-01
In this paper, SAMS, a novel health information system architecture for developing intelligent health information systems, is proposed, and some strategies for developing such systems are discussed. Systems fulfilling this architecture will be able to store electronic health records of patients using OWL ontologies, share patient records among different hospitals and provide physicians with expertise to assist them in making decisions. The system is intelligent because it is rule-based, makes use of rule-based reasoning and has the ability to learn and evolve itself. The learning capability is provided by extracting rules from decisions previously given by physicians and then adding the extracted rules to the system. The proposed system is novel and original in all of these aspects. As a case study, a system conforming to the SAMS architecture was implemented for use by dentists in the dental domain. The use of the developed system is described with a scenario. For evaluation, the developed dental information system will be used and tried by a group of dentists. The development of this system proves the applicability of the SAMS architecture. By getting decision support from a system derived from this architecture, the cognitive gap between experienced and inexperienced physicians can be bridged. Thus, patient satisfaction can be achieved, inexperienced physicians are supported in decision making and personnel can improve their knowledge. A physician can diagnose a case which he/she has never diagnosed before using this system. With the help of this system, it will be possible to store general domain knowledge, reducing the personnel's need for medical guideline documents.
Neural coding of syntactic structure in learned vocalizations in the songbird.
Fujimoto, Hisataka; Hasegawa, Taku; Watanabe, Dai
2011-07-06
Although vocal signals including human languages are composed of a finite number of acoustic elements, complex and diverse vocal patterns can be created from combinations of these elements, linked together by syntactic rules. To enable such syntactic vocal behaviors, neural systems must extract the sequence patterns from auditory information and establish syntactic rules to generate motor commands for vocal organs. However, the neural basis of syntactic processing of learned vocal signals remains largely unknown. Here we report that the basal ganglia projecting premotor neurons (HVC(X) neurons) in Bengalese finches represent syntactic rules that generate variable song sequences. When vocalizing an alternative transition segment between song elements called syllables, sparse burst spikes of HVC(X) neurons code the identity of a specific syllable type or a specific transition direction among the alternative trajectories. When vocalizing a variable repetition sequence of the same syllable, HVC(X) neurons not only signal the initiation and termination of the repetition sequence but also indicate the progress and state-of-completeness of the repetition. These different types of syntactic information are frequently integrated within the activity of single HVC(X) neurons, suggesting that syntactic attributes of the individual neurons are not programmed as a basic cellular subtype in advance but acquired in the course of vocal learning and maturation. Furthermore, some auditory-vocal mirroring type HVC(X) neurons display transition selectivity in the auditory phase, much as they do in the vocal phase, suggesting that these songbirds may extract syntactic rules from auditory experience and apply them to form their own vocal behaviors.
On the fusion of tuning parameters of fuzzy rules and neural network
NASA Astrophysics Data System (ADS)
Mamuda, Mamman; Sathasivam, Saratha
2017-08-01
Learning a fuzzy rule-based system with a neural network can lead to a precise and valuable understanding of several problems. Fuzzy logic offers a simple way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy or missing input information. Conventional learning algorithms for tuning the parameters of fuzzy rules using training input-output data usually end in a weak firing state, which weakens the fuzzy rules and makes them unreliable for a multiple-input fuzzy system. In this paper, we introduce a new learning algorithm for tuning the parameters of the fuzzy rules together with a radial basis function neural network (RBFNN) on training input-output data, based on the gradient descent method. The new learning algorithm addresses the problem of weak firing found in the conventional method. We illustrate the efficiency of our new learning algorithm by means of numerical examples. MATLAB R2014a was used to simulate our results. The results show that the new learning method has the advantage of training the fuzzy rules without tampering with the fuzzy rule table, which allows a membership function of a rule to be used more than once in the fuzzy rule base.
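As a rough illustration of the underlying idea (not the paper's exact algorithm), the following numeric sketch tunes the centers, widths, and consequents of Gaussian rule antecedents by batch gradient descent on a toy one-input target.

```python
# Illustrative numeric sketch only: tune Gaussian rule centers c, widths s,
# and consequents w of a normalized fuzzy system by gradient descent.
import numpy as np

x = np.linspace(-3, 3, 60)
y = np.sin(x)                                    # toy target function

c = np.array([-2.0, 0.0, 2.0])                   # rule centers
s = np.ones(3)                                   # rule widths
w = np.zeros(3)                                  # rule consequents
lr = 0.5

for _ in range(3000):
    mu = np.exp(-((x[:, None] - c) ** 2) / (2 * s ** 2))   # firing strengths
    denom = mu.sum(axis=1, keepdims=True) + 1e-9
    yhat = (mu * w).sum(axis=1) / denom[:, 0]              # weighted average
    err = yhat - y
    dmu = err[:, None] * (w - yhat[:, None]) / denom * mu  # dE/dmu scaled by mu
    w -= lr * (err[:, None] * mu / denom).mean(axis=0)
    c -= lr * (dmu * (x[:, None] - c) / s ** 2).mean(axis=0)
    s -= lr * (dmu * (x[:, None] - c) ** 2 / s ** 3).mean(axis=0)

print("fit RMSE:", np.sqrt(((yhat - y) ** 2).mean()))
```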
NASA Astrophysics Data System (ADS)
Yang, Yuchen; Mabu, Shingo; Shimada, Kaoru; Hirasawa, Kotaro
Intertransaction association rules have been reported to be useful in many fields such as stock market prediction, but few efficient methods exist for extracting them from large data sets. Furthermore, how to use and measure these more complex rules should be considered carefully. In this paper, we propose a new intertransaction class association rule mining method based on Genetic Network Programming (GNP), which is able to overcome some shortcomings of Apriori-like intertransaction association methods. Moreover, a general classifier model for intertransaction rules is also introduced. In experiments on the real-world application of stock market prediction, the method shows its efficiency and ability to obtain good results, and it can bring further benefits when paired with a suitable classifier considering larger interval spans.
Association Rule Analysis for Tour Route Recommendation and Application to WCTSNOP
NASA Astrophysics Data System (ADS)
Fang, H.; Chen, C.; Lin, J.; Liu, X.; Fang, D.
2017-09-01
A growing number of e-tourism systems provide intelligent tour recommendations for tourists. In this sense, a recommender system can make personalized suggestions and provide relevant information associated with the tour cycle. Data mining is a suitable tool for extracting potential information from large databases for making strategic decisions. In this study, association rule analysis based on the FP-growth algorithm is applied to find association relationships among scenic spots in different cities for tour route recommendation. To identify valuable rules, the Kulczynski interestingness measure is adopted and the imbalance ratio is computed. The proposed scheme was evaluated on the Wangluzhe cultural tourism service network operation platform (WCTSNOP), where it was verified that the scheme can quickly recommend tour routes and markedly enhance recommendation quality.
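A small sketch of the mining and measuring steps follows, using mlxtend's FP-growth implementation as one possible library choice (its API may vary across versions) and computing the Kulczynski measure and imbalance ratio from their standard definitions; the transactions are invented "scenic spots per itinerary" examples.

```python
# Hedged sketch: FP-growth frequent itemsets, then Kulczynski and imbalance
# ratio computed from the standard definitions. Transactions are toy data.
import pandas as pd
from mlxtend.frequent_patterns import fpgrowth, association_rules
from mlxtend.preprocessing import TransactionEncoder

itineraries = [["lake", "temple"], ["lake", "temple", "museum"],
               ["museum", "park"], ["lake", "park"], ["lake", "temple"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(itineraries).transform(itineraries), columns=te.columns_)

freq = fpgrowth(df, min_support=0.4, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.5)

a = rules["antecedent support"]
c = rules["consequent support"]
s = rules["support"]
rules["kulczynski"] = 0.5 * (s / a + s / c)             # mean of both confidences
rules["imbalance_ratio"] = (a - c).abs() / (a + c - s)  # 0 = balanced, near 1 = skewed
print(rules[["antecedents", "consequents", "kulczynski", "imbalance_ratio"]])
```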
Complex Road Intersection Modelling Based on Low-Frequency GPS Track Data
NASA Astrophysics Data System (ADS)
Huang, J.; Deng, M.; Zhang, Y.; Liu, H.
2017-09-01
It is widely accepted that the digital map has become an indispensable guide for human daily traveling. Traditional road network maps are produced in time-consuming and labour-intensive ways, such as digitizing printed maps and extraction from remote sensing images. At present, the large volume of GPS trajectory data collected by floating vehicles makes it practical to extract highly detailed and up-to-date road network information. Road intersections are often accident-prone areas and are critical to route planning, and the connectivity of a road network is mainly determined by the topological geometry of its intersections. A few studies have addressed detecting complex road intersections and mining the attached traffic information (e.g., connectivity, topology and turning restrictions) from massive GPS traces. To the authors' knowledge, recent studies mainly used high frequency (1 s sampling rate) trajectory data to detect crossroad regions or extract rough intersection models. It is still difficult to use low frequency (20-100 s) and easily available trajectory data to model complex road intersections geometrically and semantically. This paper thus attempts to construct precise models of complex road intersections using low frequency GPS traces. We propose to first extract the complex road intersections by an LCSS-based (Longest Common Subsequence) trajectory clustering method, then delineate the geometric shapes of the complex road intersections by a K-segment principal curve algorithm, and finally infer the traffic constraint rules inside the complex intersections.
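The LCSS similarity at the heart of the clustering step can be sketched with a standard dynamic program; the matching threshold eps and the traces below are illustrative.

```python
# Hedged sketch of LCSS between two GPS traces: two points "match" when
# their Euclidean distance falls below the threshold eps.
from math import hypot

def lcss(a, b, eps):
    """Length of the longest common subsequence of two (x, y) traces."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1]) < eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

t1 = [(0, 0), (1, 0), (2, 1), (3, 1)]
t2 = [(0, 0.1), (1, -0.1), (2, 0.9), (4, 2)]
# Normalized similarity in [0, 1]; 1 - similarity serves as a clustering distance.
print(lcss(t1, t2, eps=0.3) / min(len(t1), len(t2)))
```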
Meystre, Stéphane M; Thibault, Julien; Shen, Shuying; Hurdle, John F; South, Brett R
2010-01-01
OBJECTIVE To describe a new medication information extraction system, Textractor, developed for the 'i2b2 medication extraction challenge'. The development, functionalities, and official evaluation of the system are detailed. Textractor is based on the Apache Unstructured Information Management Architecture (UIMA) framework, and uses methods that are a hybrid between machine learning and pattern matching. Two modules in the system are based on machine learning algorithms, while other modules use regular expressions, rules, and dictionaries, and one module embeds MetaMap Transfer. The official evaluation was based on a reference standard of 251 discharge summaries annotated by all teams participating in the challenge. The metrics used were recall, precision, and the F(1)-measure. They were calculated with exact and inexact matches, and were averaged at the level of systems and documents. The reference metric for this challenge, the system-level overall F(1)-measure, reached about 77% for exact matches, with a recall of 72% and a precision of 83%. Performance was best for route information (F(1)-measure about 86%), and was good for dosage and frequency information, with F(1)-measures of about 82-85%. Results were not as good for durations, with F(1)-measures of 36-39%, and for reasons, with F(1)-measures of 24-27%. The official evaluation of Textractor for the i2b2 medication extraction challenge demonstrated satisfactory performance. This system was among the 10 best performing systems in this challenge.
Lu, Yao; Harrington, Peter B
2010-08-01
Direct methylation and solid-phase microextraction (SPME) were used as a sample preparation technique for classification of bacteria based on fatty acid methyl ester (FAME) profiles. Methanolic tetramethylammonium hydroxide was applied as a dual-function reagent to saponify and derivatize whole-cell bacterial fatty acids into FAMEs in one step, and SPME was used to extract the bacterial FAMEs from the headspace. Compared with traditional alkaline saponification and sample preparation using liquid-liquid extraction, the method presented in this work avoids using comparatively large amounts of inorganic and organic solvents and greatly decreases the sample preparation time as well. Characteristic gas chromatography/mass spectrometry (GC/MS) of FAME profiles was achieved for six bacterial species. The difference between Gram-positive and Gram-negative bacteria was clearly visualized with the application of principal component analysis of the GC/MS data of bacterial FAMEs. A cross-validation study using ten bootstrap Latin partitions and the fuzzy rule building expert system demonstrated 87 +/- 3% correct classification efficiency.
A Novel Model-Based Driving Behavior Recognition System Using Motion Sensors.
Wu, Minglin; Zhang, Sheng; Dong, Yuhan
2016-10-20
In this article, a novel driving behavior recognition system based on a specific physical model and motion sensory data is developed to promote traffic safety. Based on the theory of rigid body kinematics, we build a specific physical model to reveal how the sensory data change during vehicle motion. In this work, we adopt a nine-axis motion sensor including a three-axis accelerometer, a three-axis gyroscope and a three-axis magnetometer, and apply a Kalman filter for noise elimination and an adaptive time window for data extraction. Based on the feature extraction guided by the built physical model, various classifiers are trained to recognize different driving behaviors. Leveraging the system, normal driving behaviors (accelerating, braking, lane changing and turning performed with caution) and aggressive driving behaviors (the same maneuvers performed suddenly) can be classified with a high accuracy of 93.25%. Compared with traditional driving behavior recognition methods using machine learning only, the proposed system possesses a solid theoretical basis, performs better and has good prospects.
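The noise-elimination step might be sketched as a scalar Kalman filter per sensor axis, assuming a simple random-walk state model (the paper's filter may use a richer state); q and r below are illustrative noise variances.

```python
# Hedged sketch: 1D Kalman filter smoothing one accelerometer axis under an
# assumed random-walk state model; q, r are illustrative noise variances.
import numpy as np

def kalman_1d(z, q=1e-3, r=0.05):
    x, p = z[0], 1.0                 # state estimate and its variance
    out = []
    for zk in z:
        p = p + q                    # predict: random-walk state model
        k = p / (p + r)              # Kalman gain
        x = x + k * (zk - x)         # update with measurement zk
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 200)
accel = np.sin(t) + rng.normal(0, 0.2, t.size)   # noisy accelerometer axis
smooth = kalman_1d(accel)                        # filtered signal for windowing
```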
NASA Astrophysics Data System (ADS)
Liu, W.; Butté, R.; Dussaigne, A.; Grandjean, N.; Deveaud, B.; Jacopin, G.
2016-11-01
We study the carrier-density-dependent recombination dynamics in m-plane InGaN/GaN multiple quantum wells in the presence of n-type background doping by time-resolved photoluminescence. Based on Fermi's golden rule and Saha's equation, we decompose the radiative recombination channel into excitonic and electron-hole pair contributions, and extract the bimolecular recombination coefficients as a function of injected carrier density. Contrary to the standard electron-hole picture, our results confirm the strong influence of excitons even at room temperature. Indeed, at 300 K, excitons represent up to 63 ± 6% of the photoexcited carriers. In addition, following the Shockley-Read-Hall model, we extract the electron and hole capture rates by deep levels and demonstrate that the increase in the effective lifetime with injected carrier density is due to asymmetric capture rates in the presence of n-type background doping. Thanks to the proper determination of the density-dependent recombination coefficients up to high injection densities, our method provides a way to evaluate the importance of Auger recombination.
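For reference, a commonly used two-dimensional form of Saha's equation for the exciton/free-carrier equilibrium in quantum wells reads as follows (the paper's exact expression and conventions may differ); here n_e and n_h are the free electron and hole densities, n_X the exciton density, μ the reduced exciton mass, and E_B the exciton binding energy.

```latex
% Assumed 2D Saha form, for illustration only.
\frac{n_e \, n_h}{n_X} = \frac{\mu k_B T}{2\pi \hbar^{2}}
\exp\!\left(-\frac{E_B}{k_B T}\right)
```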
De La Torre-Roche, Roberto J.; Lee, Wen-Yee; Campos-Díaz, Sandra I.
2009-01-01
Ultrasonic extraction followed by stir bar sorptive extraction (SBSE) and thermal desorption coupled in-line with gas chromatography/mass spectrometry (TD/GC/MS) was used to perform a comprehensive determination of soil-borne polycyclic aromatic hydrocarbons (PAHs) in El Paso, Texas. The method provided good sensitivity and faster processing time for the analysis. The total PAHs in El Paso soil ranged from 0.1 to 2225.5 µg kg−1. Although the majority of PAH concentrations did not exceed the soil screening levels regulated by the United States Environmental Protection Agency, the existence of PAHs in this ecosystem is ubiquitous. Naphthalene was found in 100% of the soil samples, while the heavy (five- and six-ring) PAHs were not often detected and mostly remained in close proximity to industrial areas and major traffic points. The results ruled out petroleum refining as the significant source of local soil-borne PAH contamination, but suggested that the PAHs found in El Paso soil were closely linked to human activities and possibly other industrial processes. PMID:18768257
Identification of Age-Related Macular Degeneration Using OCT Images
NASA Astrophysics Data System (ADS)
Arabi, Punal M., Dr; Krishna, Nanditha; Ashwini, V.; Prathibha, H. M.
2018-02-01
Age-related Macular Degeneration (AMD) is the leading retinal disease of recent years. Macular degeneration occurs when the central portion of the retina, called the macula, deteriorates. As the deterioration occurs with age, it is commonly referred to as Age-related Macular Degeneration. This disease can be visualized by several imaging modalities, such as fundus imaging and Optical Coherence Tomography (OCT), among others. Optical Coherence Tomography is the most widely used technique for screening for Age-related Macular Degeneration because it can detect very minute changes in the retina. Healthy and AMD-affected OCT images are classified by extracting the Retinal Pigmented Epithelium (RPE) layer of the images using image processing techniques. The extracted layer is sampled, the number of white pixels in each sample is counted, and the mean pixel count is calculated. The average mean value is calculated for both the healthy and the AMD-affected images, a threshold value is fixed, and a decision rule is framed to classify the images of interest. The proposed method showed an accuracy of 75%.
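The decision rule reduces to a few lines of array code; the sketch below assumes a binary mask of the extracted RPE layer, and both the threshold value and the comparison direction are placeholders, since the paper learns them from its data.

```python
# Hedged sketch of the stated sample-count-and-threshold rule; the mask,
# threshold, and comparison direction are placeholders.
import numpy as np

THRESHOLD = 40.0    # placeholder mean white-pixel count separating classes

def classify_rpe(mask, n_samples=16):
    """mask: binary image of the extracted RPE layer (1 = white pixel)."""
    columns = np.array_split(mask, n_samples, axis=1)  # sample the layer
    counts = [c.sum() for c in columns]                # white pixels per sample
    mean_count = float(np.mean(counts))
    label = "AMD" if mean_count > THRESHOLD else "healthy"
    return label, mean_count

rng = np.random.default_rng(0)
toy_mask = (rng.random((32, 256)) < 0.2).astype(int)   # stand-in RPE mask
print(classify_rpe(toy_mask))
```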
Soil quality assessment using weighted fuzzy association rules
Xue, Yue-Ju; Liu, Shu-Guang; Hu, Yue-Ming; Yang, Jing-Feng
2010-01-01
Fuzzy association rules (FARs) can be powerful in assessing regional soil quality, a critical step prior to land planning and utilization; however, traditional FARs mined from a soil quality database, ignoring the variable importance of the rules, can be redundant and far from optimal. In this study, we developed a method that applies different weights to traditional FARs to improve the accuracy of soil quality assessment. After the FARs for soil quality assessment were mined, redundant rules were eliminated according to their significance, reducing the complexity of the soil quality assessment models and improving the comprehensibility of the FARs. Global weights, each representing the importance of a FAR in soil quality assessment, were then introduced and refined using a gradient descent optimization method. This method was applied to the assessment of soil resource conditions in Guangdong Province, China. The new approach had an accuracy of 87% when 15 rules were mined, as compared with 76% for the traditional approach. The accuracy increased to 96% when 32 rules were mined, in contrast to 88% for the traditional approach. These results demonstrate the improved comprehensibility of the FARs and the high accuracy of the proposed method.
26 CFR 1.460-5 - Cost allocation rules.
Code of Federal Regulations, 2014 CFR
2014-04-01
(a) Overview. This section prescribes methods of allocating costs to long-term contracts. The section also provides rules concerning consistency in the method of allocating costs to long-term contracts. (b) Under paragraph (b)(2) of this section, a taxpayer must allocate costs to each long-term contract subject to these rules.
Knowledge Discovery in Variant Databases Using Inductive Logic Programming
Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D.
2013-01-01
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/. PMID:23589683
DesAutels, Spencer J.; Fox, Zachary E.; Giuse, Dario A.; Williams, Annette M.; Kou, Qing-hua; Weitkamp, Asli; Patel, Neal R; Bettinsoli Giuse, Nunzia
2016-01-01
Clinical decision support (CDS) knowledge, embedded over time in mature medical systems, presents an interesting and complex opportunity for information organization, maintenance, and reuse. To have a holistic view of all decision support requires an in-depth understanding of each clinical system as well as expert knowledge of the latest evidence. This approach to clinical decision support presents an opportunity to unify and externalize the knowledge within rules-based decision support. Driven by an institutional need to prioritize decision support content for migration to new clinical systems, the Center for Knowledge Management and Health Information Technology teams applied their unique expertise to extract content from individual systems, organize it through a single extensible schema, and present it for discovery and reuse through a newly created Clinical Support Knowledge Acquisition and Archival Tool (CS-KAAT). CS-KAAT can build and maintain the underlying knowledge infrastructure needed by clinical systems. PMID:28269846
Intelligent bandwidth compression
NASA Astrophysics Data System (ADS)
Tseng, D. Y.; Bullock, B. L.; Olin, K. E.; Kandt, R. K.; Olsen, J. D.
1980-02-01
The feasibility of a 1000:1 bandwidth compression ratio for image transmission has been demonstrated using image-analysis algorithms and a rule-based controller. Such a high compression ratio was achieved by first analyzing scene content using auto-cueing and feature-extraction algorithms, and then transmitting only the pertinent information consistent with mission requirements. A rule-based controller directs the flow of analysis and performs priority allocations on the extracted scene content. The reconstructed bandwidth-compressed image consists of an edge map of the scene background, with primary and secondary target windows embedded in the edge map. The bandwidth-compressed images are updated at a basic rate of 1 frame per second, with the high-priority target window updated at 7.5 frames per second. The scene-analysis algorithms used in this system, together with the adaptive priority controller, are described. Results of simulated 1000:1 bandwidth-compressed images are presented. A videotape simulation of the Intelligent Bandwidth Compression system has been produced using a sequence of video input from the database.
An Algorithm of Association Rule Mining for Microbial Energy Prospection
Shaheen, Muhammad; Shahbaz, Muhammad
2017-01-01
The presence of hydrocarbons beneath the earth's surface produces microbiological anomalies in soils and sediments. The detection of such microbial populations involves purely biochemical processes, which are specialized, expensive and time-consuming. This paper proposes a new algorithm for context-based association rule mining on non-spatial data. The algorithm is a modified form of a previously developed algorithm that was restricted to spatial databases. The algorithm is applied to mine context-based association rules from a microbial database to extract interesting and useful associations of microbial attributes with the existence of hydrocarbon reserves. The surface and soil manifestations caused by the presence of hydrocarbon-oxidizing microbes are selected from the existing literature and stored in a shared database. The algorithm is applied to this database to generate direct and indirect associations among the stored microbial indicators. These associations are then correlated with the probability of hydrocarbon existence. The numerical evaluation shows better accuracy for non-spatial data as compared to conventional algorithms in generating reliable and robust rules. PMID:28393846
Spin structure of the neutron (^{3}He) and the Bjoerken sum rule
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meziani, Z.E.
1994-12-01
A first measurement of the longitudinal asymmetry of deep-inelastic scattering of polarized electrons from a polarized ^{3}He target at energies ranging from 19 to 26 GeV has been performed at the Stanford Linear Accelerator Center (SLAC). The spin-structure function of the neutron g_{1}^{n} has been extracted from the measured asymmetries. The Quark Parton Model (QPM) interpretation of the nucleon spin-structure function is examined in light of the new results. A test of the Ellis-Jaffe sum rule (E-J) on the neutron is performed at high momentum transfer and found to be satisfied. Furthermore, combining the proton results of the European Muon Collaboration (EMC) and the neutron results of E-142, the Bjoerken sum rule test is carried out at high Q^{2}, where higher-order Perturbative Quantum Chromodynamics (PQCD) corrections and higher-twist corrections are smaller. The sum rule is saturated to within one standard deviation.
A novel approach of ensuring layout regularity correct by construction in advanced technologies
NASA Astrophysics Data System (ADS)
Ahmed, Shafquat Jahan; Vaderiya, Yagnesh; Gupta, Radhika; Parthasarathy, Chittoor; Marin, Jean-Claude; Robert, Frederic
2017-03-01
In advanced technology nodes, layout regularity has become a mandatory prerequisite for creating robust designs that are less sensitive to variations in the manufacturing process, in order to improve yield and minimize electrical variability. In this paper we describe a method for designing regular full-custom layouts based on design and process co-optimization. The method includes various design rule checks that can be used on the fly during leaf-cell layout development. We extract a Layout Regularity Index (LRI) from the layouts based on the jogs, alignments and pitches used in the design for any given metal layer. The Regularity Index of a layout is a direct indicator of manufacturing yield and is used to compare the relative health of different layout blocks in terms of process friendliness. The method has been deployed for the 28 nm and 40 nm technology nodes for Memory IP and is being extended to other IPs (IO, standard-cell). We have quantified the gain of layout regularity with the deployed method on printability and electrical characteristics by process-variation (PV) band simulation analysis and have achieved up to 5 nm reduction in PV band.
NASA Astrophysics Data System (ADS)
Chen, Duxin; Xu, Bowen; Zhu, Tao; Zhou, Tao; Zhang, Hai-Tao
2017-08-01
Coordination can be deemed the result of interindividual interaction among natural gregarious animal groups. However, revealing the underlying interaction rules and decision-making strategies governing highly coordinated motion in bird flocks is still a long-standing challenge. Based on analysis of high spatial-temporal resolution GPS data from three pigeon flocks, we extract the hidden interaction principle by using a newly emerging machine learning method, namely sparse Bayesian learning. It is observed that the interaction probability has an inflection point at a pairwise distance of 3-4 m, closer than the average maximum interindividual distance, after which it decays strictly with rising pairwise metric distance. Significantly, the density of the spatial neighbor distribution is strongly anisotropic, with an evident lack of interactions along the direction of individual velocity. Thus, it is found that in small-sized bird flocks, individuals reciprocally cooperate with a varying number of neighbors in metric space and tend to interact with closer, time-varying neighbors, rather than interacting with a fixed number of topological ones. Finally, extensive numerical investigation is conducted to verify both the revealed interaction and the decision-making principle during circular flights of pigeon flocks.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree.
Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad
2015-01-01
MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen-host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals, primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly improved the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to identify miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with a C4.5 decision tree for the prediction of miRNA targets. We applied our proposed method to a validated human dataset and achieved nearly 93.9% classification accuracy, which can be attributed to the selection of the best rules.
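A toy sketch of the wrapper idea follows, with a genetic algorithm selecting input features for a tree (scikit-learn's CART standing in for C4.5); the population size, generations, and mutation rate are illustrative, not the paper's settings.

```python
# Hedged sketch: GA feature selection wrapped around a decision tree, with
# cross-validated accuracy as the fitness function; all data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    if not mask.any():
        return 0.0
    tree = DecisionTreeClassifier(random_state=0)
    return cross_val_score(tree, X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.5              # random feature masks
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]      # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
        flip = rng.random(X.shape[1]) < 0.05          # bit-flip mutation
        children.append(np.logical_xor(child, flip))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best), "accuracy:", fitness(best))
```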
Xu, Hua; AbdelRahman, Samir; Lu, Yanxin; Denny, Joshua C.; Doan, Son
2011-01-01
Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4,564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser. PMID:21856440
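The core of deriving a PCFG from a treebank is a maximum-likelihood estimate: each production's probability is its count divided by the total count of its left-hand-side symbol. A minimal sketch, with invented medication-grammar productions rather than the study's actual grammar:

```python
# Hedged sketch: count-based MLE of production probabilities; the
# productions are invented examples, not the study's grammar.
from collections import Counter, defaultdict

# (LHS, RHS) productions read off the parse trees of a treebank
productions = [
    ("MED", ("DRUG", "DOSE")), ("MED", ("DRUG", "DOSE")),
    ("MED", ("DRUG", "DOSE", "FREQ")),
    ("DOSE", ("NUM", "UNIT")), ("DOSE", ("NUM", "UNIT")),
]

counts = Counter(productions)
lhs_totals = defaultdict(int)
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n

pcfg = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
# A probabilistic parser then scores competing parse trees by the product
# of their rule probabilities and keeps the most probable one.
```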
McDaniel, Mark A; Cahill, Michael J; Robbins, Mathew; Wiener, Chelsea
2014-04-01
We hypothesize that during training some learners may focus on acquiring the particular exemplars and responses associated with the exemplars (termed exemplar learners), whereas other learners attempt to abstract underlying regularities reflected in the particular exemplars linked to an appropriate response (termed rule learners). Supporting this distinction, after training (on a function-learning task), participants displayed an extrapolation profile reflecting either acquisition of the trained cue-criterion associations (exemplar learners) or abstraction of the function rule (rule learners; Studies 1a and 1b). Further, working memory capacity (measured by operation span [Ospan]) was associated with the tendency to rely on rule versus exemplar processes. Studies 1c and 2 examined the persistence of these learning tendencies on several categorization tasks. Study 1c showed that rule learners were more likely than exemplar learners (indexed a priori by extrapolation profiles) to resist using idiosyncratic features (exemplar similarity) in generalization (transfer) of the trained category. Study 2 showed that the rule learners but not the exemplar learners performed well on a novel categorization task (transfer) after training on an abstract coherent category. These patterns suggest that in complex conceptual tasks, (a) individuals tend to either focus on exemplars during learning or on extracting some abstraction of the concept, (b) this tendency might be a relatively stable characteristic of the individual, and (c) transfer patterns are determined by that tendency.
Sum Rules of Charm CP Asymmetries beyond the SU(3)_{F} Limit.
Müller, Sarah; Nierste, Ulrich; Schacht, Stefan
2015-12-18
We find new sum rules between direct CP asymmetries in D meson decays with coefficients that can be determined from a global fit to branching ratio data. Our sum rules eliminate the penguin topologies P and PA, which cannot be determined from branching ratios. In this way, we can make predictions about direct CP asymmetries in the standard model without ad hoc assumptions on the sizes of penguin diagrams. We consistently include first-order SU(3)_{F} breaking in the topological amplitudes extracted from the branching ratios. By confronting our sum rules with future precise data from LHCb and Belle II, one will identify or constrain new-physics contributions to P or PA. The first sum rule correlates the CP asymmetries a_{CP}^{dir} in D^{0}→K^{+}K^{-}, D^{0}→π^{+}π^{-}, and D^{0}→π^{0}π^{0}. We study the region of the a_{CP}^{dir}(D^{0}→π^{+}π^{-})-a_{CP}^{dir}(D^{0}→π^{0}π^{0}) plane allowed by current data and find that our sum rule excludes more than half of the allowed region at 95% C.L. Our second sum rule correlates the direct CP asymmetries in D^{+}→K[over ¯]^{0}K^{+}, D_{s}^{+}→K^{0}π^{+}, and D_{s}^{+}→K^{+}π^{0}.
Automatic 3D high-fidelity traffic interchange modeling using 2D road GIS data
NASA Astrophysics Data System (ADS)
Wang, Jie; Shen, Yuzhong
2011-03-01
3D road models are widely used in many computer applications such as racing games and driving simulations. However, almost all high-fidelity 3D road models are generated manually by professional artists at the cost of intensive labor. Very few existing methods can automatically generate high-fidelity 3D road networks, especially for roads that exist in the real world. A real road network contains various elements such as road segments, road intersections, and traffic interchanges. Among them, traffic interchanges present the greatest modeling challenge because of their complexity and because existing road GIS data lack height (vertical position) information for interchanges. This paper proposes a novel approach that can automatically produce high-fidelity 3D road network models, including traffic interchange models, from real 2D road GIS data that mainly contain road centerline information. The proposed method consists of several steps. The raw road GIS data are first preprocessed to extract the road network topology, merge redundant links, and classify road types. Overlapped points in the interchanges are then detected, and their elevations are determined by a set of level-estimation rules. Parametric representations of the road centerlines are next generated through link segmentation and fitting; these representations support arbitrary levels of detail with reduced memory usage. Finally, a set of civil engineering rules for road design (e.g., cross slope, superelevation) is applied to generate realistic road surfaces. Beyond traffic interchanges, the proposed method also applies to other, more general road elements. Preliminary results show that the proposed method is highly effective and useful in many applications.
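As an editorial illustration of the centerline-fitting step, here is a minimal sketch (not the authors' implementation; the cubic-polynomial form and all names are assumptions) that fits parametric x(t), y(t) curves to a GIS polyline and resamples them at an arbitrary level of detail:

```python
# A minimal sketch of parametric centerline fitting: fit x(t) and y(t) as
# cubic polynomials over a chord-length parameter, then resample at any
# level of detail. Names and the polynomial degree are illustrative.
import numpy as np

def fit_centerline(points, degree=3):
    """Fit parametric polynomials x(t), y(t) to 2D centerline vertices."""
    pts = np.asarray(points, dtype=float)
    # Chord-length parameterization in [0, 1].
    d = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    t = d / d[-1]
    return np.polyfit(t, pts[:, 0], degree), np.polyfit(t, pts[:, 1], degree)

def sample_centerline(px, py, n):
    """Resample the fitted curve with n vertices (arbitrary level of detail)."""
    t = np.linspace(0.0, 1.0, n)
    return np.column_stack([np.polyval(px, t), np.polyval(py, t)])

# A coarse GIS polyline becomes a compact parametric curve that can be
# resampled densely near interchanges and sparsely elsewhere.
px, py = fit_centerline([(0, 0), (50, 8), (120, 30), (200, 35)])
dense = sample_centerline(px, py, 200)
```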
Bimodal emotion congruency is critical to preverbal infants' abstract rule learning.
Tsui, Angeline Sin Mei; Ma, Yuen Ki; Ho, Anna; Chow, Hiu Mei; Tseng, Chia-huei
2016-05-01
Extracting general rules from specific examples is important, as the same challenge can confront us in various formats. Previous studies have found that bimodal presentation of grammar-like rules (e.g. ABA) enhanced 5-month-olds' capacity to acquire a rule that infants failed to learn when it was presented with visual shapes alone (circle-triangle-circle) or auditory syllables alone (la-ba-la). However, the mechanisms and constraints of this bimodal learning facilitation are still unknown. In this study, we used the audio-visual relational congruency of bimodal stimulation to disentangle possible sources of facilitation. We exposed 8- to 10-month-old infants to an AAB sequence consisting of visual faces with affective expressions and/or auditory voices conveying emotions. Our results showed that infants were able to distinguish the learned AAB rule from other novel rules under bimodal stimulation when the affects in the audio and visual stimuli were congruently paired (Experiments 1A and 2A). Infants failed to acquire the same rule when the audio-visual stimuli were incongruently matched (Experiment 2B) and when only the visual (Experiment 1B) or the audio (Experiment 1C) stimuli were presented. Our results highlight that bimodal facilitation in infant rule learning depends not only on better statistical probability and redundant sensory information, but also on the relational congruency of the audio-visual information. A video abstract of this article can be viewed at https://m.youtube.com/watch?v=KYTyjH1k9RQ. © 2015 John Wiley & Sons Ltd.
Hardware independence checkout software
NASA Technical Reports Server (NTRS)
Cameron, Barry W.; Helbig, H. R.
1990-01-01
ACSI has developed a program utilizing CLIPS to assess compliance with various programming standards. Essentially, the program parses C code to extract the names of all function calls. These are asserted as CLIPS facts that also include information about line numbers, source file names, and called functions. Rules have been devised to identify called functions that are not defined anywhere in the parsed source. These are compared against lists of standards (represented as facts) using rules that check intersections and/or unions of the two sets. By piping the output into other processes, the source is appropriately commented by generating and executing the resulting scripts.
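The same pipeline can be sketched outside CLIPS. The following rough Python analogue (an editorial illustration, not ACSI's code; the regexes, the example source, and the standards list are all assumptions) extracts call names, collects locally defined functions, and splits undefined calls into standard and suspect sets:

```python
# Rough analogue of the described pipeline: treat call sites as facts and
# flag calls that are neither defined locally nor on an approved list.
import re

CALL = re.compile(r'\b([A-Za-z_]\w*)\s*\(')
# A definition line: return type, then a name, then '(' and no ';' after it.
DEF = re.compile(r'^\s*\w[\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$', re.M)
C_KEYWORDS = {'if', 'for', 'while', 'switch', 'return', 'sizeof'}

def check_calls(source, standard_functions):
    defined = set(DEF.findall(source))
    called = set(CALL.findall(source)) - C_KEYWORDS
    undefined = called - defined
    return undefined & standard_functions, undefined - standard_functions

src = '''
int helper(int a)
{
    return a + 1;
}
int main(void)
{
    printf("%d", helper(2));
    custom_io();
    return 0;
}
'''
standard, suspect = check_calls(src, {'printf', 'malloc', 'free'})
print('standard calls:', standard)                # {'printf'}
print('undefined, non-standard calls:', suspect)  # {'custom_io'}
```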
Product Recommendation System Based on Personal Preference Model Using CAM
NASA Astrophysics Data System (ADS)
Murakami, Tomoko; Yoshioka, Nobukazu; Orihara, Ryohei; Furukawa, Koichi
A product recommendation system can be realized by applying business rules acquired through data mining techniques. Business rules, such as demographic patterns of purchase, can cover groups of users who tend to purchase certain products, but it is difficult to recommend products adapted to varied personal preferences using such rules alone. In addition, it is very costly to gather the large volume of high-quality survey data necessary for good recommendation based on a personal preference model. A method for collecting kansei information automatically, without questionnaire surveys, is therefore required. Constructing a personal preference model from sparse preference data is also necessary, since it is costly for users to input such data. In this paper, we propose a product recommendation system based on kansei information extracted by text mining and a user preference model constructed by Category-guided Adaptive Modeling (CAM). CAM is a feature construction method that generates new features to build a space in which same-labeled examples are close together and differently labeled examples are far apart, starting from a small number of labeled examples. CAM makes it possible to construct a personal preference model despite limited information about liked and disliked categories. In the system, a retrieval agent gathers product specifications and a user agent manages the preference model and the user's likes and dislikes. Kansei information about the products is obtained by applying text mining to reputation documents about the products on web sites. We carry out experimental studies to confirm that the preference model obtained by our method performs effectively.
GDRMS: a system for automatic extraction of the disease-centre relation
NASA Astrophysics Data System (ADS)
Yang, Ronggen; Zhang, Yue; Gong, Lejun
2012-01-01
With the rapid increase in biomedical literature, the deluge of new articles is leading to information overload. Extracting the available knowledge from this huge body of literature has become a major challenge. GDRMS is a tool that extracts relationships between diseases and genes, and between genes, from the biomedical literature using text mining technology. It is a rule-based system that also provides disease-centre network visualization, constructs a disease-gene database, and offers a gene engine for understanding gene function. The main focus of GDRMS is to give the research community a valuable opportunity to explore the relationship between disease and gene when studying the etiology of disease.
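A hedged sketch of what one such rule-based extraction step might look like (the trigger patterns, dictionaries, and example sentence are invented for illustration and are not GDRMS's actual rules):

```python
# Toy rule-based disease-gene relation extraction: a relation is asserted
# when a known gene, a known disease, and a trigger phrase co-occur in one
# sentence. Dictionaries and patterns are illustrative placeholders.
import re

GENES = {'BRCA1', 'TP53'}
DISEASES = {'breast cancer', 'lung cancer'}
TRIGGER = re.compile(r'\b(associated with|mutations? in|linked to)\b', re.I)

def extract_relations(sentence):
    relations = []
    if TRIGGER.search(sentence):
        for gene in GENES:
            for disease in DISEASES:
                if gene in sentence and disease in sentence.lower():
                    relations.append((gene, disease))
    return relations

print(extract_relations('Mutations in BRCA1 are associated with breast cancer.'))
# -> [('BRCA1', 'breast cancer')]
```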
NASA Astrophysics Data System (ADS)
Li, Nan; Zhu, Xiufang
2017-04-01
Cultivated land resources are key to ensuring food security. Timely and accurate access to cultivated land information supports scientific planning of food production and management policies. GaoFen-1 (GF-1) images have high spatial resolution and abundant texture information and thus can be used to identify fragmented cultivated land. In this paper, an object-oriented artificial bee colony (ABC) algorithm is proposed for extracting cultivated land from GF-1 images. First, the GF-1 image was segmented with eCognition software and some segments were manually labeled as one of two types (cultivated land and non-cultivated land). Second, the ABC algorithm was used to search for classification rules based on the spectral and texture information extracted from the image objects. Finally, the extracted classification rules were used to identify the cultivated land area across the image. The experiment was carried out in the Hongze area, Jiangsu Province, using a wide field-of-view (WFV) sensor image from the GF-1 satellite. The overall classification accuracy was 94.95%, and the accuracy for cultivated land was 92.85%. The results show that the object-oriented ABC algorithm can overcome the insufficient spectral information of GF-1 images and achieve high accuracy in cultivated land identification.
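As an editorial illustration of rule search with an artificial bee colony, the toy sketch below (with the employed and onlooker phases collapsed into one greedy perturbation loop; the data, features, and AND-threshold rule form are invented, not the paper's) optimizes per-feature thresholds against labeled segments:

```python
# Toy ABC search over classification rules of the form
# "cultivated iff every feature >= its threshold".
import random

def fitness(th, data):
    """Accuracy of the AND-threshold rule on labeled (features, label) pairs."""
    hits = sum((all(x >= t for x, t in zip(feats, th)) == label)
               for feats, label in data)
    return hits / len(data)

def abc_rule_search(data, dim, n_sources=10, limit=5, iters=100, lo=0.0, hi=1.0):
    rand_source = lambda: [random.uniform(lo, hi) for _ in range(dim)]
    sources = [rand_source() for _ in range(n_sources)]
    trials = [0] * n_sources
    for _ in range(iters):
        for i in range(n_sources):
            # Perturb one dimension toward a different source; keep if better.
            j = random.randrange(dim)
            k = (i + random.randrange(1, n_sources)) % n_sources
            cand = sources[i][:]
            cand[j] += random.uniform(-1, 1) * (sources[i][j] - sources[k][j])
            if fitness(cand, data) > fitness(sources[i], data):
                sources[i], trials[i] = cand, 0
            else:
                trials[i] += 1
            # Scout phase: abandon exhausted food sources.
            if trials[i] > limit:
                sources[i], trials[i] = rand_source(), 0
    return max(sources, key=lambda s: fitness(s, data))

# Segments described by (spectral mean, texture contrast); label 1 = cultivated.
data = [((0.8, 0.7), 1), ((0.9, 0.6), 1), ((0.3, 0.2), 0), ((0.4, 0.9), 0)]
print(abc_rule_search(data, dim=2))
```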
Sleep facilitates learning a new linguistic rule.
Batterink, Laura J; Oudiette, Delphine; Reber, Paul J; Paller, Ken A
2014-12-01
Natural languages contain countless regularities. Extraction of these patterns is an essential component of language acquisition. Here we examined the hypothesis that memory processing during sleep contributes to this learning. We exposed participants to a hidden linguistic rule by presenting a large number of two-word phrases, each including a noun preceded by one of four novel words that functioned as an article (e.g., gi rhino). These novel words (ul, gi, ro and ne) were presented as obeying an explicit rule: two words signified that the noun referent was relatively near, and two that it was relatively far. Undisclosed to participants was the fact that the novel articles also predicted noun animacy, with two of the articles preceding animate referents and the other two preceding inanimate referents. Rule acquisition was tested implicitly using a task in which participants responded to each phrase according to whether the noun was animate or inanimate. Learning of the hidden rule was evident in slower responses to phrases that violated the rule. Responses were delayed regardless of whether rule-knowledge was consciously accessible. Brain potentials provided additional confirmation of implicit and explicit rule-knowledge. An afternoon nap was interposed between two 20-min learning sessions. Participants who obtained greater amounts of both slow-wave and rapid-eye-movement sleep showed increased sensitivity to the hidden linguistic rule in the second session. We conclude that during sleep, reactivation of linguistic information linked with the rule was instrumental for stabilizing learning. The combination of slow-wave and rapid-eye-movement sleep may synergistically facilitate the abstraction of complex patterns in linguistic input. Copyright © 2014 Elsevier Ltd. All rights reserved.
Data-Driven Information Extraction from Chinese Electronic Medical Records
Zhao, Tianwan; Ge, Chen; Gao, Weiguo; Wei, Jia; Zhu, Kenny Q.
2015-01-01
Objective This study aims to propose a data-driven framework that takes unstructured free text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. Materials and Methods Our framework uses a hybrid approach. It consists of constructing cross-domain core medical lexica, an unsupervised, iterative algorithm to accrue more accurate terms into the lexica, rules to address Chinese writing conventions and temporal descriptors, and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions. Results The effectiveness of the framework was demonstrated with a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica were capable of recognizing terms with an F1-score of 0.896. 98.5% of recorded medical events were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874. The end-to-end time-event-description extraction of our framework achieved an F1-score of 0.846. Discussion In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838). Conclusions The framework is data-driven, weakly supervised, and robust against the variations and noises that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing styles through patterns used for discovering new terms and rules for updating the lexica. PMID:26295801
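For reference, the Normalized Google Distance used in the event-description matching follows the standard definition of Cilibrasi and Vitányi, where f(x) is the number of pages containing term x, f(x, y) the number containing both terms, and N the total number of indexed pages:

```latex
% Standard NGD definition (not restated in the abstract above).
\mathrm{NGD}(x,y) \;=\;
  \frac{\max\{\log f(x),\,\log f(y)\} \;-\; \log f(x,y)}
       {\log N \;-\; \min\{\log f(x),\,\log f(y)\}}
```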
The Use of Object-Oriented Analysis Methods in Surety Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Craft, Richard L.; Funkhouser, Donald R.; Wyss, Gregory D.
1999-05-01
Object-oriented analysis methods have been used in the computer science arena for a number of years to model the behavior of computer-based systems. This report documents how such methods can be applied to surety analysis. By embodying the causality and behavior of a system in a common object-oriented analysis model, surety analysts can make the assumptions that underlie their models explicit and thus better communicate with system designers. Furthermore, given minor extensions to traditional object-oriented analysis methods, it is possible to automatically derive a wide variety of traditional risk and reliability analysis methods from a single common object model. Automatic model extraction helps ensure consistency among analyses and enables the surety analyst to examine a system from a wider variety of viewpoints in a shorter period of time. Thus it provides a deeper understanding of a system's behaviors and surety requirements. This report documents the underlying philosophy behind the common object model representation, the methods by which such common object models can be constructed, and the rules required to interrogate the common object model for derivation of traditional risk and reliability analysis models. The methodology is demonstrated in an extensive example problem.
Fout, G. Shay; Cashdollar, Jennifer L.; Griffin, Shannon M.; Brinkman, Nichole E.; Varughese, Eunice A.; Parshionikar, Sandhya U.
2016-01-01
EPA Method 1615 measures enteroviruses and noroviruses present in environmental and drinking waters. This method was developed with the goal of having a standardized method for use in multiple analytical laboratories during monitoring period 3 of the Unregulated Contaminant Monitoring Rule. Herein we present the protocol for extraction of viral ribonucleic acid (RNA) from water sample concentrates and for quantitatively measuring enterovirus and norovirus concentrations using reverse transcription-quantitative PCR (RT-qPCR). Virus concentrations for the molecular assay are calculated in terms of genomic copies of viral RNA per liter based upon a standard curve. The method uses a number of quality controls to increase data quality and to reduce interlaboratory and intralaboratory variation. The method has been evaluated by examining virus recovery from ground and reagent grade waters seeded with poliovirus type 3 and murine norovirus as a surrogate for human noroviruses. Mean poliovirus recoveries were 20% in groundwaters and 44% in reagent grade water. Mean murine norovirus recoveries with the RT-qPCR assay were 30% in groundwaters and 4% in reagent grade water. PMID:26862985
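The standard-curve quantitation step can be illustrated with a minimal sketch: Cq values are inverted through the fitted line Cq = m·log10(copies) + b and scaled to the volume of water the assay represents. The slope, intercept, and volume values below are placeholders, not Method 1615 calibration data:

```python
# Minimal sketch of RT-qPCR standard-curve quantitation.
def copies_from_cq(cq, slope, intercept):
    """Invert the fitted standard curve Cq = slope*log10(copies) + intercept."""
    return 10 ** ((cq - intercept) / slope)

def copies_per_liter(cq, slope, intercept, liters_assayed):
    """Scale genomic copies detected in the assayed volume to copies per liter."""
    return copies_from_cq(cq, slope, intercept) / liters_assayed

# A Cq of 30 on a curve with slope -3.3 (~100% PCR efficiency) and intercept
# 40, for an assay representing 0.01 L of the original water sample:
print(copies_per_liter(30.0, -3.3, 40.0, 0.01))
```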
Monitoring the Depth of Anesthesia Using a New Adaptive Neurofuzzy System.
Shalbaf, Ahmad; Saffar, Mohsen; Sleigh, Jamie W; Shalbaf, Reza
2018-05-01
Accurate and noninvasive monitoring of the depth of anesthesia (DoA) is highly desirable. Since anesthetic drugs act mainly on the central nervous system, analysis of brain activity using the electroencephalogram (EEG) is very useful. This paper proposes a novel automated method for assessing the DoA using EEG. First, 11 features, including spectral, fractal, and entropy measures, are extracted from the EEG signal; then, by exhaustively searching all subsets of features, a combination of the best features (Beta-index, sample entropy, Shannon permutation entropy, and detrended fluctuation analysis) is selected. These extracted features are fed to a new neurofuzzy classification algorithm, the adaptive neurofuzzy inference system with linguistic hedges (ANFIS-LH). This structure can successfully model systems with nonlinear relationships between input and output, and it can also classify overlapping classes accurately. ANFIS-LH, which is based on modified classical fuzzy rules, reduces the effects of insignificant features in the input space, which cause overlapping, and modifies the output layer structure. The presented method classifies EEG data into awake, light, general, and deep states during sevoflurane anesthesia in 17 patients, with an accuracy of 92% relative to a commercial monitoring system (response entropy index). Moreover, the method reaches a classification accuracy of 93% in categorizing EEG signals into awake and general anesthesia states on another database of propofol and volatile anesthesia in 50 patients. In sum, this method is potentially applicable to a new real-time monitoring system to help anesthesiologists assess DoA quickly and accurately.
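The exhaustive subset search can be sketched in a few lines (an editorial illustration; the scoring function below is a placeholder standing in for the cross-validated ANFIS-LH accuracy the paper actually uses, and the feature names are assumptions):

```python
# Exhaustive search over all non-empty feature subsets.
from itertools import combinations

def best_feature_subset(features, score):
    """Evaluate every non-empty subset of features; return the best subset."""
    best, best_score = None, float('-inf')
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score

features = ['beta_index', 'sample_entropy', 'perm_entropy', 'dfa', 'spectral_edge']
toy_score = lambda s: len(s) - 0.3 * abs(len(s) - 4)   # placeholder objective
print(best_feature_subset(features, toy_score))
```

Note the cost is exponential in the number of features, which is why this strategy is practical here only because the feature pool is small (11 features, 2047 subsets).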
Friesen, Melissa C.; Locke, Sarah J.; Tornow, Carina; Chen, Yu-Cheng; Koh, Dong-Hee; Stewart, Patricia A.; Purdue, Mark; Colt, Joanne S.
2014-01-01
Objectives: Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios. Methods: Our study population comprised 2408 subjects, reporting 11991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs. Results: Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert. Conclusions: Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available. PMID:24590110
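A simplified stand-in for the described extraction step (illustrative only; the original used a SAS macro, and the keyword list below is invented, not the study's a priori list) flags a job when any of its free-text fields mentions an agent-specific keyword:

```python
# Toy keyword-based flagging of exposure scenarios in free-text job records.
KEYWORDS_TCE = {'degreas', 'vapor degreaser', 'trichloroethylene', 'tce',
                'metal cleaning'}

def flag_exposure(job, keywords):
    """True if any free-text OH field contains an agent-specific keyword."""
    text = ' '.join(job.get(f, '') for f in ('title', 'task', 'tools', 'chemicals'))
    text = text.lower()
    return any(kw in text for kw in keywords)

job = {'title': 'machinist', 'task': 'degreasing metal parts',
       'tools': 'vapor degreaser', 'chemicals': 'solvents'}
print(flag_exposure(job, KEYWORDS_TCE))   # True -- 'degreas' matches
```

In the study itself, variables produced this way were combined with standardized occupation and industry codes, so that a job could be flagged through any of the three routes (occupation, industry, or task/tool/chemical scenario).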
26 CFR 1.482-8 - Examples of the best method rule.
Code of Federal Regulations, 2011 CFR
2011-04-01
... illustrate the comparative analysis required to apply this rule. As with all of the examples in these... case. Example 10. Cost of services plus method preferred to other methods. (i) FP designs and...
26 CFR 1.482-8 - Examples of the best method rule.
Code of Federal Regulations, 2013 CFR
2013-04-01
... illustrate the comparative analysis required to apply this rule. As with all of the examples in these... case. Example 10. Cost of services plus method preferred to other methods. (i) FP designs and...
26 CFR 1.482-8 - Examples of the best method rule.
Code of Federal Regulations, 2012 CFR
2012-04-01
... illustrate the comparative analysis required to apply this rule. As with all of the examples in these... case. Example 10. Cost of services plus method preferred to other methods. (i) FP designs and...
26 CFR 1.482-8 - Examples of the best method rule.
Code of Federal Regulations, 2010 CFR
2010-04-01
... illustrate the comparative analysis required to apply this rule. As with all of the examples in these... case. Example 10. Cost of services plus method preferred to other methods. (i) FP designs and...
Defining Alcohol-Specific Rules Among Parents of Older Adolescents: Moving Beyond No Tolerance.
Bourdeau, Beth; Miller, Brenda; Vanya, Magdalena; Duke, Michael; Ames, Genevieve
2012-01-01
Parental beliefs and rules regarding their teen's use of alcohol influence teen decisions regarding alcohol use. However, measurement of parental rules regarding adolescent alcohol use has not been thoroughly studied. This study used qualitative interviews with 174 parents of older teens from 100 families. From open-ended questions, themes emerged that describe explicit rules tied to circumscribed use, no tolerance, and "call me." There was some inconsistency in explicit rules with and between parents. Responses also generated themes relating to implicit rules such as expectations and preferences. Parents described their methods of communicating their position via conversational methods, role modeling their own behavior, teaching socially appropriate use of alcohol by offering their teen alcohol, and monitoring their teens' social activities. Findings indicate that alcohol rules are not adequately captured by current assessment measures.
Automated labeling of bibliographic data extracted from biomedical online journals
NASA Astrophysics Data System (ADS)
Kim, Jongwoo; Le, Daniel X.; Thoma, George R.
2003-01-01
A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation, and others) from online biomedical journals to populate the National Library of Medicine's MEDLINE database. This paper describes a key module in this system: the labeling module, which employs statistics and fuzzy rule-based algorithms to identify segmented zones in an article's HTML pages as specific bibliographic data. Results from experiments conducted with 1,149 medical articles from forty-seven journal issues are presented.
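A toy sketch of the zone-labeling idea (an editorial illustration; the features, thresholds, and rule forms are invented and are far simpler than the module's actual statistical and fuzzy rules):

```python
# Toy rule-based labeling of segmented zones from an article page.
def label_zone(zone):
    """zone: dict of simple layout/word features extracted from HTML."""
    if zone['font_size_rank'] == 1 and zone['position'] < 0.2:
        return 'title'      # largest font near the top of the page
    if zone['comma_density'] > 0.1 and zone['position'] < 0.35:
        return 'authors'    # comma-separated names near the top
    if zone['word_count'] > 80:
        return 'abstract'   # long text block
    return 'other'

zones = [
    {'font_size_rank': 1, 'position': 0.05, 'comma_density': 0.0,  'word_count': 9},
    {'font_size_rank': 3, 'position': 0.12, 'comma_density': 0.25, 'word_count': 6},
    {'font_size_rank': 4, 'position': 0.40, 'comma_density': 0.05, 'word_count': 180},
]
print([label_zone(z) for z in zones])   # ['title', 'authors', 'abstract']
```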