Science.gov

Sample records for database-driven text classification

  1. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled

  2. Free-Text Disease Classification

    DTIC Science & Technology

    2011-09-01

    NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS FREE-TEXT DISEASE CLASSIFICATION by Craig Maxey September 2011 Thesis Advisor: Lyn R. Whitaker...2104-10-31 Free-Text Disease Classification Craig Maxey Naval Postgraduate School Monterey, CA 93943 Department of the Navy Approved for public...POSTGRADUATE SCHOOL September 2011 Author: Craig Maxey Approved by: Lyn R. Whitaker Thesis Advisor Samuel E. Buttrey Second Reader Robert F. Dell Chair

  3. Text Classification for Organizational Researchers

    PubMed Central

    Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.

    2017-01-01

    Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249

  4. Transfer Learning beyond Text Classification

    NASA Astrophysics Data System (ADS)

    Yang, Qiang

    Transfer learning is a new machine learning and data mining framework that allows the training and test data to come from different distributions or feature spaces. We can find many novel applications of machine learning and data mining where transfer learning is necessary. While much has been done in transfer learning in text classification and reinforcement learning, there has been a lack of documented success stories of novel applications of transfer learning in other areas. In this invited article, I will argue that transfer learning is in fact quite ubiquitous in many real world applications. In this article, I will illustrate this point through an overview of a broad spectrum of applications of transfer learning that range from collaborative filtering to sensor based location estimation and logical action model learning for AI planning. I will also discuss some potential future directions of transfer learning.

  5. Research on Classification of Chinese Text Data Based on SVM

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  6. Text Classification for Intelligent Portfolio Management

    DTIC Science & Technology

    2002-05-01

    years including nearest neighbor classification [15], naive Bayes with EM (Ex- pectation Maximization) [11] [13], Winnow with active learning [10... Active Learning and Expectation Maximization (EM). In particular, active learning is used to actively select documents for labeling, then EM assigns...generalization with active learning . Machine Learning, 15(2):201–221, 1994. [3] I. Dagan and P. Engelson. Committee-based sampling for training

  7. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.

  8. A Data Augmentation Approach to Short Text Classification

    ERIC Educational Resources Information Center

    Rosario, Ryan Robert

    2017-01-01

    Text classification typically performs best with large training sets, but short texts are very common on the World Wide Web. Can we use resampling and data augmentation to construct larger texts using similar terms? Several current methods exist for working with short text that rely on using external data and contexts, or workarounds. Our focus is…

  9. Just-in-time Database-Driven Web Applications

    PubMed Central

    2003-01-01

    "Just-in-time" database-driven Web applications are inexpensive, quickly-developed software that can be put to many uses within a health care organization. Database-driven Web applications garnered 73873 hits on our system-wide intranet in 2002. They enabled collaboration and communication via user-friendly Web browser-based interfaces for both mission-critical and patient-care-critical functions. Nineteen database-driven Web applications were developed. The application categories that comprised 80% of the hits were results reporting (27%), graduate medical education (26%), research (20%), and bed availability (8%). The mean number of hits per application was 3888 (SD = 5598; range, 14-19879). A model is described for just-in-time database-driven Web application development and an example given with a popular HTML editor and database program. PMID:14517109

  10. Comparative Analysis of Document level Text Classification Algorithms using R

    NASA Astrophysics Data System (ADS)

    Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.

    2017-08-01

    From the past few decades there has been tremendous volumes of data available in Internet either in structured or unstructured form. Also, there is an exponential growth of information on Internet, so there is an emergent need of text classifiers. Text mining is an interdisciplinary field which draws attention on information retrieval, data mining, machine learning, statistics and computational linguistics. And to handle this situation, a wide range of supervised learning algorithms has been introduced. Among all these K-Nearest Neighbor(KNN) is efficient and simplest classifier in text classification family. But KNN suffers from imbalanced class distribution and noisy term features. So, to cope up with this challenge we use document based centroid dimensionality reduction(CentroidDR) using R Programming. By combining these two text classification techniques, KNN and Centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN which works well substantially better than CenKNN.

  11. Con-Text: Text Detection for Fine-grained Object Classification.

    PubMed

    Karaoglu, Sezer; Tao, Ran; van Gemert, Jan C; Gevers, Theo

    2017-05-24

    This work focuses on fine-grained object classification using recognized scene text in natural images. While the state-of-the-art relies on visual cues only, this paper is the first work which proposes to combine textual and visual cues. Another novelty is the textual cue extraction. Unlike the state-of-the-art text detection methods, we focus more on the background instead of text regions. Once text regions are detected, they are further processed by two methods to perform text recognition i.e. ABBYY commercial OCR engine and a state-of-the-art character recognition algorithm. Then, to perform textual cue encoding, bi- and trigrams are formed between the recognized characters by considering the proposed spatial pairwise constraints. Finally, extracted visual and textual cues are combined for fine-grained classification. The proposed method is validated on four publicly available datasets: ICDAR03, ICDAR13, Con-Text and Flickr-logo. We improve the state-of-the-art end-to-end character recognition by a large margin of 15% on ICDAR03. We show that textual cues are useful in addition to visual cues for fine-grained classification. We show that textual cues are also useful for logo retrieval. Adding textual cues outperforms visual- and textual-only in fine-grained classification (70.7% to 60.3%) and logo retrieval (57.4% to 54.8%).

  12. Simple-random-sampling-based multiclass text classification algorithm.

    PubMed

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

    Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of the algorithms must be concerned about the era of big data. Through the investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token level memory to store labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that SRSMTC algorithm can achieve the state-of-the-art performance at greatly reduced space-time requirements.

  13. PDF text classification to leverage information extraction from publication reports.

    PubMed

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task, however majority of IE systems were not designed to work on Portable Document Format (PDF) document, an important and common extraction source for systematic review. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which add challenges to the underlining natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best performing machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and MEDADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced of number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi

  14. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777

  15. Implementing a Dynamic Database-Driven Course Using LAMP

    ERIC Educational Resources Information Center

    Laverty, Joseph Packy; Wood, David; Turchek, John

    2011-01-01

    This paper documents the formulation of a database driven open source architecture web development course. The design of a web-based curriculum faces many challenges: a) relative emphasis of client and server-side technologies, b) choice of a server-side language, and c) the cost and efficient delivery of a dynamic web development, database-driven…

  16. Visual affective classification by combining visual and text features.

    PubMed

    Liu, Ningning; Wang, Kai; Jin, Xin; Gao, Boyang; Dellandréa, Emmanuel; Chen, Liming

    2017-01-01

    Affective analysis of images in social networks has drawn much attention, and the texts surrounding images are proven to provide valuable semantic meanings about image content, which can hardly be represented by low-level visual features. In this paper, we propose a novel approach for visual affective classification (VAC) task. This approach combines visual representations along with novel text features through a fusion scheme based on Dempster-Shafer (D-S) Evidence Theory. Specifically, we not only investigate different types of visual features and fusion methods for VAC, but also propose textual features to effectively capture emotional semantics from the short text associated to images based on word similarity. Experiments are conducted on three public available databases: the International Affective Picture System (IAPS), the Artistic Photos and the MirFlickr Affect set. The results demonstrate that the proposed approach combining visual and textual features provides promising results for VAC task.

  17. Visual affective classification by combining visual and text features

    PubMed Central

    Liu, Ningning; Wang, Kai; Jin, Xin; Gao, Boyang; Dellandréa, Emmanuel; Chen, Liming

    2017-01-01

    Affective analysis of images in social networks has drawn much attention, and the texts surrounding images are proven to provide valuable semantic meanings about image content, which can hardly be represented by low-level visual features. In this paper, we propose a novel approach for visual affective classification (VAC) task. This approach combines visual representations along with novel text features through a fusion scheme based on Dempster-Shafer (D-S) Evidence Theory. Specifically, we not only investigate different types of visual features and fusion methods for VAC, but also propose textual features to effectively capture emotional semantics from the short text associated to images based on word similarity. Experiments are conducted on three public available databases: the International Affective Picture System (IAPS), the Artistic Photos and the MirFlickr Affect set. The results demonstrate that the proposed approach combining visual and textual features provides promising results for VAC task. PMID:28850566

  18. Improving imbalanced scientific text classification using sampling strategies and dictionaries.

    PubMed

    Borrajo, L; Romero, R; Iglesias, E L; Redondo Marey, C M

    2011-09-15

    Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. One of the systems affected are those related to the recovery and classification of scientific documentation. Sampling strategies such as Oversampling and Subsampling are popular in tackling the problem of class imbalance. In this work, we study their effects on three types of classifiers (Knn, SVM and Naive-Bayes) when they are applied to search on the PubMed scientific database. Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein) using the mentioned classifiers and sampling strategies. Best results were obtained with NLPBA and Protein dictionaries and the SVM classifier using the Subsampling balancing technique. These results were compared with those obtained by other authors using the TREC Genomics 2005 public corpus. Copyright 2011 The Author(s). Published by Journal of Integrative Bioinformatics.

  19. Text Classification for Assisting Moderators in Online Health Communities

    PubMed Central

    Huh, Jina; Yetisgen-Yildiz, Meliha; Pratt, Wanda

    2013-01-01

    Objectives Patients increasingly visit online health communities to get help on managing health. The large scale of these online communities makes it impossible for the moderators to engage in all conversations; yet, some conversations need their expertise. Our work explores low-cost text classification methods to this new domain of determining whether a thread in an online health forum needs moderators’ help. Methods We employed a binary classifier on WebMD’s online diabetes community data. To train the classifier, we considered three feature types: (1) word unigram, (2) sentiment analysis features, and (3) thread length. We applied feature selection methods based on χ2 statistics and under sampling to account for unbalanced data. We then performed a qualitative error analysis to investigate the appropriateness of the gold standard. Results Using sentiment analysis features, feature selection methods, and balanced training data increased the AUC value up to 0.75 and the F1-score up to 0.54 compared to the baseline of using word unigrams with no feature selection methods on unbalanced data (0.65 AUC and 0.40 F1-score). The error analysis uncovered additional reasons for why moderators respond to patients’ posts. Discussion We showed how feature selection methods and balanced training data can improve the overall classification performance. We present implications of weighing precision versus recall for assisting moderators of online health communities. Our error analysis uncovered social, legal, and ethical issues around addressing community members’ needs. We also note challenges in producing a gold standard, and discuss potential solutions for addressing these challenges. Conclusion Social media environments provide popular venues in which patients gain health-related information. Our work contributes to understanding scalable solutions for providing moderators’ expertise in these large-scale, social media environments. PMID:24025513

  20. Construction accident narrative classification: An evaluation of text mining techniques.

    PubMed

    Goh, Yang Miang; Ubeynarayana, C U

    2017-11-01

    Learning from past accidents is fundamental to accident prevention. Thus, accident and near miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near miss narratives will be very significant. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and non-linear SVM were also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with uni-gram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Database Driven 6-DOF Trajectory Simulation for Debris Transport Analysis

    NASA Technical Reports Server (NTRS)

    West, Jeff

    2008-01-01

    Debris mitigation and risk assessment have been carried out by NASA and its contractors supporting Space Shuttle Return-To-Flight (RTF). As a part of this assessment, analysis of transport potential for debris that may be liberated from the vehicle or from pad facilities prior to tower clear (Lift-Off Debris) is being performed by MSFC. This class of debris includes plume driven and wind driven sources for which lift as well as drag are critical for the determination of the debris trajectory. As a result, NASA MSFC has a need for a debris transport or trajectory simulation that supports the computation of lift effect in addition to drag without the computational expense of fully coupled CFD with 6-DOF. A database driven 6-DOF simulation that uses aerodynamic force and moment coefficients for the debris shape that are interpolated from a database has been developed to meet this need. The design, implementation, and verification of the database driven six degree of freedom (6-DOF) simulation addition to the Lift-Off Debris Transport Analysis (LODTA) software are discussed in this paper.

  2. Investigation into Text Classification With Kernel Based Schemes

    DTIC Science & Technology

    2010-03-01

    Document Matrix TDMs Term-Document Matrices TMG Text to Matrix Generator TN True Negative TP True Positive VSM Vector Space Model xxii THIS PAGE...are represented as a term-document matrix, common evaluation metrics, and the software package Text to Matrix Generator ( TMG ). The classifier...AND METRICS This chapter introduces the indexing capabilities of the Text to Matrix Generator ( TMG ) Toolbox. Specific attention is placed on the

  3. The Form is the Substance: Classification of Genres in Text

    DTIC Science & Technology

    2001-01-01

    recipients and time of posting. This yielded improved results, but no results on use of domain specific features alone are presented. Pannu and Sycara (1996...Recognition. Pannu , Anandeep, Sycara. (1996) "Learning Text Filtering Preferences", Symposium on Machine Learning and Information Processing, AAAI

  4. Database-driven web interface automating gyrokinetic simulations for validation

    NASA Astrophysics Data System (ADS)

    Ernst, D. R.

    2010-11-01

    We are developing a web interface to connect plasma microturbulence simulation codes with experimental data. The website automates the preparation of gyrokinetic simulations utilizing plasma profile and magnetic equilibrium data from TRANSP analysis of experiments, read from MDSPLUS over the internet. This database-driven tool saves user sessions, allowing searches of previous simulations, which can be restored to repeat the same analysis for a new discharge. The website includes a multi-tab, multi-frame, publication quality java plotter Webgraph, developed as part of this project. Input files can be uploaded as templates and edited with context-sensitive help. The website creates inputs for GS2 and GYRO using a well-tested and verified back-end, in use for several years for the GS2 code [D. R. Ernst et al., Phys. Plasmas 11(5) 2637 (2004)]. A centralized web site has the advantage that users receive bug fixes instantaneously, while avoiding the duplicated effort of local compilations. Possible extensions to the database to manage run outputs, toward prototyping for the Fusion Simulation Project, are envisioned. Much of the web development utilized support from the DoE National Undergraduate Fellowship program [e.g., A. Suarez and D. R. Ernst, http://meetings.aps.org/link/BAPS.2005.DPP.GP1.57.

  5. 77 FR 60475 - Draft of SWGDOC Standard Classification of Typewritten Text

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-03

    ... DEPARTMENT OF JUSTICE Office of Justice Programs [OJP (NIJ) Docket No. 1607] Draft of SWGDOC Standard Classification of Typewritten Text AGENCY: National Institute of Justice, DOJ. ACTION: Notice and..., ``SWGDOC Standard Classification of Typewritten Text''. The opportunity to provide comments on this...

  6. Drug related webpages classification using images and text information based on multi-kernel learning

    NASA Astrophysics Data System (ADS)

    Hu, Ruiguang; Xiao, Liping; Zheng, Wenjuan

    2015-12-01

    In this paper, multi-kernel learning(MKL) is used for drug-related webpages classification. First, body text and image-label text are extracted through HTML parsing, and valid images are chosen by the FOCARSS algorithm. Second, text based BOW model is used to generate text representation, and image-based BOW model is used to generate images representation. Last, text and images representation are fused with a few methods. Experimental results demonstrate that the classification accuracy of MKL is higher than those of all other fusion methods in decision level and feature level, and much higher than the accuracy of single-modal classification.

  7. Healthcare Text Classification System and its Performance Evaluation: A Source of Better Intelligence by Characterizing Healthcare Text.

    PubMed

    Srivastava, Saurabh Kumar; Singh, Sandeep Kumar; Suri, Jasjit S

    2018-04-13

    A machine learning (ML)-based text classification system has several classifiers. The performance evaluation (PE) of the ML system is typically driven by the training data size and the partition protocols used. Such systems lead to low accuracy because the text classification systems lack the ability to model the input text data in terms of noise characteristics. This research study proposes a concept of misrepresentation ratio (MRR) on input healthcare text data and models the PE criteria for validating the hypothesis. Further, such a novel system provides a platform to amalgamate several attributes of the ML system such as: data size, classifier type, partitioning protocol and percentage MRR. Our comprehensive data analysis consisted of five types of text data sets (TwitterA, WebKB4, Disease, Reuters (R8), and SMS); five kinds of classifiers (support vector machine with linear kernel (SVM-L), MLP-based neural network, AdaBoost, stochastic gradient descent and decision tree); and five types of training protocols (K2, K4, K5, K10 and JK). Using the decreasing order of MRR, our ML system demonstrates the mean classification accuracies as: 70.13 ± 0.15%, 87.34 ± 0.06%, 93.73 ± 0.03%, 94.45 ± 0.03% and 97.83 ± 0.01%, respectively, using all the classifiers and protocols. The corresponding AUC is 0.98 for SMS data using Multi-Layer Perceptron (MLP) based neural network. All the classifiers, the best accuracy of 91.84 ± 0.04% is shown to be of MLP-based neural network and this is 6% better over previously published. Further we observed that as MRR decreases, the system robustness increases and validated by standard deviations. The overall text system accuracy using all data types, classifiers, protocols is 89%, thereby showing the entire ML system to be novel, robust and unique. The system is also tested for stability and reliability.

  8. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction. For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall. From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  9. Bibliographic Classification Theory and Text Linguistics: Aboutness Analysis, Intertextuality and the Cognitive Act of Classifying Documents.

    ERIC Educational Resources Information Center

    Beghtol, Clare

    1986-01-01

    Explicates a definition and theory of "aboutness" and aboutness analysis developed by text linguist van Dijk; explores implications of text linguistics for bibliographic classification theory; suggests the elements that a theory of the cognitive process of classifying documents needs to encompass; and delineates how people identify…

  10. Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification

    NASA Astrophysics Data System (ADS)

    Brodic, D.

    2011-01-01

    Text line segmentation represents the key element in the optical character recognition process. Hence, testing of text line segmentation algorithms has substantial relevance. All previously proposed testing methods deal mainly with text database as a template. They are used for testing as well as for the evaluation of the text segmentation algorithm. In this manuscript, methodology for the evaluation of the algorithm for text segmentation based on extended binary classification is proposed. It is established on the various multiline text samples linked with text segmentation. Their results are distributed according to binary classification. Final result is obtained by comparative analysis of cross linked data. At the end, its suitability for different types of scripts represents its main advantage.

  11. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  12. Automatic topic identification of health-related messages in online health community using text classification.

    PubMed

    Lu, Yingjie

    2013-01-01

    To facilitate patient involvement in online health community and obtain informative support and emotional support they need, a topic identification approach was proposed in this paper for identifying automatically topics of the health-related messages in online health community, thus assisting patients in reaching the most relevant messages for their queries efficiently. Feature-based classification framework was presented for automatic topic identification in our study. We first collected the messages related to some predefined topics in a online health community. Then we combined three different types of features, n-gram-based features, domain-specific features and sentiment features to build four feature sets for health-related text representation. Finally, three different text classification techniques, C4.5, Naïve Bayes and SVM were adopted to evaluate our topic classification model. By comparing different feature sets and different classification techniques, we found that n-gram-based features, domain-specific features and sentiment features were all considered to be effective in distinguishing different types of health-related topics. In addition, feature reduction technique based on information gain was also effective to improve the topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrated that the proposed approach could identify the topics of online health-related messages efficiently.

  13. Cue-based assertion classification for Swedish clinical text – developing a lexicon for pyConTextSwe

    PubMed Central

    Velupillai, Sumithra; Skeppstedt, Maria; Kvist, Maria; Mowery, Danielle; Chapman, Brian E.; Dalianis, Hercules; Chapman, Wendy W.

    2014-01-01

    Objective The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. Methods and material We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe’s performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system’s final performance. Results Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system’s final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. Conclusions We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available. PMID:24556644

  14. Classification of hepatocellular carcinoma stages from free-text clinical and radiology reports

    PubMed Central

    Yim, Wen-wai; Kwan, Sharon W; Johnson, Guy; Yetisgen, Meliha

    2017-01-01

    Cancer stage information is important for clinical research. However, they are not always explicitly noted in electronic medical records. In this paper, we present our work on automatic classification of hepatocellular carcinoma (HCC) stages from free-text clinical and radiology notes. To accomplish this, we defined 11 stage parameters used in the three HCC staging systems, American Joint Committee on Cancer (AJCC), Barcelona Clinic Liver Cancer (BCLC), and Cancer of the Liver Italian Program (CLIP). After aggregating stage parameters to the patient-level, the final stage classifications were achieved using an expert-created decision logic. Each stage parameter relevant for staging was extracted using several classification methods, e.g. sentence classification and automatic information structuring, to identify and normalize text as cancer stage parameter values. Stage parameter extraction for the test set performed at 0.81 F1. Cancer stage prediction for AJCC, BCLC, and CLIP stage classifications were 0.55, 0.50, and 0.43 F1.

  15. Using Computational Text Classification for Qualitative Research and Evaluation in Extension

    ERIC Educational Resources Information Center

    Smith, Justin G.; Tissing, Reid

    2018-01-01

    This article introduces a process for computational text classification that can be used in a variety of qualitative research and evaluation settings. The process leverages supervised machine learning based on an implementation of a multinomial Bayesian classifier. Applied to a community of inquiry framework, the algorithm was used to identify…

  16. Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

    PubMed

    Sarker, Abeed; Gonzalez, Graciela

    2015-02-01

    Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media-where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments

  17. Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

    PubMed Central

    Gonzalez, Graciela

    2014-01-01

    Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing

  18. A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.

    PubMed

    Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed

    2014-01-01

    With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.

  19. A stylistic classification of Russian-language texts based on the random walk model

    NASA Astrophysics Data System (ADS)

    Kramarenko, A. A.; Nekrasov, K. A.; Filimonov, V. V.; Zhivoderov, A. A.; Amieva, A. A.

    2017-09-01

    A formal approach to text analysis is suggested that is based on the random walk model. The frequencies and reciprocal positions of the vowel letters are matched up by a process of quasi-particle migration. Statistically significant difference in the migration parameters for the texts of different functional styles is found. Thus, a possibility of classification of texts using the suggested method is demonstrated. Five groups of the texts are singled out that can be distinguished from one another by the parameters of the quasi-particle migration process.

  20. Applying Active Learning to Assertion Classification of Concepts in Clinical Text

    PubMed Central

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-01-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC – 0.7715) than the passive learning method (random sampling) (ALC – 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105

  1. Classification of Traffic Related Short Texts to Analyse Road Problems in Urban Areas

    NASA Astrophysics Data System (ADS)

    Saldana-Perez, A. M. M.; Moreno-Ibarra, M.; Tores-Ruiz, M.

    2017-09-01

    The Volunteer Geographic Information (VGI) can be used to understand the urban dynamics. In the classification of traffic related short texts to analyze road problems in urban areas, a VGI data analysis is done over a social media's publications, in order to classify traffic events at big cities that modify the movement of vehicles and people through the roads, such as car accidents, traffic and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In the approach users are considered as sensors which describe their surroundings and provide their geographic position at the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped in a data corpus and geo-visualized in the study area, to detect the places with more vehicular problems.

  2. Comparisons and Selections of Features and Classifiers for Short Text Classification

    NASA Astrophysics Data System (ADS)

    Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

    2017-10-01

    Short text is considerably different from traditional long text documents due to its shortness and conciseness, which somehow hinders the applications of conventional machine learning and data mining algorithms in short text classification. According to traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we have illustrated step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of the four methods of one-hot encoding, tf-idf weighting, word2vec and paragraph2vec, and in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with feature selections. Regarding the datasets, we crawled more than 400,000 short text files from Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class, and 59 labels in the small class.

  3. Symbolic rule-based classification of lung cancer stages from free-text pathology reports.

    PubMed

    Nguyen, Anthony N; Lawley, Michael J; Hansen, David P; Bowman, Rayleen V; Clarke, Belinda E; Duhig, Edwina E; Colquist, Shoni

    2010-01-01

    To classify automatically lung tumor-node-metastases (TNM) cancer stages from free-text pathology reports using symbolic rule-based classification. By exploiting report substructure and the symbolic manipulation of systematized nomenclature of medicine-clinical terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors relating to the staging guidelines. Post-coordinated SNOMED CT expressions based on templates were defined and populated by concepts in reports, and tested for subsumption by staging factors. The subsumption results were used to build logic according to the staging guidelines to calculate the TNM stage. The accuracy measure and confusion matrices were used to evaluate the TNM stages classified by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and a machine learning-based text classification system using support vector machines. Overall accuracy on a corpus of pathology reports for 718 lung cancer patients against a database of pathological TNM staging decisions were 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to support vector machine classification approaches. A system to classify lung TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used for the extraction of key lung cancer characteristics from free-text reports. Future work will investigate the applicability of using the proposed methodology for extracting other cancer characteristics and types.

  4. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    PubMed Central

    Dai, Jin; Liu, Xin

    2014-01-01

    The similarity between objects is the core research area of data mining. In order to reduce the interference of the uncertainty of nature language, a similarity measurement between normal cloud models is adopted to text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed. It can efficiently accomplish conversion between qualitative concept and quantitative data. Through the conversion from text set to text information table based on VSM model, the text qualitative concept, which is extraction from the same category, is jumping up as a whole category concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. By the comparison among different text classifiers in different feature selection set, it fully proves that not only does CCJU-TC have a strong ability to adapt to the different text features, but also the classification performance is also better than the traditional classifiers. PMID:24711737

  5. Short text sentiment classification based on feature extension and ensemble classifier

    NASA Astrophysics Data System (ADS)

    Liu, Yang; Zhu, Xie

    2018-05-01

    With the rapid development of Internet social media, excavating the emotional tendencies of the short text information from the Internet, the acquisition of useful information has attracted the attention of researchers. At present, the commonly used can be attributed to the rule-based classification and statistical machine learning classification methods. Although micro-blog sentiment analysis has made good progress, there still exist some shortcomings such as not highly accurate enough and strong dependence from sentiment classification effect. Aiming at the characteristics of Chinese short texts, such as less information, sparse features, and diverse expressions, this paper considers expanding the original text by mining related semantic information from the reviews, forwarding and other related information. First, this paper uses Word2vec to compute word similarity to extend the feature words. And then uses an ensemble classifier composed of SVM, KNN and HMM to analyze the emotion of the short text of micro-blog. The experimental results show that the proposed method can make good use of the comment forwarding information to extend the original features. Compared with the traditional method, the accuracy, recall and F1 value obtained by this method have been improved.

  6. Using complex networks for text classification: Discriminating informative and imaginative documents

    NASA Astrophysics Data System (ADS)

    de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.

    2016-01-01

    Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques have allowed an improvement of several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantical content of texts, as is the case of bag-of-word language models. These approaches have certainly yielded reasonable performance. However, some potential features such as the structural organization of texts have been used only in a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.

  7. Using clustering and a modified classification algorithm for automatic text summarization

    NASA Astrophysics Data System (ADS)

    Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar

    2013-01-01

    In this paper we describe a modified classification method destined for extractive summarization purpose. The classification in this method doesn't need a learning corpus; it uses the input text to do that. First, we cluster the document sentences to exploit the diversity of topics, then we use a learning algorithm (here we used Naive Bayes) on each cluster considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class, using a scoring model derived from classification algorithm. These scores are used, then, to reorder the sentences and extract the first ones as the output summary. We conducted some experiments using a corpus of scientific papers, and we have compared our results to another summarization system called UNIS.1 Also, we experiment the impact of clustering threshold tuning, on the resulted summary, as well as the impact of adding more features to the classifier. We found that this method is interesting, and gives good performance, and the addition of new features (which is simple using this method) can improve summary's accuracy.

  8. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification.

    PubMed

    Yi, Chucai; Tian, Yingli

    2012-09-01

    In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

  9. Relevance popularity: A term event model based feature selection scheme for text classification.

    PubMed

    Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao

    2017-01-01

    Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event Multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text datasets (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained from using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.

  10. Automated Classification of Selected Data Elements from Free-text Diagnostic Reports for Clinical Research.

    PubMed

    Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra

    2016-08-05

    In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze if it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass classification of free-text diagnostic reports to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, and its manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnoses paragraph was extracted from the clinical report of one third randomly selected patients of the multiple myeloma research database from Heidelberg University Hospital (in total 737 selected patients). An EDC system was setup and two data entry specialists performed independently a manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total 15 different pipelines were examined and assessed by a ten-fold cross-validation, reiterated 100 times. For quality indication the average error rate and the average F1-score were conducted. For significance testing the approximate randomization test was used. The created annotated corpus consists of 737 different diagnoses paragraphs with a

  11. Automatic classification of diseases from free-text death certificates for real-time surveillance.

    PubMed

    Koopman, Bevan; Karimi, Sarvnaz; Nguyen, Anthony; McGuire, Rhydwyn; Muscatello, David; Kemp, Madonna; Truran, Donna; Zhang, Ming; Thackway, Sarah

    2015-07-15

    Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. The high accuracy and low cost of the classification methods allow for an

  12. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports

    PubMed Central

    Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.

    2010-01-01

    Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy, with each method, scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2% and for the family history classification, ConText scored highest with 97.6%, in which both methods were comparable to the human benchmark of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference

  13. Scaling up the evaluation of psychotherapy: evaluating motivational interviewing fidelity via statistical text classification

    PubMed Central

    2014-01-01

    Background Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy—motivational interviewing (MI). Method Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code is a standard measure of MI provider fidelity based on human ratings that was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts. It was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes with human-based codes. Results Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity with those from human raters (range of area under ROC curve (AUC) scores: 0.62 – 0.81; average AUC: 0.72). Agreement with human raters was evaluated based on talk turns as well as code tallies for an entire session. Generated codes had higher reliability with human codes for session tallies and also varied strongly by individual code. Conclusion To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility

  14. Active learning for clinical text classification: is it better than random sampling?

    PubMed

    Figueroa, Rosa L; Zeng-Treitler, Qing; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty.

  15. Active learning for clinical text classification: is it better than random sampling?

    PubMed Central

    Figueroa, Rosa L; Ngo, Long H; Goryachev, Sergey; Wiechmann, Eduardo P

    2012-01-01

    Objective This study explores active learning algorithms as a way to reduce the requirements for large training sets in medical text classification tasks. Design Three existing active learning algorithms (distance-based (DIST), diversity-based (DIV), and a combination of both (CMB)) were used to classify text from five datasets. The performance of these algorithms was compared to that of passive learning on the five datasets. We then conducted a novel investigation of the interaction between dataset characteristics and the performance results. Measurements Classification accuracy and area under receiver operating characteristics (ROC) curves for each algorithm at different sample sizes were generated. The performance of active learning algorithms was compared with that of passive learning using a weighted mean of paired differences. To determine why the performance varies on different datasets, we measured the diversity and uncertainty of each dataset using relative entropy and correlated the results with the performance differences. Results The DIST and CMB algorithms performed better than passive learning. With a statistical significance level set at 0.05, DIST outperformed passive learning in all five datasets, while CMB was found to be better than passive learning in four datasets. We found strong correlations between the dataset diversity and the DIV performance, as well as the dataset uncertainty and the performance of the DIST algorithm. Conclusion For medical text classification, appropriate active learning algorithms can yield performance comparable to that of passive learning with considerably smaller training sets. In particular, our results suggest that DIV performs better on data with higher diversity and DIST on data with lower uncertainty. PMID:22707743

  16. Scene text detection via extremal region based double threshold convolutional network classification

    PubMed Central

    Zhu, Wei; Lou, Jing; Chen, Longtao; Xia, Qingyuan

    2017-01-01

    In this paper, we present a robust text detection approach in natural images which is based on region proposal mechanism. A powerful low-level detector named saliency enhanced-MSER extended from the widely-used MSER is proposed by incorporating saliency detection methods, which ensures a high recall rate. Given a natural image, character candidates are extracted from three channels in a perception-based illumination invariant color space by saliency-enhanced MSER algorithm. A discriminative convolutional neural network (CNN) is jointly trained with multi-level information including pixel-level and character-level information as character candidate classifier. Each image patch is classified as strong text, weak text and non-text by double threshold filtering instead of conventional one-step classification, leveraging confident scores obtained via CNN. To further prune non-text regions, we develop a recursive neighborhood search algorithm to track credible texts from weak text set. Finally, characters are grouped into text lines using heuristic features such as spatial location, size, color, and stroke width. We compare our approach with several state-of-the-art methods, and experiments show that our method achieves competitive performance on public datasets ICDAR 2011 and ICDAR 2013. PMID:28820891

  17. Generating Automated Text Complexity Classifications That Are Aligned with Targeted Text Complexity Standards. Research Report. ETS RR-10-28

    ERIC Educational Resources Information Center

    Sheehan, Kathleen M.; Kostin, Irene; Futagi, Yoko; Flor, Michael

    2010-01-01

    The Common Core Standards call for students to be exposed to a much greater level of text complexity than has been the norm in schools for the past 40 years. Textbook publishers, teachers, and assessment developers are being asked to refocus materials and methods to ensure that students are challenged to read texts at steadily increasing…

  18. A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification

    NASA Astrophysics Data System (ADS)

    Wang, Suge; Li, Deyu; Wei, Yingjie; Li, Hongxia

    With the rapid growth of e-commerce, product reviews on the Web have become an important information source for customers' decision making when they intend to buy some product. As the reviews are often too many for customers to go through, how to automatically classify them into different sentiment orientation categories (i.e. positive/negative) has become a research problem. In this paper, based on Fisher's discriminant ratio, an effective feature selection method is proposed for product review text sentiment classification. In order to validate the validity of the proposed method, we compared it with other methods respectively based on information gain and mutual information while support vector machine is adopted as the classifier. In this paper, 6 subexperiments are conducted by combining different feature selection methods with 2 kinds of candidate feature sets. Under 1006 review documents of cars, the experimental results indicate that the Fisher's discriminant ratio based on word frequency estimation has the best performance with F value 83.3% while the candidate features are the words which appear in both positive and negative texts.

  19. Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    PubMed Central

    Shatkay, Hagit; Pan, Fengxia; Rzhetsky, Andrey; Wilbur, W. John

    2008-01-01

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD) database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. Results: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice. Contact: shatkay@cs.queensu.ca PMID:18718948

  20. An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics

    PubMed Central

    Torii, Manabu; Yin, Lanlan; Nguyen, Thang; Mazumdar, Chand T.; Liu, Hongfang; Hartley, David M.; Nelson, Noele P.

    2014-01-01

    Purpose Early detection of infectious disease outbreaks is crucial to protecting the public health of a society. Online news articles provide timely information on disease outbreaks worldwide. In this study, we investigated automated detection of articles relevant to disease outbreaks using machine learning classifiers. In a real-life setting, it is expensive to prepare a training data set for classifiers, which usually consists of manually labeled relevant and irrelevant articles. To mitigate this challenge, we examined the use of randomly sampled unlabeled articles as well as labeled relevant articles. Methods Naïve Bayes and Support Vector Machine (SVM) classifiers were trained on 149 relevant and 149 or more randomly sampled unlabeled articles. Diverse classifiers were trained by varying the number of sampled unlabeled articles and also the number of word features. The trained classifiers were applied to 15 thousand articles published over 15 days. Top-ranked articles from each classifier were pooled and the resulting set of 1337 articles was reviewed by an expert analyst to evaluate the classifiers. Results Daily averages of areas under ROC curves (AUCs) over the 15-day evaluation period were 0.841 and 0.836, respectively, for the naïve Bayes and SVM classifier. We referenced a database of disease outbreak reports to confirm that this evaluation data set resulted from the pooling method indeed covered incidents recorded in the database during the evaluation period. Conclusions The proposed text classification framework utilizing randomly sampled unlabeled articles can facilitate a cost-effective approach to training machine learning classifiers in a real-life Internet-based biosurveillance project. We plan to examine this framework further using larger data sets and using articles in non-English languages. PMID:21134784

  1. Natural Language Processing Based Instrument for Classification of Free Text Medical Records

    PubMed Central

    2016-01-01

    According to the Ministry of Labor, Health and Social Affairs of Georgia a new health management system has to be introduced in the nearest future. In this context arises the problem of structuring and classifying documents containing all the history of medical services provided. The present work introduces the instrument for classification of medical records based on the Georgian language. It is the first attempt of such classification of the Georgian language based medical records. On the whole 24.855 examination records have been studied. The documents were classified into three main groups (ultrasonography, endoscopy, and X-ray) and 13 subgroups using two well-known methods: Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The results obtained demonstrated that both machine learning methods performed successfully, with a little supremacy of SVM. In the process of classification a “shrink” method, based on features selection, was introduced and applied. At the first stage of classification the results of the “shrink” case were better; however, on the second stage of classification into subclasses 23% of all documents could not be linked to only one definite individual subclass (liver or binary system) due to common features characterizing these subclasses. The overall results of the study were successful. PMID:27668260

  2. Automatic Classification of Medical Text: The Influence of Publication Form1

    PubMed Central

    Cole, William G.; Michael, Patricia A.; Stewart, James G.; Blois, Marsden S.

    1988-01-01

    Previous research has shown that within the domain of medical journal abstracts the statistical distribution of words is neither random nor uniform, but is highly characteristic. Many words are used mainly or solely by one medical specialty or when writing about one particular level of description. Due to this regularity of usage, automatic classification within journal abstracts has proved quite successful. The present research asks two further questions. It investigates whether this statistical regularity and automatic classification success can also be achieved in medical textbook chapters. It then goes on to see whether the statistical distribution found in textbooks is sufficiently similar to that found in abstracts to permit accurate classification of abstracts based solely on previous knowledge of textbooks. 14 textbook chapters and 45 MEDLINE abstracts were submitted to an automatic classification program that had been trained only on chapters drawn from a standard textbook series. Statistical analysis of the properties of abstracts vs. chapters revealed important differences in word use. Automatic classification performance was good for chapters, but poor for abstracts.

  3. A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.

    ERIC Educational Resources Information Center

    Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald

    2002-01-01

    Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)

  4. Shared Features of L2 Writing: Intergroup Homogeneity and Text Classification

    ERIC Educational Resources Information Center

    Crossley, Scott A.; McNamara, Danielle S.

    2011-01-01

    This study investigates intergroup homogeneity within high intermediate and advanced L2 writers of English from Czech, Finnish, German, and Spanish first language backgrounds. A variety of linguistic features related to lexical sophistication, syntactic complexity, and cohesion were used to compare texts written by L1 speakers of English to L2…

  5. Facility Detection and Popularity Assessment from Text Classification of Social Media and Crowdsourced Data

    SciTech Connect

    Sparks, Kevin A; Li, Roger G; Thakur, Gautam

    Advances in technology have continually progressed our understanding of where people are, how they use the environment around them, and why they are at their current location. Having a better knowledge of when various locations become popular through space and time could have large impacts on research fields like urban dynamics and energy consumption. In this paper, we discuss the ability to identify and locate various facility types (e.g. restaurant, airport, stadiums) using social media, and assess methods in determining when these facilities become popular over time. We use natural language processing tools and machine learning classifiers to interpret geotaggedmore » Twitter text and determine if a user is seemingly at a location of interest when the tweet was sent. On average our classifiers are approximately 85% accurate varying across multiple facility types, with a peak precision of 98%. By using these methods to classify unstructured text, geotagged social media data can be an extremely useful tool to better understanding the composition of places and how and when people use them.« less

  6. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm.

    PubMed

    Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W

    2011-10-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes' classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes' classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes' classifier using bigrams. Copyright © 2011 Elsevier Inc. All rights reserved.

  7. Document-Level Classification of CT Pulmonary Angiography Reports based on an Extension of the ConText Algorithm

    PubMed Central

    Chapman, Brian E.; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W.

    2011-01-01

    In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes’ classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes’ classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes’ classifier using bigrams. PMID:21459155

  8. Automated Extraction and Classification of Cancer Stage Mentions fromUnstructured Text Fields in a Central Cancer Registry.

    PubMed

    AAlAbdulsalam, Abdulrahman K; Garvin, Jennifer H; Redd, Andrew; Carter, Marjorie E; Sweeny, Carol; Meystre, Stephane M

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Com-mittee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M) known as TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through volu-minous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%-98.4% and classification sensitivity: 83.5%-87%).

  9. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnacus (1707-1778), who is…

  10. Automated Extraction and Classification of Cancer Stage Mentions fromUnstructured Text Fields in a Central Cancer Registry

    PubMed Central

    AAlAbdulsalam, Abdulrahman K.; Garvin, Jennifer H.; Redd, Andrew; Carter, Marjorie E.; Sweeny, Carol; Meystre, Stephane M.

    2018-01-01

    Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Com-mittee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M) known as TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through volu-minous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). PMID:29888032

  11. Assessing Unmet Information Needs of Breast Cancer Survivors: Exploratory Study of Online Health Forums Using Text Classification and Retrieval.

    PubMed

    McRoy, Susan; Rastegar-Mojarad, Majid; Wang, Yanshan; Ruddy, Kathryn J; Haddad, Tufia C; Liu, Hongfang

    2018-05-15

    Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources. The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between information needs and current education materials. This approach employs computational methods for content modeling and supervised text classification to data from online health forums to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval. We provide a new taxonomy for the classification of sentences in online health forum data. 260 postings from two online health forums were selected, yielding 4179 sentences for coding. After annotation of data and training alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. 136 expressions of need were used to generate queries to indexed education materials. Upon examination of the best two pages retrieved for each query, 12% (17/136) of queries were found to have relevant content by all coders, and 33% (45/136) were judged to have relevant content by at least one. Text from online health forums can be analyzed effectively using automated methods. Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they typically receive, as our results suggest that at most

  12. Assessing Unmet Information Needs of Breast Cancer Survivors: Exploratory Study of Online Health Forums Using Text Classification and Retrieval

    PubMed Central

    Rastegar-Mojarad, Majid; Wang, Yanshan; Ruddy, Kathryn J; Haddad, Tufia C; Liu, Hongfang

    2018-01-01

    Background Patient education materials given to breast cancer survivors may not be a good fit for their information needs. Needs may change over time, be forgotten, or be misreported, for a variety of reasons. An automated content analysis of survivors' postings to online health forums can identify expressed information needs over a span of time and be repeated regularly at low cost. Identifying these unmet needs can guide improvements to existing education materials and the creation of new resources. Objective The primary goals of this project are to assess the unmet information needs of breast cancer survivors from their own perspectives and to identify gaps between information needs and current education materials. Methods This approach employs computational methods for content modeling and supervised text classification to data from online health forums to identify explicit and implicit requests for health-related information. Potential gaps between needs and education materials are identified using techniques from information retrieval. Results We provide a new taxonomy for the classification of sentences in online health forum data. 260 postings from two online health forums were selected, yielding 4179 sentences for coding. After annotation of data and training alternative one-versus-others classifiers, a random forest-based approach achieved F1 scores from 66% (Other, dataset2) to 90% (Medical, dataset1) on the primary information types. 136 expressions of need were used to generate queries to indexed education materials. Upon examination of the best two pages retrieved for each query, 12% (17/136) of queries were found to have relevant content by all coders, and 33% (45/136) were judged to have relevant content by at least one. Conclusions Text from online health forums can be analyzed effectively using automated methods. Our analysis confirms that breast cancer survivors have many information needs that are not covered by the written documents they

  13. TEXT CLASSIFICATION FOR AUTOMATIC DETECTION OF E-CIGARETTE USE AND USE FOR SMOKING CESSATION FROM TWITTER: A FEASIBILITY PILOT.

    PubMed

    Aphinyanaphongs, Yin; Lulejian, Armine; Brown, Duncan Penfold; Bonneau, Richard; Krebs, Paul

    2016-01-01

    Rapid increases in e-cigarette use and potential exposure to harmful byproducts have shifted public health focus to e-cigarettes as a possible drug of abuse. Effective surveillance of use and prevalence would allow appropriate regulatory responses. An ideal surveillance system would collect usage data in real time, focus on populations of interest, include populations unable to take the survey, allow a breadth of questions to answer, and enable geo-location analysis. Social media streams may provide this ideal system. To realize this use case, a foundational question is whether we can detect e-cigarette use at all. This work reports two pilot tasks using text classification to identify automatically Tweets that indicate e-cigarette use and/or e-cigarette use for smoking cessation. We build and define both datasets and compare performance of 4 state of the art classifiers and a keyword search for each task. Our results demonstrate excellent classifier performance of up to 0.90 and 0.94 area under the curve in each category. These promising initial results form the foundation for further studies to realize the ideal surveillance solution.

  14. SYRIAC: The systematic review information automated collection system a data warehouse for facilitating automated biomedical text classification.

    PubMed

    Yang, Jianji J; Cohen, Aaron M; Cohen, Aaron; McDonagh, Marian S

    2008-11-06

    Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent.To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation data sets for SR text mining research.

  15. SYRIAC: The SYstematic Review Information Automated Collection System A Data Warehouse for Facilitating Automated Biomedical Text Classification

    PubMed Central

    Yang, Jianji J.; Cohen, Aaron M.; McDonagh, Marian S.

    2008-01-01

    Automatic document classification can be valuable in increasing the efficiency in updating systematic reviews (SR). In order for the machine learning process to work well, it is critical to create and maintain high-quality training datasets consisting of expert SR inclusion/exclusion decisions. This task can be laborious, especially when the number of topics is large and source data format is inconsistent. To approach this problem, we build an automated system to streamline the required steps, from initial notification of update in source annotation files to loading the data warehouse, along with a web interface to monitor the status of each topic. In our current collection of 26 SR topics, we were able to standardize almost all of the relevance judgments and recovered PMIDs for over 80% of all articles. Of those PMIDs, over 99% were correct in a manual random sample study. Our system performs an essential function in creating training and evaluation datasets for SR text mining research. PMID:18999194

  16. Negation handling in sentiment classification using rule-based adapted from Indonesian language syntactic for Indonesian text in Twitter

    NASA Astrophysics Data System (ADS)

    Amalia, Rizkiana; Arif Bijaksana, Moch; Darmantoro, Dhinta

    2018-03-01

    The presence of the word negation is able to change the polarity of the text if it is not handled properly it will affect the performance of the sentiment classification. Negation words in Indonesian are ‘tidak’, ‘bukan’, ‘belum’ and ‘jangan’. Also, there is a conjunction word that able to reverse the actual values, as the word ‘tetapi’, or ‘tapi’. Unigram has shortcomings in dealing with the existence of negation because it treats negation word and the negated words as separate words. A general approach for negation handling in English text gives the tag ‘NEG_’ for following words after negation until the first punctuation. But this may gives the tag to un-negated, and this approach does not handle negation and conjunction in one sentences. The rule-based method to determine what words negated by adapting the rules of Indonesian language syntactic of negation to determine the scope of negation was proposed in this study. With adapting syntactic rules and tagging “NEG_” using SVM classifier with RBF kernel has better performance results than the other experiments. Considering the average F1-score value, the performance of this proposed method can be improved against baseline equal to 1.79% (baseline without negation handling) and 5% (baseline with existing negation handling) for a dataset that all tweets contain negation words. And also for the second dataset that has the various number of negation words in document tweet. It can be improved against baseline at 2.69% (without negation handling) and 3.17% (with existing negation handling).

  17. Combining automatic table classification and relationship extraction in extracting anticancer drug-side effect pairs from full-text articles.

    PubMed

    Xu, Rong; Wang, QuanQiu

    2015-02-01

    Anticancer drug-associated side effect knowledge often exists in multiple heterogeneous and complementary data sources. A comprehensive anticancer drug-side effect (drug-SE) relationship knowledge base is important for computation-based drug target discovery, drug toxicity predication and drug repositioning. In this study, we present a two-step approach by combining table classification and relationship extraction to extract drug-SE pairs from a large number of high-profile oncological full-text articles. The data consists of 31,255 tables downloaded from the Journal of Oncology (JCO). We first trained a statistical classifier to classify tables into SE-related and -unrelated categories. We then extracted drug-SE pairs from SE-related tables. We compared drug side effect knowledge extracted from JCO tables to that derived from FDA drug labels. Finally, we systematically analyzed relationships between anti-cancer drug-associated side effects and drug-associated gene targets, metabolism genes, and disease indications. The statistical table classifier is effective in classifying tables into SE-related and -unrelated (precision: 0.711; recall: 0.941; F1: 0.810). We extracted a total of 26,918 drug-SE pairs from SE-related tables with a precision of 0.605, a recall of 0.460, and a F1 of 0.520. Drug-SE pairs extracted from JCO tables is largely complementary to those derived from FDA drug labels; as many as 84.7% of the pairs extracted from JCO tables have not been included a side effect database constructed from FDA drug labels. Side effects associated with anticancer drugs positively correlate with drug target genes, drug metabolism genes, and disease indications. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted.* In the sunspot classification example given above, each training example

  19. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.

  20. Five-way smoking status classification using text hot-spot identification and error-correcting output codes.

    PubMed

    Cohen, Aaron M

    2008-01-01

    We participated in the i2b2 smoking status classification challenge task. The purpose of this task was to evaluate the ability of systems to automatically identify patient smoking status from discharge summaries. Our submission included several techniques that we compared and studied, including hot-spot identification, zero-vector filtering, inverse class frequency weighting, error-correcting output codes, and post-processing rules. We evaluated our approaches using the same methods as the i2b2 task organizers, using micro- and macro-averaged F1 as the primary performance metric. Our best performing system achieved a micro-F1 of 0.9000 on the test collection, equivalent to the best performing system submitted to the i2b2 challenge. Hot-spot identification, zero-vector filtering, classifier weighting, and error correcting output coding contributed additively to increased performance, with hot-spot identification having by far the largest positive effect. High performance on automatic identification of patient smoking status from discharge summaries is achievable with the efficient and straightforward machine learning techniques studied here.

  1. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    PubMed

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of the terms (N-grams) in the documents are symmetrically distributed among different classes. The symmetrical distribution of the N-Grams raises uncertainty in the belongings of the N-Grams towards the class. In this paper, we focused on the selection of most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method named as the symmetrical strength of the N-Grams (SSNG) is proposed using a two pass filtering based feature selection (TPF) approach. Initially, in the first pass of the TPF, the SSNG method chooses various informative N-Grams from the entire extracted N-Grams of the corpus. Subsequently, in the second pass the well-known Chi Square (χ(2)) method is being used to select few most informative N-Grams. Further, to classify the documents the two standard classifiers Multinomial Naive Bayes and Linear Support Vector Machine have been applied on the ten standard text data sets. In most of the datasets, the experimental results state the performance and success rate of SSNG method using TPF approach is superior to the state-of-the-art methods viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ(2).

  2. The contribution of the vaccine adverse event text mining system to the classification of possible Guillain-Barré syndrome reports.

    PubMed

    Botsis, T; Woo, E J; Ball, R

    2013-01-01

    We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of an-aphylaxis for post-marketing safety surveillance of vaccines. To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority.

  3. The Contribution of the Vaccine Adverse Event Text Mining System to the Classification of Possible Guillain-Barré Syndrome Reports

    PubMed Central

    Botsis, T.; Woo, E. J.; Ball, R.

    2013-01-01

    Background We previously demonstrated that a general purpose text mining system, the Vaccine adverse event Text Mining (VaeTM) system, could be used to automatically classify reports of an-aphylaxis for post-marketing safety surveillance of vaccines. Objective To evaluate the ability of VaeTM to classify reports to the Vaccine Adverse Event Reporting System (VAERS) of possible Guillain-Barré Syndrome (GBS). Methods We used VaeTM to extract the key diagnostic features from the text of reports in VAERS. Then, we applied the Brighton Collaboration (BC) case definition for GBS, and an information retrieval strategy (i.e. the vector space model) to quantify the specific information that is included in the key features extracted by VaeTM and compared it with the encoded information that is already stored in VAERS as Medical Dictionary for Regulatory Activities (MedDRA) Preferred Terms (PTs). We also evaluated the contribution of the primary (diagnosis and cause of death) and secondary (second level diagnosis and symptoms) diagnostic VaeTM-based features to the total VaeTM-based information. Results MedDRA captured more information and better supported the classification of reports for GBS than VaeTM (AUC: 0.904 vs. 0.777); the lower performance of VaeTM is likely due to the lack of extraction by VaeTM of specific laboratory results that are included in the BC criteria for GBS. On the other hand, the VaeTM-based classification exhibited greater specificity than the MedDRA-based approach (94.96% vs. 87.65%). Most of the VaeTM-based information was contained in the secondary diagnostic features. Conclusion For GBS, clinical signs and symptoms alone are not sufficient to match MedDRA coding for purposes of case classification, but are preferred if specificity is the priority. PMID:23650490

  4. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    PubMed Central

    2011-01-01

    Background Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. Results A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthew's Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods, some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, some integrated NER approaches. For the IMT, a total of 42 runs were

  5. The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text.

    PubMed

    Krallinger, Martin; Vazquez, Miguel; Leitner, Florian; Salgado, David; Chatr-Aryamontri, Andrew; Winter, Andrew; Perfetto, Livia; Briganti, Leonardo; Licata, Luana; Iannuccelli, Marta; Castagnoli, Luisa; Cesareni, Gianni; Tyers, Mike; Schneider, Gerold; Rinaldi, Fabio; Leaman, Robert; Gonzalez, Graciela; Matos, Sergio; Kim, Sun; Wilbur, W John; Rocha, Luis; Shatkay, Hagit; Tendulkar, Ashish V; Agarwal, Shashank; Liu, Feifan; Wang, Xinglong; Rak, Rafal; Noto, Keith; Elkan, Charles; Lu, Zhiyong; Dogan, Rezarta Islamaj; Fontaine, Jean-Fred; Andrade-Navarro, Miguel A; Valencia, Alfonso

    2011-10-03

    Determining usefulness of biomedical text mining systems requires realistic task definition and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation or measuring time savings by using automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. Therefore the BCIII-ACT corpus was provided, which includes a training, development and test set of over 12,000 PPI relevant and non-relevant PubMed abstracts labeled manually by domain experts and recording also the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full text articles and interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. A total of 11 teams participated in at least one of the two PPI tasks (10 in ACT and 8 in the IMT) and a total of 62 persons were involved either as participants or in preparing data sets/evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthew's Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89% and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods, some of them also used lexical resources like MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing

  6. Efficient and sparse feature selection for biomedical text classification via the elastic net: Application to ICU risk stratification from nursing notes.

    PubMed

    Marafino, Ben J; Boscardin, W John; Dudley, R Adams

    2015-04-01

    Sparsity is often a desirable property of statistical models, and various feature selection methods exist so as to yield sparser and interpretable models. However, their application to biomedical text classification, particularly to mortality risk stratification among intensive care unit (ICU) patients, has not been thoroughly studied. To develop and characterize sparse classifiers based on the free text of nursing notes in order to predict ICU mortality risk and to discover text features most strongly associated with mortality. We selected nursing notes from the first 24h of ICU admission for 25,826 adult ICU patients from the MIMIC-II database. We then developed a pair of stochastic gradient descent-based classifiers with elastic-net regularization. We also studied the performance-sparsity tradeoffs of both classifiers as their regularization parameters were varied. The best-performing classifier achieved a 10-fold cross-validated AUC of 0.897 under the log loss function and full L2 regularization, while full L1 regularization used just 0.00025% of candidate input features and resulted in an AUC of 0.889. Using the log loss (range of AUCs 0.889-0.897) yielded better performance compared to the hinge loss (0.850-0.876), but the latter yielded even sparser models. Most features selected by both classifiers appear clinically relevant and correspond to predictors already present in existing ICU mortality models. The sparser classifiers were also able to discover a number of informative - albeit nonclinical - features. The elastic-net-regularized classifiers perform reasonably well and are capable of reducing the number of features required by over a thousandfold, with only a modest impact on performance. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit.

    PubMed

    Marafino, Ben J; Davies, Jason M; Bardach, Naomi S; Dean, Mitzi L; Dudley, R Adams

    2014-01-01

    Existing risk adjustment models for intensive care unit (ICU) outcomes rely on manual abstraction of patient-level predictors from medical charts. Developing an automated method for abstracting these data from free text might reduce cost and data collection times. To develop a support vector machine (SVM) classifier capable of identifying a range of procedures and diagnoses in ICU clinical notes for use in risk adjustment. We selected notes from 2001-2008 for 4191 neonatal ICU (NICU) and 2198 adult ICU patients from the MIMIC-II database from the Beth Israel Deaconess Medical Center. Using these notes, we developed an implementation of the SVM classifier to identify procedures (mechanical ventilation and phototherapy in NICU notes) and diagnoses (jaundice in NICU and intracranial hemorrhage (ICH) in adult ICU). On the jaundice classification task, we also compared classifier performance using n-gram features to unigrams with application of a negation algorithm (NegEx). Our classifier accurately identified mechanical ventilation (accuracy=0.982, F1=0.954) and phototherapy use (accuracy=0.940, F1=0.912), as well as jaundice (accuracy=0.898, F1=0.884) and ICH diagnoses (accuracy=0.938, F1=0.943). Including bigram features improved performance on the jaundice (accuracy=0.898 vs 0.865) and ICH (0.938 vs 0.927) tasks, and outperformed NegEx-derived unigram features (accuracy=0.898 vs 0.863) on the jaundice task. Overall, a classifier using n-gram support vectors displayed excellent performance characteristics. The classifier generalizes to diverse patient populations, diagnoses, and procedures. SVM-based classifiers can accurately identify procedure status and diagnoses among ICU patients, and including n-gram features improves performance, compared to existing methods. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  8. Development of a database-driven system for simulating water temperature in the lower Yakima River main stem, Washington, for various climate scenarios

    USGS Publications Warehouse

    Voss, Frank; Maule, Alec

    2013-01-01

    A model for simulating daily maximum and mean water temperatures was developed by linking two existing models: one developed by the U.S. Geological Survey and one developed by the Bureau of Reclamation. The study area included the lower Yakima River main stem between the Roza Dam and West Richland, Washington. To automate execution of the labor-intensive models, a database-driven model automation program was developed to decrease operation costs, to reduce user error, and to provide the capability to perform simulations quickly for multiple management and climate change scenarios. Microsoft© SQL Server 2008 R2 Integration Services packages were developed to (1) integrate climate, flow, and stream geometry data from diverse sources (such as weather stations, a hydrologic model, and field measurements) into a single relational database; (2) programmatically generate heavily formatted model input files; (3) iteratively run water temperature simulations; (4) process simulation results for export to other models; and (5) create a database-driven infrastructure that facilitated experimentation with a variety of scenarios, node permutations, weather data, and hydrologic conditions while minimizing costs of running the model with various model configurations. As a proof-of-concept exercise, water temperatures were simulated for a "Current Conditions" scenario, where local weather data from 1980 through 2005 were used as input, and for "Plus 1" and "Plus 2" climate warming scenarios, where the average annual air temperatures used in the Current Conditions scenario were increased by 1degree Celsius (°C) and by 2°C, respectively. Average monthly mean daily water temperatures simulated for the Current Conditions scenario were compared to measured values at the Bureau of Reclamation Hydromet gage at Kiona, Washington, for 2002-05. Differences ranged between 1.9° and 1.1°C for February, March, May, and June, and were less than 0.8°C for the remaining months of the year

  9. Utilising database-driven interactive software to enhance independent home-study in a flipped classroom setting: going beyond visualising engineering concepts to ensuring formative assessment

    NASA Astrophysics Data System (ADS)

    Comerford, Liam; Mannis, Adam; DeAngelis, Marco; Kougioumtzoglou, Ioannis A.; Beer, Michael

    2018-07-01

    The concept of formative assessment is considered by many to play an important role in enhancing teaching in higher engineering education. In this paper, the concept of the flipped classroom as part of a blended learning curriculum is highlighted as an ideal medium through which formative assessment practices arise. Whilst the advantages of greater interaction between students and lecturers in classes are numerous, there are often clear disadvantages associated with the independent home-study component that complements timetabled sessions in a flipped classroom setting, specifically, the popular method of replacing traditional classroom teaching with video lectures. This leads to a clear lack of assurances that the cited benefits of a flipped classroom approach are echoed in the home-study arena. Over the past three years, the authors have sought to address identified deficiencies in this area of blended learning through the development of database-driven e-learning software with the capability of introducing formative assessment practices to independent home-study. This paper maps out aspects of two specific evolving practices at separate institutions, from which guiding principles of incorporating formative assessment aspects into e-learning software are identified and highlighted in the context of independent home-study as part of a flipped classroom approach.

  10. Role of a database-driven web site in the immediate disaster response and recovery of Academic Health Center: the Katrina experience.

    PubMed

    Fordis, Michael; Alexander, J Douglas; McKellar, Julie

    2007-08-01

    In the wake of Hurricane Katrina's landfall on August 29, 2005, and the subsequent levee failures, operations of Tulane University School of Medicine became unsustainable. As New Orleans collapsed, faculty, students, residents, and staff were scattered nationwide. In response, four Texas medical schools created an alliance to assist Tulane in temporarily relocating operations to south Texas. Resuming operations in a three- to four-week time span required developing and implementing a coordinated communication plan in the face of widespread communication infrastructure disruptions. A keystone of the strategy involved rapidly creating a "recovery Web site" to provide essential information on immediate recovery plans, mechanisms for reestablishing communications with displaced persons, housing relocation options (over 200 students, faculty, and staff were relocated using Web site resources), classes and residency training, and other issues (e.g., financial services, counseling support) vitally important to affected individuals. The database-driven Web site was launched in four days on September 11, 2005, by modifying an existing system and completing new programming. Additional functions were added during the next week, and the site operated continuously until March 2006, providing about 890,000 pages of information in over 100,000 visitor sessions. The site proved essential in disseminating announcements, reestablishing communications among the Tulane family, and supporting relocation and recovery. This experience shows the importance of information technology in collaborative efforts of academic health centers in early disaster response and recovery, reinforcing recommendations published recently by the Association of Academic Health Centers and the National Academy of Sciences.

  11. A Preliminary Statistical Investigation into the Impace of an N-Gram Analysis Approach Based on World Syntactic Categories Toward Text Author Classification

    DTIC Science & Technology

    2000-06-01

    of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...words: N-gram, Shakespeare , Middleton, Wardigo, Funeral Elegy, Author Classification Introduction Literary experts refer to the style of the ...signed W. S. at the time of the investigation; the jury was still out as to the identity of its author. It has been noted as of late

  12. Design and Prototype Implementation of non-Triggered Database-driven Real-time Tsunami Forecast System using Multi-index Method

    NASA Astrophysics Data System (ADS)

    Yamamoto, N.; Aoi, S.; Suzuki, W.; Hirata, K.; Takahashi, N.; Kunugi, T.; Nakamura, H.

    2016-12-01

    We have launched a new project to develop real-time tsunami inundation forecast system for the Pacific coast of Chiba prefecture (Kujukuri-Sotobo region), Japan (Aoi et al., 2015, AGU). In this study, we design a database-driven real-time tsunami forecast system using the multi-index method (Yamamoto et al., 2016, EPS) and implement a prototype system. In the previous study (Yamamoto et al., 2015, AGU), we assumed that the origin-time of tsunami was known before a forecast based on comparing observed and calculated ocean-bottom pressure waveforms stored in the Tsunami Scenario Bank (TSB). As shown in the figure, we assume the scenario origin-times by defining the scenario elapsed timeτp to compare observed and calculated waveforms. In this design, when several appropriate tsunami scenarios were selected by multiple indices (two variance reductions and correlation coefficient), the system could make tsunami forecast using the selected tsunami scenarios for the target coastal region without any triggered information derived from observed seismic and/or tsunami data. In addition, we define the time range Tq shown in the figure for masking perturbations contaminated by ocean-acoustic and seismic waves on the observed pressure records (Saito, 2015, JpGU). Following the proposed design, we implement a prototype system of real-time tsunami inundation forecast system for the exclusive use of the target coastal region using ocean-bottom pressure data from the Seafloor Observation Network for Earthquakes and Tsunamis along the Japan Trench (S-net) (Kanazawa et al., 2012, JpGU; Uehira et al., 2015, IUGG), which is constructed by National Research institute for Earth Science and Disaster Resilience (NIED). For the prototype system, we construct a prototype TSB using interplate earthquake fault models located along the Japan Trench (Mw 7.6-9.8), the Sagami Trough (Mw 7.6-8.6), and the Nankai Trough (Mw 7.6-8.6) as well as intraplate earthquake fault models (Mw 7.6-8.6) within

  13. Text Mining.

    ERIC Educational Resources Information Center

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  14. Text Sets.

    ERIC Educational Resources Information Center

    Giorgis, Cyndi; Johnson, Nancy J.

    2002-01-01

    Presents annotations of approximately 30 titles grouped in text sets. Defines a text set as five to ten books on a particular topic or theme. Discusses books on the following topics: living creatures; pirates; physical appearance; natural disasters; and the Irish potato famine. (SG)

  15. Text Prep

    ERIC Educational Resources Information Center

    Buehl, Doug

    2017-01-01

    To understand complex disciplinary texts, students need to possess a rich store of background knowledge. But what happens if students don't have that knowledge? In this article, Doug Buehl explores frontloading strategies that can bridge the gap between what students know and what they need to know to comprehend a disciplinary text. He outlines…

  16. Reorganized text.

    PubMed

    2015-05-01

    Reorganized Text: In the Original Investigation titled “Patterns of Hospital Utilization for Head and Neck Cancer Care: Changing Demographics” posted online in the January 29, 2015, issue of JAMA Otolaryngology–Head & Neck Surgery (doi:10.1001 /jamaoto.2014.3603), information was copied within sections and text rearranged to accommodate Continuing Medical Education quiz formatting. The information from the topic statements of each paragraph in the Hypothesis Testing subsection of the Methods section was collected in a new first paragraph for that subsection, which reads as follows: “Several hypotheses regarding the causes of regionalization of HNCA care were tested using the NIS data: (1) increasing patient comorbidities over time, causing a shift in care to teaching institutions that would theoretically be better equipped to handle such increased comorbidities; (2) shifting of payer status; (3) increased proportion of prior radiation therapy; and (4) a higher fraction of more complex procedures being referred and performed at teaching institutions.” In addition, the phrase "As summarized in Table3," was added to the beginning of paragraph 6 of the Discussion section, and the call-out to Table 3 in the middle of that paragraph was deleted. Finally, paragraphs 6 and 7 of the Discussion section were combined.

  17. Database-Driven Analyses of Astronomical Spectra

    NASA Astrophysics Data System (ADS)

    Cami, Jan

    2012-03-01

    Spectroscopy is one of the most powerful tools to study the physical properties and chemical composition of very diverse astrophysical environments. In principle, each nuclide has a unique set of spectral features; thus, establishing the presence of a specific material at astronomical distances requires no more than finding a laboratory spectrum of the right material that perfectly matches the astronomical observations. Once the presence of a substance is established, a careful analysis of the observational characteristics (wavelengths or frequencies, intensities, and line profiles) allows one to determine many physical parameters of the environment in which the substance resides, such as temperature, density, velocity, and so on. Because of this great diagnostic potential, ground-based and space-borne astronomical observatories often include instruments to carry out spectroscopic analyses of various celestial objects and events. Of particular interest is molecular spectroscopy at infrared wavelengths. From the spectroscopic point of view, molecules differ from atoms in their ability to vibrate and rotate, and quantum physics inevitably causes those motions to be quantized. The energies required to excite vibrations or rotations are such that vibrational transitions generally occur at infrared wavelengths, whereas pure rotational transitions typically occur at sub-mm wavelengths. Molecular vibration and rotation are coupled though, and thus at infrared wavelengths, one commonly observes a multitude of ro-vibrational transitions (see Figure 13.1). At lower spectral resolution, all transitions blend into one broad ro-vibrational molecular band. The isotope. Molecular spectroscopy thus allows us to see a difference of one neutron in an atomic nucleus that is located at astronomical distances! Since the detection of the first interstellar molecules (the CH [21] and CN [14] radicals), more than 150 species have been detected in space, ranging in size from diatomic species to the fullerene species C60 and C70 [4]. Given the large number and variety of molecules detected in space, molecular infrared spectroscopy can be used to study pretty much any astrophysical environment that is not too energetic to dissociate the molecules. At the lowest energies, it is interesting to note that molecules such as CN have been used to measure the temperature of the Cosmic Microwave Background (see e.g., Ref. 15). The great diagnostic potential of infrared molecular spectroscopy comes at a price though. Extracting the physical parameters from the observations requires expertise in knowing how various physical processes and instrumental characteristics play together in producing the observed spectra. In addition to the astronomical aspects, this often includes interpreting and understanding the limitations of laboratory data and quantum-chemical calculations; the study of the interaction of matter with radiation at microscopic scales (called radiative transfer, akin to ray tracing) and the effects of observing (e.g., smoothing and resampling) on the resulting spectra and possible instrumental effects (e.g., fringes). All this is not trivial. To make matters worse, observational spectra often contain many components, and might include spectral contributions stemming from very different physical conditions. Fully analyzing such observations is thus a time-consuming task that requires mastery of several techniques. And with ever-increasing rates of observational data acquisition, it seems clear that in the near future, some form of automation is required to handle the data stream. It is thus appealing to consider what part of such analyses could be done without too much human intervention. Two different aspects can be separated: the first step involves simply identifying the molecular species present in the observations. Once the molecular inventory is known, we can try to extract the physical parameters from the observed spectral properties. For both steps, good databases of molecular spectroscopic information is vital; the second step furthermore requires a good understanding of how these spectral properties change for each of the physical parameters of interest. Here, we will first present a general strategy using a convenient and powerful computational tool. We will then show two different, specific examples of analyses where some fundamental spectroscopic information is available in (a) database(s), and where that information has been successfully used to analyze observations. Whereas these examples do not directly address the question of automated analyses, they offer some insight into the possibilities and difficulties one can expect to encounter on this quest. In Section 13.2, we first describe the general formalism and method that we will be using, and we offer a few general considerations about what is involved in modeling. In Section 13.3, we illustrate our method for the case of polycyclic aromatic hydrocarbons (PAHs). In Section 13.4,we show how the same approach can be slightly modified to extract physical parameters from infrared observations of red giant stars. We then come back to the question of automated spectral analyses in Section 13.5. precise shape of molecular bands depends primarily on the temperature of the gas, and infrared observations of molecular bands can thus be used to probe gas at temperatures as low as 10 K and up to a few thousand K. Similarly, molecular band shapes contain some information about the gas densities. The resulting ro-vibrational spectrum is truly unique for each molecular species, just as is the case for atoms.As a telling example, Figure 13.2 illustrates how we can even discriminate between different isotopologues - two species with the same chemical formula and structure, but where one (or more) of the atoms is a different

  18. Text Generation: The Problem of Text Structure.

    ERIC Educational Resources Information Center

    Mann, William C.

    One of the major problems in artificial intelligence (AI) text generation is text organization; a poorly organized text can be unreadable or even misleading. A comparison of two AI approaches to text organization--McKeown's TEXT system and Rhetorical Structure Theory (RST)--shows that, although they share many assumptions about the nature of text,…

  19. Directed Activities Related to Text: Text Analysis and Text Reconstruction.

    ERIC Educational Resources Information Center

    Davies, Florence; Greene, Terry

    This paper describes Directed Activities Related to Text (DART), procedures that were developed and are used in the Reading for Learning Project at the University of Nottingham (England) to enhance learning from texts and that fall into two broad categories: (1) text analysis procedures, which require students to engage in some form of analysis of…

  20. On Dataless Hierarchical Text Classification (Author’s Manuscript)

    DTIC Science & Technology

    2014-07-27

    compound talk.politics.mideast politics mideast israel arab jews jewish muslim talk.politics.misc politics gay homosexual sexual alt.atheism atheism...tion in NLP tasks; it was further used in several NLP works, such as by Liang (2005), to measure words’ distributional similarity. This method...embedding trained by neural networks has been used widely in the NLP community and has become a hot trend recently. In this pa- per, we test the suitability

  1. Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification

    DTIC Science & Technology

    1999-05-17

    Experimental Results In this section, we compare kNN -mut which uses the weight vector obtained using mutual information as the fi- nal weight vector and...WAKNN against kNN , C4.5 [Qui93], RIPPER [Coh95], PEBLS [CS93], Rainbow [McC96], VSM [Low95] on several synthetic and real data sets. VSM is another k...obtained without this option. 3 C4.5 RIPPER PEBLS Rainbow kNN WAKNN Syn-1 100.0 100.0 100.0 100.0 77.3 100.0 Syn-2 67.5 69.5 62.0 50.0 66.0 68.8 Syn

  2. Subject Classification.

    ERIC Educational Resources Information Center

    Thompson, Gayle; And Others

    Three newspaper librarians described how they manage the files of newspaper clippings which are a necessary part of their collections. The development of a new subject classification system for the clippings files was outlined. The new subject headings were based on standard subject heading lists and on local need. It was decided to use a computer…

  3. Text Mining in Organizational Research

    PubMed Central

    Kobayashi, Vladimer B.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.

    2017-01-01

    Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies. PMID:29881248

  4. Text Mining in Organizational Research.

    PubMed

    Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

    2018-07-01

    Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.

  5. Contextual Text Mining

    ERIC Educational Resources Information Center

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  6. Vietnamese Document Representation and Classification

    NASA Astrophysics Data System (ADS)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.

  7. Palliative Care Texts

    MedlinePlus

    ... page: https://medlineplus.gov/palliativecaretexts.html Palliative Care Texts To use the sharing features on this page, please enable JavaScript. Free text messages to support you and your family during ...

  8. Text Coherence in Translation

    ERIC Educational Resources Information Center

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  9. Creating Vocative Texts

    ERIC Educational Resources Information Center

    Nicol, Jennifer J.

    2008-01-01

    Vocative texts are expressive poetic texts that strive to show rather than tell, that communicate felt knowledge, and that appeal to the senses. They are increasingly used by researchers to present qualitative findings, but little has been written about how to create such texts. To this end, excerpts from an inquiry into the experience and meaning…

  10. Classifications for Cesarean Section: A Systematic Review

    PubMed Central

    Torloni, Maria Regina; Betran, Ana Pilar; Souza, Joao Paulo; Widmer, Mariana; Allen, Tomas; Gulmezoglu, Metin; Merialdi, Mario

    2011-01-01

    Background Rising cesarean section (CS) rates are a major public health concern and cause worldwide debates. To propose and implement effective measures to reduce or increase CS rates where necessary requires an appropriate classification. Despite several existing CS classifications, there has not yet been a systematic review of these. This study aimed to 1) identify the main CS classifications used worldwide, 2) analyze advantages and deficiencies of each system. Methods and Findings Three electronic databases were searched for classifications published 1968–2008. Two reviewers independently assessed classifications using a form created based on items rated as important by international experts. Seven domains (ease, clarity, mutually exclusive categories, totally inclusive classification, prospective identification of categories, reproducibility, implementability) were assessed and graded. Classifications were tested in 12 hypothetical clinical case-scenarios. From a total of 2948 citations, 60 were selected for full-text evaluation and 27 classifications identified. Indications classifications present important limitations and their overall score ranged from 2–9 (maximum grade = 14). Degree of urgency classifications also had several drawbacks (overall scores 6–9). Woman-based classifications performed best (scores 5–14). Other types of classifications require data not routinely collected and may not be relevant in all settings (scores 3–8). Conclusions This review and critical appraisal of CS classifications is a methodologically sound contribution to establish the basis for the appropriate monitoring and rational use of CS. Results suggest that women-based classifications in general, and Robson's classification, in particular, would be in the best position to fulfill current international and local needs and that efforts to develop an internationally applicable CS classification would be most appropriately placed in building upon this

  11. The Perfect Text.

    ERIC Educational Resources Information Center

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  12. Composing Texts, Composing Lives.

    ERIC Educational Resources Information Center

    Perl, Sondra

    1994-01-01

    Using composition, reader response, critical, and feminist theories, a teacher demonstrates how adult students respond critically to literary texts and how teachers must critically analyze the texts of their teaching practice. Both students and teachers can use writing to bring their experiences to interpretation. (SK)

  13. Text File Comparator

    NASA Technical Reports Server (NTRS)

    Kotler, R. S.

    1983-01-01

    File Comparator program IFCOMP, is text file comparator for IBM OS/VScompatable systems. IFCOMP accepts as input two text files and produces listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.

  14. Making Sense of Texts

    ERIC Educational Resources Information Center

    Harper, Rebecca G.

    2014-01-01

    This article addresses the triadic nature regarding meaning construction of texts. Grounded in Rosenblatt's (1995; 1998; 2004) Transactional Theory, research conducted in an undergraduate Language Arts curriculum course revealed that when presented with unfamiliar texts, students used prior experiences, social interactions, and literary strategies…

  15. Texting "boosts" felt security.

    PubMed

    Otway, Lorna J; Carnelley, Katherine B; Rowe, Angela C

    2014-01-01

    Attachment security can be induced in laboratory settings (e.g., Rowe & Carnelley, 2003) and the beneficial effects of repeated security priming can last for a number of days (e.g., Carnelley & Rowe, 2007). The priming process, however, can be costly in terms of time. We explored the effectiveness of security priming via text message. Participants completed a visualisation task (a secure attachment experience or neutral experience) in the laboratory. On three consecutive days following the laboratory task, participants received (secure or neutral) text message visualisation tasks. Participants in the secure condition reported significantly higher felt security than those in the neutral condition, immediately after the laboratory prime, after the last text message prime and one day after the last text prime. These findings suggest that security priming via text messages is an innovative methodological advancement that effectively induces felt security, representing a potential direction forward for security priming research.

  16. Classification in Australia.

    ERIC Educational Resources Information Center

    McKinlay, John

    Despite some inroads by the Library of Congress Classification and short-lived experimentation with Universal Decimal Classification and Bliss Classification, Dewey Decimal Classification, with its ability in recent editions to be hospitable to local needs, remains the most widely used classification system in Australia. Although supplemented at…

  17. EST: Evading Scientific Text.

    ERIC Educational Resources Information Center

    Ward, Jeremy

    2001-01-01

    Examines chemical engineering students' attitudes to text and other parts of English language textbooks. A questionnaire was administered to a group of undergraduates. Results reveal one way students get around the problem of textbook reading. (Author/VWL)

  18. Noisy text categorization.

    PubMed

    Vinciarelli, Alessandro

    2005-12-01

    This work presents categorization experiments performed over noisy texts. By noisy, we mean any text obtained through an extraction process (affected by errors) from media other than digital texts (e.g., transcriptions of speech recordings extracted with a recognition system). The performance of a categorization system over the clean and noisy (Word Error Rate between approximately 10 and approximately 50 percent) versions of the same documents is compared. The noisy texts are obtained through handwriting recognition and simulation of optical character recognition. The results show that the performance loss is acceptable for Recall values up to 60-70 percent depending on the noise sources. New measures of the extraction process performance, allowing a better explanation of the categorization results, are proposed.

  19. Machine Translation from Text

    NASA Astrophysics Data System (ADS)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  20. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.

  1. Text Exchange System

    NASA Technical Reports Server (NTRS)

    Snyder, W. V.; Hanson, R. J.

    1986-01-01

    Text Exchange System (TES) exchanges and maintains organized textual information including source code, documentation, data, and listings. System consists of two computer programs and definition of format for information storage. Comprehensive program used to create, read, and maintain TES files. TES developed to meet three goals: First, easy and efficient exchange of programs and other textual data between similar and dissimilar computer systems via magnetic tape. Second, provide transportable management system for textual information. Third, provide common user interface, over wide variety of computing systems, for all activities associated with text exchange.

  2. Aphasia and text writing.

    PubMed

    Behrns, Ingrid; Ahlsén, Elisabeth; Wengelin, Asa

    2010-01-01

    Good writing skills are needed in almost every aspect of life today, and there is a growing interest in research into acquired writing difficulties. Most of the findings reported so far, however, are based on words produced in isolation. The present study deals with the production of entire texts. The aim was to characterize written narratives produced by a group of participants with aphasia. Eight persons aged 28-63 years with aphasia took part in the study. They were compared with a reference group consisting of ten participants aged 21-30 years. All participants were asked to write a personal narrative titled 'I have never been so afraid' and to perform a picture-based story-generation task called the 'Frog Story'. The texts were written on a computer. The group could be divided into participants with low, moderate, and high general performance, respectively. The texts written by the participants in the group with moderate and high writing performance had comparatively good narrative structure despite indications of difficulties on other linguistic levels. Aphasia appeared to influence text writing on different linguistic levels. The impact on overall structure and coherence was in line with earlier findings from the analysis of spoken and written discourse and the implication of this is that the written modality should also be included in language rehabilitation. 2010 Royal College of Speech & Language Therapists.

  3. Summarizing Expository Texts

    ERIC Educational Resources Information Center

    Westby, Carol; Culatta, Barbara; Lawrence, Barbara; Hall-Kenyon, Kendra

    2010-01-01

    Purpose: This article reviews the literature on students' developing skills in summarizing expository texts and describes strategies for evaluating students' expository summaries. Evaluation outcomes are presented for a professional development project aimed at helping teachers develop new techniques for teaching summarization. Methods: Strategies…

  4. Taming the Wild Text

    ERIC Educational Resources Information Center

    Allyn, Pam

    2012-01-01

    As a well-known advocate for promoting wider reading and reading engagement among all children--and founder of a reading program for foster children--Pam Allyn knows that struggling readers often face any printed text with fear and confusion, like Max in the book Where the Wild Things Are. She argues that teachers need to actively create a…

  5. Owning Corporate Texts.

    ERIC Educational Resources Information Center

    Winsor, Dorothy A.

    1993-01-01

    Examines the relationship between technical writers and their texts. Suggests the amount of ownership any writer has varies depending on context; therefore, in technical writing, the more a document represents an organization, the less likely the words and ideas are to be solely under the control of the writer. (NH)

  6. Teaching Expository Text Structures

    ERIC Educational Resources Information Center

    Montelongo, Jose; Berber-Jimenez, Lola; Hernandez, Anita C.; Hosking, David

    2006-01-01

    Many students enter high school unskilled in the art of reading to learn from science textbooks. Even students who can read full-length novels often find science books difficult to read because students have relatively little practice with the various types of expository text structures used by such textbooks (Armbruster, 1991). Expository text…

  7. Processing of Text.

    ERIC Educational Resources Information Center

    Arkansas State Dept. of Education, Little Rock.

    This curriculum guide for schools in Arkansas provides course outlines in the general subject area of text processing. Each course outline indicates grade level and prerequisites, and provides a brief description of course goals. In addition, a list of skills is included for each course, with each skill divided into three instructional levels:…

  8. Text Mining for Neuroscience

    NASA Astrophysics Data System (ADS)

    Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis

    2011-06-01

    Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study these terms yielded a set of documents ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in

  9. Knowledge Based Text Generation

    DTIC Science & Technology

    1989-08-01

    Number 4, October-December, 1985, pp. 219-242. de Joia , A. and Stenton, A., Terms in Linguistics: A Guide to Halliday, London: Batsford Academic and...extraction of text schemata and their corresponding rhetorical predicates; design of a system motivated by the desire for domain and language independence...semantics and semantics effects syntax. Functional Linguistic Framework Page 19 The design of GENNY was guided by the functional paradigm. Provided a

  10. Direct methanol fuel cells: A database-driven design procedure

    NASA Astrophysics Data System (ADS)

    Flipsen, S. F. J.; Spitas, C.

    2011-10-01

    To test the feasibility of DMFC systems in preliminary stages of the design process the design engineer can make use of heuristic models identifying the opportunity of DMFC systems in a specific application. In general these models are to generic and have a low accuracy. To improve the accuracy a second-order model is proposed in this paper. The second-order model consists of an evolutionary algorithm written in Mathematica, which selects a component-set satisfying the fuel-cell systems' performance requirements, places the components in 3D space and optimizes for volume. The results are presented as a 3D draft proposal together with a feasibility metric. To test the algorithm the design of DMFC system applied in the MP3 player is evaluated. The results show that volume and costs are an issue for the feasibility of the fuel-cell power-system applied in the MP3 player. The generated designs and the algorithm are evaluated and recommendations are given.

  11. Case Histories of Landslide Impact: A Database-driven Approach

    NASA Astrophysics Data System (ADS)

    Klose, Martin; Damm, Bodo

    2015-04-01

    Fundamental understanding of landslide risk requires in-depth knowledge of how landslides have impacted society in the past (e.g., Corominas et al., 2014). A key to obtain insights into the evolution of landslide risk at single facilities of critical infrastructures are case histories of landslide impact. The purpose of such historical analyses is to inform about the site-specific interactions between landslides and land-use activity. Case histories support correlating landslide events and associated damages with multiple control variables of landslide risk, including (i) previous construction works, (ii) hazard awareness, (iii) the type of structure or its material properties, and (iv) measures of post-disaster mitigation. It is a key advantage of case histories to provide an overview of the changes in the exposure and vulnerability of infrastructures over time. Their application helps to learn more about changing patterns in risk culture and the effectiveness of repair or prevention measures (e.g., Klose et al., 2014). Case histories of landslide impact are developed on the basis of information extracted from landslide databases. The use of path diagrams and illustrated flowcharts as data modeling techniques is aimed at structuring, condensing, and visualizing complex historical data sets on landslide activity and land-use. Much of the scientific potential of case histories simply depends on the quality of available database information. Landslide databases relying on a bottom-up approach characterized by targeted local data specification are optimally suited for historical impact analyses. Combined with systematic retrieval, extraction, and integration of data from multiple sources, landslide databases constitute a valuable tool for developing case histories that enable to open a whole new window on the study of landslide impacts (e.g., Damm and Klose, 2014). The present contribution introduces such a case history for a well-known landslide site at a heavily frequented highway in NW Germany. Landslide problems at this site started with road construction in the early 1880s and were related to multiple event clusters, especially those in the years 1936-1937 (n = 4), 1961 (n = 2), 1970-1974 (n = 5), and 1999-2001 (n = 7). The most frequently applied mitigation measures were rudimentary and less expensive, including (i) removal of loose rock and vegetation (1924, 1936, 1961-1962, 1994), (ii) rock blasting (1936), (iii) catch barriers (1937, 1994), and (iv) temporary or perpetual closure of traffic lanes (1982, 1994). A series of destructive landslides forced decision-makers to launch an expensive slope stabilization project in 2001 that resulted in costs of USD 7.1 million. After finalization of the project no further landslide problems have been reported for this site. References Corominas, J., van Westen, C., Frattini, P., Cascini, L., Malet, J.-P., Fotopoulou, S., Catani, F., Van Den Eeckhaut, M., Mavrouli, O., Agliardi, F., Pitilakis, K., Winter, M.G., Pastor, M., Ferlisi, S., Tofani, V., Hervás, J., Smith, J.T., 2014. Recommendations for the quantitative analysis of landslide risk. Bulletin of Engineering Geology and the Environment 73, 209-263. Damm, B., Klose, M., 2014. Landslide database for the Federal Republic of Germany: a tool for analysis of mass movement processes and impacts. In: Sassa, K., Canuti, P., Yin, Y. (Eds.), Landslide Science for a Safer Geoenvironment. Volume 2: Methods of Landslide Studies. Springer, Berlin, pp. 787-792. Klose, M., Damm, B., Terhorst, B., 2014. Landslide cost modeling for transportation infrastructures: a methodological approach. Landslides, DOI 10.1007/s10346-014-0481-1.

  12. Sequential neural text compression.

    PubMed

    Schmidhuber, J; Heil, S

    1996-01-01

    The purpose of this paper is to show that neural networks may be promising tools for data compression without loss of information. We combine predictive neural nets and statistical coding techniques to compress text files. We apply our methods to certain short newspaper articles and obtain compression ratios exceeding those of the widely used Lempel-Ziv algorithms (which build the basis of the UNIX functions "compress" and "gzip"). The main disadvantage of our methods is that they are about three orders of magnitude slower than standard methods.

  13. Physician. A metapaedogogical text.

    PubMed

    Dean-Jones, Lesley

    2010-01-01

    It has generally been thought that the short treatise Physician was written for the beginning medical student and as such it has been criticized for being so superficial as to be worthless for producing anything but an empty charade of a physician. There are also numerous cruces in the text on which scholars have failed to come to any consensus. This paper argues that by taking the audience of the treatise to be the beginning instructor rather than the beginning student the tone of and information included in the treatise can be seen to be appropriate and the textual cruces can all be explained with little or no amendment by the same hypothesis.

  14. TRMM Gridded Text Products

    NASA Technical Reports Server (NTRS)

    Stocker, Erich Franz

    2007-01-01

    NASA's Tropical Rainfall Measuring Mission (TRMM) has many products that contain instantaneous or gridded rain rates often among many other parameters. However, these products because of their completeness can often seem intimidating to users just desiring surface rain rates. For example one of the gridded monthly products contains well over 200 parameters. It is clear that if only rain rates are desired, this many parameters might prove intimidating. In addition, for many good reasons these products are archived and currently distributed in HDF format. This also can be an inhibiting factor in using TRMM rain rates. To provide a simple format and isolate just the rain rates from the many other parameters, the TRMM product created a series of gridded products in ASCII text format. This paper describes the various text rain rate products produced. It provides detailed information about parameters and how they are calculated. It also gives detailed format information. These products are used in a number of applications with the TRMM processing system. The products are produced from the swath instantaneous rain rates and contain information from the three major TRMM instruments: radar, radiometer, and combined. They are simple to use, human readable, and small for downloading.

  15. Sentiment classification technology based on Markov logic networks

    NASA Astrophysics Data System (ADS)

    He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe

    2016-07-01

    With diverse online media emerging, there is a growing concern of sentiment classification problem. At present, text sentiment classification mainly utilizes supervised machine learning methods, which feature certain domain dependency. On the basis of Markov logic networks (MLNs), this study proposed a cross-domain multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification, knowledge was successfully transferred into other domains, and the precision of the sentiment classification analysis in the text tendency domain was improved. The experimental results revealed the following: (1) the model based on a MLN demonstrated higher precision than the single individual learning plan model. (2) Multi-task transfer learning based on Markov logical networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.

  16. Information Gain Based Dimensionality Selection for Classifying Text Documents

    SciTech Connect

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexitymore » is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.« less

  17. Reading Text While Driving

    PubMed Central

    Horrey, William J.; Hoffman, Joshua D.

    2015-01-01

    Objective In this study, we investigated how drivers adapt secondary-task initiation and time-sharing behavior when faced with fluctuating driving demands. Background Reading text while driving is particularly detrimental; however, in real-world driving, drivers actively decide when to perform the task. Method In a test track experiment, participants were free to decide when to read messages while driving along a straight road consisting of an area with increased driving demands (demand zone) followed by an area with low demands. A message was made available shortly before the vehicle entered the demand zone. We manipulated the type of driving demands (baseline, narrow lane, pace clock, combined), message format (no message, paragraph, parsed), and the distance from the demand zone when the message was available (near, far). Results In all conditions, drivers started reading messages (drivers’ first glance to the display) before entering or before leaving the demand zone but tended to wait longer when faced with increased driving demands. While reading messages, drivers looked more or less off road, depending on types of driving demands. Conclusions For task initiation, drivers avoid transitions from low to high demands; however, they are not discouraged when driving demands are already elevated. Drivers adjust time-sharing behavior according to driving demands while performing secondary tasks. Nonetheless, such adjustment may be less effective when total demands are high. Application This study helps us to understand a driver’s role as an active controller in the context of distracted driving and provides insights for developing distraction interventions. PMID:25850162

  18. Semantic classification of business images

    NASA Astrophysics Data System (ADS)

    Erol, Berna; Hull, Jonathan J.

    2006-01-01

    Digital cameras are becoming increasingly common for capturing information in business settings. In this paper, we describe a novel method for classifying images into the following semantic classes: document, whiteboard, business card, slide, and regular images. Our method is based on combining low-level image features, such as text color, layout, and handwriting features with high-level OCR output analysis. Several Support Vector Machine Classifiers are combined for multi-class classification of input images. The system yields 95% accuracy in classification.

  19. Statistical text classifier to detect specific type of medical incidents.

    PubMed

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

    WHO Patient Safety has put focus to increase the coherence and expressiveness of patient safety classification with the foundation of International Classification for Patient Safety (ICPS). Text classification and statistical approaches has showed to be successful to identifysafety problems in the Aviation industryusing incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong etc. This research demonstrates the potential to construct statistical text classifiers to detect specific type of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using logistic regression model. ROC curve and the AUC value indicated that this is a satisfactory good model.

  20. Classification Shell Game.

    ERIC Educational Resources Information Center

    Etzold, Carol

    1983-01-01

    Discusses shell classification exercises. Through keying students advanced from the "I know what a shell looks like" stage to become involved in the classification process: observing, labeling, making decisions about categories, and identifying marine animals. (Author/JN)

  1. Government Classification: An Overview.

    ERIC Educational Resources Information Center

    Brown, Karen M.

    Classification of government documents (confidential, secret, top secret) is a system used by the executive branch to, in part, protect national security and foreign policy interests. The systematic use of classification markings with precise definitions was established during World War I, and since 1936 major changes in classification have…

  2. Automated Diagnosis Coding with Combined Text Representations.

    PubMed

    Berndorfer, Stefan; Henriksson, Aron

    2017-01-01

    Automated diagnosis coding can be provided efficiently by learning predictive models from historical data; however, discriminating between thousands of codes while allowing a variable number of codes to be assigned is extremely difficult. Here, we explore various text representations and classification models for assigning ICD-9 codes to discharge summaries in MIMIC-III. It is shown that the relative effectiveness of the investigated representations depends on the frequency of the diagnosis code under consideration and that the best performance is obtained by combining models built using different representations.

  3. Tri-Texts: A Potential Next Step for Paired Texts

    ERIC Educational Resources Information Center

    Ciecierski, Lisa M.; Bintz, William P.

    2018-01-01

    This article presents the concept of tri-texts as a potential next step from paired texts following a collaborative inquiry with fifth-grade students. Paired texts are two texts intertextually connected, whereas tri-texts are three texts connected this way. The authors begin the article with a short literature review highlighting some of the…

  4. Teaching Text Structure: Examining the Affordances of Children's Informational Texts

    ERIC Educational Resources Information Center

    Jones, Cindy D.; Clark, Sarah K.; Reutzel, D. Ray

    2016-01-01

    This study investigated the affordances of informational texts to serve as model texts for teaching text structure to elementary school children. Content analysis of a random sampling of children's informational texts from top publishers was conducted on text structure organization and on the inclusion of text features as signals of text…

  5. Classification, disease, and diagnosis.

    PubMed

    Jutel, Annemarie

    2011-01-01

    Classification shapes medicine and guides its practice. Understanding classification must be part of the quest to better understand the social context and implications of diagnosis. Classifications are part of the human work that provides a foundation for the recognition and study of illness: deciding how the vast expanse of nature can be partitioned into meaningful chunks, stabilizing and structuring what is otherwise disordered. This article explores the aims of classification, their embodiment in medical diagnosis, and the historical traditions of medical classification. It provides a brief overview of the aims and principles of classification and their relevance to contemporary medicine. It also demonstrates how classifications operate as social framing devices that enable and disable communication, assert and refute authority, and are important items for sociological study.

  6. Automatic Text Summarization for Indonesian Language Using TextTeaser

    NASA Astrophysics Data System (ADS)

    Gunawan, D.; Pasaribu, A.; Rahmat, R. F.; Budiarto, R.

    2017-04-01

    Text summarization is one of the solution for information overload. Reducing text without losing the meaning not only can save time to read, but also maintain the reader’s understanding. One of many algorithms to summarize text is TextTeaser. Originally, this algorithm is intended to be used for text in English. However, due to TextTeaser algorithm does not consider the meaning of the text, we implement this algorithm for text in Indonesian language. This algorithm calculates four elements, such as title feature, sentence length, sentence position and keyword frequency. We utilize TextRank, an unsupervised and language independent text summarization algorithm, to evaluate the summarized text yielded by TextTeaser. The result shows that the TextTeaser algorithm needs more improvement to obtain better accuracy.

  7. Important Text Characteristics for Early-Grades Text Complexity

    ERIC Educational Resources Information Center

    Fitzgerald, Jill; Elmore, Jeff; Koons, Heather; Hiebert, Elfrieda H.; Bowen, Kimberly; Sanford-Moore, Eleanor E.; Stenner, A. Jackson

    2015-01-01

    The Common Core set a standard for all children to read increasingly complex texts throughout schooling. The purpose of the present study was to explore text characteristics specifically in relation to early-grades text complexity. Three hundred fifty primary-grades texts were selected and digitized. Twenty-two text characteristics were identified…

  8. Learning From Short Text Streams With Topic Drifts.

    PubMed

    Li, Peipei; He, Lu; Wang, Haiyan; Hu, Xuegang; Zhang, Yuhong; Li, Lei; Wu, Xindong

    2017-09-18

    Short text streams such as search snippets and micro blogs have been popular on the Web with the emergence of social media. Unlike traditional normal text streams, these data present the characteristics of short length, weak signal, high volume, high velocity, topic drift, etc. Short text stream classification is hence a very challenging and significant task. However, this challenge has received little attention from the research community. Therefore, a new feature extension approach is proposed for short text stream classification with the help of a large-scale semantic network obtained from a Web corpus. It is built on an incremental ensemble classification model for efficiency. First, more semantic contexts based on the senses of terms in short texts are introduced to make up of the data sparsity using the open semantic network, in which all terms are disambiguated by their semantics to reduce the noise impact. Second, a concept cluster-based topic drifting detection method is proposed to effectively track hidden topic drifts. Finally, extensive studies demonstrate that as compared to several well-known concept drifting detection methods in data stream, our approach can detect topic drifts effectively, and it enables handling short text streams effectively while maintaining the efficiency as compared to several state-of-the-art short text classification approaches.

  9. Recursive heuristic classification

    NASA Technical Reports Server (NTRS)

    Wilkins, David C.

    1994-01-01

    The author will describe a new problem-solving approach called recursive heuristic classification, whereby a subproblem of heuristic classification is itself formulated and solved by heuristic classification. This allows the construction of more knowledge-intensive classification programs in a way that yields a clean organization. Further, standard knowledge acquisition and learning techniques for heuristic classification can be used to create, refine, and maintain the knowledge base associated with the recursively called classification expert system. The method of recursive heuristic classification was used in the Minerva blackboard shell for heuristic classification. Minerva recursively calls itself every problem-solving cycle to solve the important blackboard scheduler task, which involves assigning a desirability rating to alternative problem-solving actions. Knowing these ratings is critical to the use of an expert system as a component of a critiquing or apprenticeship tutoring system. One innovation of this research is a method called dynamic heuristic classification, which allows selection among dynamically generated classification categories instead of requiring them to be prenumerated.

  10. Using ontology network structure in text mining.

    PubMed

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.

  11. Automatic Text Decomposition and Structuring.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1996-01-01

    Text similarity measurements are used to determine relationships between natural-language texts and text excerpts. The resulting linked hypertext maps can be broken down into text segments and themes used to identify different text types and structures, leading to improved information access and utilization. Examples are provided for text…

  12. Guiding Students through Expository Text with Text Feature Walks

    ERIC Educational Resources Information Center

    Kelley, Michelle J.; Clausen-Grace, Nicki

    2010-01-01

    The Text Feature Walk is a structure created and employed by the authors that guides students in the reading of text features in order to access prior knowledge, make connections, and set a purpose for reading expository text. Results from a pilot study are described in order to illustrate the benefits of using the Text Feature Walk over…

  13. Emotion models for textual emotion classification

    NASA Astrophysics Data System (ADS)

    Bruna, O.; Avetisyan, H.; Holub, J.

    2016-11-01

    This paper deals with textual emotion classification which gained attention in recent years. Emotion classification is used in user experience, product evaluation, national security, and tutoring applications. It attempts to detect the emotional content in the input text and based on different approaches establish what kind of emotional content is present, if any. Textual emotion classification is the most difficult to handle, since it relies mainly on linguistic resources and it introduces many challenges to assignment of text to emotion represented by a proper model. A crucial part of each emotion detector is emotion model. Focus of this paper is to introduce emotion models used for classification. Categorical and dimensional models of emotion are explained and some more advanced approaches are mentioned.

  14. Text analysis methods, text analysis apparatuses, and articles of manufacture

    DOEpatents

    Whitney, Paul D; Willse, Alan R; Lopresti, Charles A; White, Amanda M

    2014-10-28

    Text analysis methods, text analysis apparatuses, and articles of manufacture are described according to some aspects. In one aspect, a text analysis method includes accessing information indicative of data content of a collection of text comprising a plurality of different topics, using a computing device, analyzing the information indicative of the data content, and using results of the analysis, identifying a presence of a new topic in the collection of text.

  15. Classroom Texting in College Students

    ERIC Educational Resources Information Center

    Pettijohn, Terry F.; Frazier, Erik; Rieser, Elizabeth; Vaughn, Nicholas; Hupp-Wilds, Bobbi

    2015-01-01

    A 21-item survey on texting in the classroom was given to 235 college students. Overall, 99.6% of students owned a cellphone and 98% texted daily. Of the 138 students who texted in the classroom, most texted friends or significant others, and indicate the reason for classroom texting is boredom or work. Students who texted sent a mean of 12.21…

  16. Observation of [Formula: see text] and [Formula: see text] decays.

    PubMed

    Aaij, R; Adeva, B; Adinolfi, M; Ajaltouni, Z; Akar, S; Albrecht, J; Alessio, F; Alexander, M; Ali, S; Alkhazov, G; Alvarez Cartelle, P; Alves, A A; Amato, S; Amerio, S; Amhis, Y; An, L; Anderlini, L; Andreassi, G; Andreotti, M; Andrews, J E; Appleby, R B; Archilli, F; d'Argent, P; Arnau Romeu, J; Artamonov, A; Artuso, M; Aslanides, E; Auriemma, G; Baalouch, M; Babuschkin, I; Bachmann, S; Back, J J; Badalov, A; Baesso, C; Baker, S; Baldini, W; Barlow, R J; Barschel, C; Barsuk, S; Barter, W; Baszczyk, M; Batozskaya, V; Batsukh, B; Battista, V; Bay, A; Beaucourt, L; Beddow, J; Bedeschi, F; Bediaga, I; Bel, L J; Bellee, V; Belloli, N; Belous, K; Belyaev, I; Ben-Haim, E; Bencivenni, G; Benson, S; Benton, J; Berezhnoy, A; Bernet, R; Bertolin, A; Betancourt, C; Betti, F; Bettler, M-O; van Beuzekom, M; Bezshyiko, Ia; Bifani, S; Billoir, P; Bird, T; Birnkraut, A; Bitadze, A; Bizzeti, A; Blake, T; Blanc, F; Blouw, J; Blusk, S; Bocci, V; Boettcher, T; Bondar, A; Bondar, N; Bonivento, W; Bordyuzhin, I; Borgheresi, A; Borghi, S; Borisyak, M; Borsato, M; Bossu, F; Boubdir, M; Bowcock, T J V; Bowen, E; Bozzi, C; Braun, S; Britsch, M; Britton, T; Brodzicka, J; Buchanan, E; Burr, C; Bursche, A; Buytaert, J; Cadeddu, S; Calabrese, R; Calvi, M; Calvo Gomez, M; Camboni, A; Campana, P; Campora Perez, D H; Capriotti, L; Carbone, A; Carboni, G; Cardinale, R; Cardini, A; Carniti, P; Carson, L; Carvalho Akiba, K; Casse, G; Cassina, L; Castillo Garcia, L; Cattaneo, M; Cauet, Ch; Cavallero, G; Cenci, R; Charles, M; Charpentier, Ph; Chatzikonstantinidis, G; Chefdeville, M; Chen, S; Cheung, S-F; Chobanova, V; Chrzaszcz, M; Cid Vidal, X; Ciezarek, G; Clarke, P E L; Clemencic, M; Cliff, H V; Closier, J; Coco, V; Cogan, J; Cogneras, E; Cogoni, V; Cojocariu, L; Collazuol, G; Collins, P; Comerma-Montells, A; Contu, A; Cook, A; Coombs, G; Coquereau, S; Corti, G; Corvo, M; Costa Sobral, C M; Couturier, B; Cowan, G A; Craik, D C; Crocombe, A; Cruz Torres, M; Cunliffe, S; Currie, R; D'Ambrosio, C; Da Cunha Marinho, F; Dall'Occo, E; Dalseno, J; David, P N Y; Davis, A; De Aguiar Francisco, O; De Bruyn, K; De Capua, S; De Cian, M; De Miranda, J M; De Paula, L; De Serio, M; De Simone, P; Dean, C-T; Decamp, D; Deckenhoff, M; Del Buono, L; Demmer, M; Dendek, A; Derkach, D; Deschamps, O; Dettori, F; Dey, B; Di Canto, A; Dijkstra, H; Dordei, F; Dorigo, M; Dosil Suárez, A; Dovbnya, A; Dreimanis, K; Dufour, L; Dujany, G; Dungs, K; Durante, P; Dzhelyadin, R; Dziurda, A; Dzyuba, A; Déléage, N; Easo, S; Ebert, M; Egede, U; Egorychev, V; Eidelman, S; Eisenhardt, S; Eitschberger, U; Ekelhof, R; Eklund, L; Ely, S; Esen, S; Evans, H M; Evans, T; Falabella, A; Farley, N; Farry, S; Fay, R; Fazzini, D; Ferguson, D; Fernandez Prieto, A; Ferrari, F; Ferreira Rodrigues, F; Ferro-Luzzi, M; Filippov, S; Fini, R A; Fiore, M; Fiorini, M; Firlej, M; Fitzpatrick, C; Fiutowski, T; Fleuret, F; Fohl, K; Fontana, M; Fontanelli, F; Forshaw, D C; Forty, R; Franco Lima, V; Frank, M; Frei, C; Fu, J; Furfaro, E; Färber, C; Gallas Torreira, A; Galli, D; Gallorini, S; Gambetta, S; Gandelman, M; Gandini, P; Gao, Y; Garcia Martin, L M; García Pardiñas, J; Garra Tico, J; Garrido, L; Garsed, P J; Gascon, D; Gaspar, C; Gavardi, L; Gazzoni, G; Gerick, D; Gersabeck, E; Gersabeck, M; Gershon, T; Ghez, Ph; Gianì, S; Gibson, V; Girard, O G; Giubega, L; Gizdov, K; Gligorov, V V; Golubkov, D; Golutvin, A; Gomes, A; Gorelov, I V; Gotti, C; Govorkova, E; Grabalosa Gándara, M; Graciani Diaz, R; Granado Cardoso, L A; Graugés, E; Graverini, E; Graziani, G; Grecu, A; Griffith, P; Grillo, L; Gruberg Cazon, B R; Grünberg, O; Gushchin, E; Guz, Yu; Gys, T; Göbel, C; Hadavizadeh, T; Hadjivasiliou, C; Haefeli, G; Haen, C; Haines, S C; Hall, S; Hamilton, B; Han, X; Hansmann-Menzemer, S; Harnew, N; Harnew, S T; Harrison, J; Hatch, M; He, J; Head, T; Heister, A; Hennessy, K; Henrard, P; Henry, L; Hernando Morata, J A; van Herwijnen, E; Heß, M; Hicheur, A; Hill, D; Hombach, C; Hopchev, H; Hulsbergen, W; Humair, T; Hushchyn, M; Hussain, N; Hutchcroft, D; Idzik, M; Ilten, P; Jacobsson, R; Jaeger, A; Jalocha, J; Jans, E; Jawahery, A; Jiang, F; John, M; Johnson, D; Jones, C R; Joram, C; Jost, B; Jurik, N; Kandybei, S; Kanso, W; Karacson, M; Kariuki, J M; Karodia, S; Kecke, M; Kelsey, M; Kenyon, I R; Kenzie, M; Ketel, T; Khairullin, E; Khanji, B; Khurewathanakul, C; Kirn, T; Klaver, S; Klimaszewski, K; Koliiev, S; Kolpin, M; Komarov, I; Koopman, R F; Koppenburg, P; Kosmyntseva, A; Kozachuk, A; Kozeiha, M; Kravchuk, L; Kreplin, K; Kreps, M; Krokovny, P; Kruse, F; Krzemien, W; Kucewicz, W; Kucharczyk, M; Kudryavtsev, V; Kuonen, A K; Kurek, K; Kvaratskheliya, T; Lacarrere, D; Lafferty, G; Lai, A; Lanfranchi, G; Langenbruch, C; Latham, T; Lazzeroni, C; Le Gac, R; van Leerdam, J; Lees, J-P; Leflat, A; Lefrançois, J; Lefèvre, R; Lemaitre, F; Lemos Cid, E; Leroy, O; Lesiak, T; Leverington, B; Li, Y; Likhomanenko, T; Lindner, R; Linn, C; Lionetto, F; Liu, B; Liu, X; Loh, D; Longstaff, I; Lopes, J H; Lucchesi, D; Lucio Martinez, M; Luo, H; Lupato, A; Luppi, E; Lupton, O; Lusiani, A; Lyu, X; Machefert, F; Maciuc, F; Maev, O; Maguire, K; Malde, S; Malinin, A; Maltsev, T; Manca, G; Mancinelli, G; Manning, P; Maratas, J; Marchand, J F; Marconi, U; Marin Benito, C; Marino, P; Marks, J; Martellotti, G; Martin, M; Martinelli, M; Martinez Santos, D; Martinez Vidal, F; Martins Tostes, D; Massacrier, L M; Massafferri, A; Matev, R; Mathad, A; Mathe, Z; Matteuzzi, C; Mauri, A; Maurin, B; Mazurov, A; McCann, M; McCarthy, J; McNab, A; McNulty, R; Meadows, B; Meier, F; Meissner, M; Melnychuk, D; Merk, M; Merli, A; Michielin, E; Milanes, D A; Minard, M-N; Mitzel, D S; Mogini, A; Molina Rodriguez, J; Monroy, I A; Monteil, S; Morandin, M; Morawski, P; Mordà, A; Morello, M J; Moron, J; Morris, A B; Mountain, R; Muheim, F; Mulder, M; Mussini, M; Müller, D; Müller, J; Müller, K; Müller, V; Naik, P; Nakada, T; Nandakumar, R; Nandi, A; Nasteva, I; Needham, M; Neri, N; Neubert, S; Neufeld, N; Neuner, M; Nguyen, A D; Nguyen, T D; Nguyen-Mau, C; Nieswand, S; Niet, R; Nikitin, N; Nikodem, T; Novoselov, A; O'Hanlon, D P; Oblakowska-Mucha, A; Obraztsov, V; Ogilvy, S; Oldeman, R; Onderwater, C J G; Otalora Goicochea, J M; Otto, A; Owen, P; Oyanguren, A; Pais, P R; Palano, A; Palombo, F; Palutan, M; Panman, J; Papanestis, A; Pappagallo, M; Pappalardo, L L; Parker, W; Parkes, C; Passaleva, G; Pastore, A; Patel, G D; Patel, M; Patrignani, C; Pearce, A; Pellegrino, A; Penso, G; Pepe Altarelli, M; Perazzini, S; Perret, P; Pescatore, L; Petridis, K; Petrolini, A; Petrov, A; Petruzzo, M; Picatoste Olloqui, E; Pietrzyk, B; Pikies, M; Pinci, D; Pistone, A; Piucci, A; Playfer, S; Plo Casasus, M; Poikela, T; Polci, F; Poluektov, A; Polyakov, I; Polycarpo, E; Pomery, G J; Popov, A; Popov, D; Popovici, B; Poslavskii, S; Potterat, C; Price, E; Price, J D; Prisciandaro, J; Pritchard, A; Prouve, C; Pugatch, V; Puig Navarro, A; Punzi, G; Qian, W; Quagliani, R; Rachwal, B; Rademacker, J H; Rama, M; Ramos Pernas, M; Rangel, M S; Raniuk, I; Ratnikov, F; Raven, G; Redi, F; Reichert, S; Dos Reis, A C; Remon Alepuz, C; Renaudin, V; Ricciardi, S; Richards, S; Rihl, M; Rinnert, K; Rives Molina, V; Robbe, P; Rodrigues, A B; Rodrigues, E; Rodriguez Lopez, J A; Rodriguez Perez, P; Rogozhnikov, A; Roiser, S; Rollings, A; Romanovskiy, V; Romero Vidal, A; Ronayne, J W; Rotondo, M; Rudolph, M S; Ruf, T; Ruiz Valls, P; Saborido Silva, J J; Sadykhov, E; Sagidova, N; Saitta, B; Salustino Guimaraes, V; Sanchez Mayordomo, C; Sanmartin Sedes, B; Santacesaria, R; Santamarina Rios, C; Santimaria, M; Santovetti, E; Sarti, A; Satriano, C; Satta, A; Saunders, D M; Savrina, D; Schael, S; Schellenberg, M; Schiller, M; Schindler, H; Schlupp, M; Schmelling, M; Schmelzer, T; Schmidt, B; Schneider, O; Schopper, A; Schubert, K; Schubiger, M; Schune, M-H; Schwemmer, R; Sciascia, B; Sciubba, A; Semennikov, A; Sergi, A; Serra, N; Serrano, J; Sestini, L; Seyfert, P; Shapkin, M; Shapoval, I; Shcheglov, Y; Shears, T; Shekhtman, L; Shevchenko, V; Siddi, B G; Silva Coutinho, R; Silva de Oliveira, L; Simi, G; Simone, S; Sirendi, M; Skidmore, N; Skwarnicki, T; Smith, E; Smith, I T; Smith, J; Smith, M; Snoek, H; Sokoloff, M D; Soler, F J P; Souza De Paula, B; Spaan, B; Spradlin, P; Sridharan, S; Stagni, F; Stahl, M; Stahl, S; Stefko, P; Stefkova, S; Steinkamp, O; Stemmle, S; Stenyakin, O; Stevenson, S; Stoica, S; Stone, S; Storaci, B; Stracka, S; Straticiuc, M; Straumann, U; Sun, L; Sutcliffe, W; Swientek, K; Syropoulos, V; Szczekowski, M; Szumlak, T; T'Jampens, S; Tayduganov, A; Tekampe, T; Tellarini, G; Teubert, F; Thomas, E; van Tilburg, J; Tilley, M J; Tisserand, V; Tobin, M; Tolk, S; Tomassetti, L; Tonelli, D; Topp-Joergensen, S; Toriello, F; Tournefier, E; Tourneur, S; Trabelsi, K; Traill, M; Tran, M T; Tresch, M; Trisovic, A; Tsaregorodtsev, A; Tsopelas, P; Tully, A; Tuning, N; Ukleja, A; Ustyuzhanin, A; Uwer, U; Vacca, C; Vagnoni, V; Valassi, A; Valat, S; Valenti, G; Vallier, A; Vazquez Gomez, R; Vazquez Regueiro, P; Vecchi, S; van Veghel, M; Velthuis, J J; Veltri, M; Veneziano, G; Venkateswaran, A; Vernet, M; Vesterinen, M; Viaud, B; Vieira, D; Vieites Diaz, M; Viemann, H; Vilasis-Cardona, X; Vitti, M; Volkov, V; Vollhardt, A; Voneki, B; Vorobyev, A; Vorobyev, V; Voß, C; de Vries, J A; Vázquez Sierra, C; Waldi, R; Wallace, C; Wallace, R; Walsh, J; Wang, J; Ward, D R; Wark, H M; Watson, N K; Websdale, D; Weiden, A; Whitehead, M; Wicht, J; Wilkinson, G; Wilkinson, M; Williams, M; Williams, M P; Williams, M; Williams, T; Wilson, F F; Wimberley, J; Wishahi, J; Wislicki, W; Witek, M; Wormser, G; Wotton, S A; Wraight, K; Wyllie, K; Xie, Y; Xing, Z; Xu, Z; Yang, Z; Yin, H; Yu, J; Yuan, X; Yushchenko, O; Zarebski, K A; Zavertyaev, M; Zhang, L; Zhang, Y; Zhang, Y; Zhelezov, A; Zheng, Y; Zhokhov, A; Zhu, X; Zhukov, V; Zucchelli, S

    2017-01-01

    The decays [Formula: see text] and [Formula: see text] are observed for the first time using a data sample corresponding to an integrated luminosity of 3.0 fb[Formula: see text], collected by the LHCb experiment in proton-proton collisions at the centre-of-mass energies of 7 and 8[Formula: see text]. The branching fractions relative to that of [Formula: see text] are measured to be [Formula: see text]where the first uncertainties are statistical and the second are systematic.

  17. Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

    ERIC Educational Resources Information Center

    White, Sheida

    2012-01-01

    This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…

  18. Technical Vocabulary in Specialised Texts.

    ERIC Educational Resources Information Center

    Chung, Teresa Mihwa; Nation, Paul

    2003-01-01

    Describes two studies of technical vocabulary, one using an anatomy text and the other an applied linguistics text. Technical vocabulary was found by rating words in the texts on a four-step scale. Found that technical vocabulary made up a very substantial proportion of both the different words and the running words in texts. (Author/VWL)

  19. The Challenge of Challenging Text

    ERIC Educational Resources Information Center

    Shanahan, Timothy; Fisher, Douglas; Frey, Nancy

    2012-01-01

    The Common Core State Standards emphasize the value of teaching students to engage with complex text. But what exactly makes a text complex, and how can teachers help students develop their ability to learn from such texts? The authors of this article discuss five factors that determine text complexity: vocabulary, sentence structure, coherence,…

  20. Texts in Homes and Communities.

    ERIC Educational Resources Information Center

    Pahl, Kate

    This paper considers how children's text making is shaped by the environment in which the texts are made. By considering texts made in classrooms and texts made in homes, the paper explores how classrooms and homes interact with children's (6-7 year old boys) reflective processes as they create artifacts--drawings, models, and writings. The paper…

  1. Text analysis devices, articles of manufacture, and text analysis methods

    DOEpatents

    Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C

    2013-05-28

    Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes processing circuitry configured to analyze initial text to generate a measurement basis usable in analysis of subsequent text, wherein the measurement basis comprises a plurality of measurement features from the initial text, a plurality of dimension anchors from the initial text and a plurality of associations of the measurement features with the dimension anchors, and wherein the processing circuitry is configured to access a viewpoint indicative of a perspective of interest of a user with respect to the analysis of the subsequent text, and wherein the processing circuitry is configured to use the viewpoint to generate the measurement basis.

  2. Classification of anemia for gastroenterologists

    PubMed Central

    Moreno Chulilla, Jose Antonio; Romero Colás, Maria Soledad; Gutiérrez Martín, Martín

    2009-01-01

    Most anemia is related to the digestive system by dietary deficiency, malabsorption, or chronic bleeding. We review the World Health Organization definition of anemia, its morphological classification (microcytic, macrocytic and normocytic) and pathogenic classification (regenerative and hypo regenerative), and integration of these classifications. Interpretation of laboratory tests is included, from the simplest (blood count, routine biochemistry) to the more specific (iron metabolism, vitamin B12, folic acid, reticulocytes, erythropoietin, bone marrow examination and Schilling test). In the text and various algorithms, we propose a hierarchical and logical way to reach a diagnosis as quickly as possible, by properly managing the medical interview, physical examination, appropriate laboratory tests, bone marrow examination, and other complementary tests. The prevalence is emphasized in all sections so that the gastroenterologist can direct the diagnosis to the most common diseases, although the tables also include rare diseases. Digestive diseases potentially causing anemia have been studied in preference, but other causes of anemia have been included in the text and tables. Primitive hematological diseases that cause anemia are only listed, but are not discussed in depth. The last section is dedicated to simplifying all items discussed above, using practical rules to guide diagnosis and medical care with the greatest economy of resources and time. PMID:19787825

  3. Text-Dependent Questions: Reflecting and Transcending the Text

    ERIC Educational Resources Information Center

    Boelé, Amy L.

    2016-01-01

    Posing text-dependent questions is crucial for facilitating students' comprehension of the text. However, text-dependent questions should not merely ask students to reflect the author's literal or even inferential meaning. The author's message is the starting place for comprehension, rather than the end goal or object of comprehension. The text…

  4. The Impact of Text Browsing on Text Retrieval Performance.

    ERIC Educational Resources Information Center

    Bodner, Richard C.; Chignell, Mark H.; Charoenkitkarn, Nipon; Golovchinsky, Gene; Kopak, Richard W.

    2001-01-01

    Compares empirical results from three experiments using Text Retrieval Conference (TREC) data and search topics that involved three different user interfaces. Results show that marking Boolean queries on text, which encourages browsing, and hypertext interfaces to text retrieval systems can benefit recall and can also benefit novice users.…

  5. Litterature: Retour au texte (Literature: Return to the Text).

    ERIC Educational Resources Information Center

    Noe, Alfred

    1993-01-01

    Choice of texts for use in French language instruction is discussed. It is argued that the text's format (e.g., advertising, figurative poetry, journal article, play, prose, etc.) is instrumental in bringing attention to the language in it, and this has implications for the best uses of different text types. (MSE)

  6. Library Classification 2020

    ERIC Educational Resources Information Center

    Harris, Christopher

    2013-01-01

    In this article the author explores how a new library classification system might be designed using some aspects of the Dewey Decimal Classification (DDC) and ideas from other systems to create something that works for school libraries in the year 2020. By examining what works well with the Dewey Decimal System, what features should be carried…

  7. The Alaska vegetation classification.

    Treesearch

    L.A. Viereck; C.T. Dyrness; A.R. Batten; K.J. Wenzlick

    1992-01-01

    The Alaska vegetation classification presented here is a comprehensive, statewide system that has been under development since 1976. The classification is based, as much as possible, on the characteristics of the vegetation itself and is designed to categorize existing vegetation, not potential vegetation. A hierarchical system with five levels of resolution is used...

  8. Grounded Classification: Grounded Theory and Faceted Classification.

    ERIC Educational Resources Information Center

    Star, Susan Leigh

    1998-01-01

    Compares the qualitative method of grounded theory (GT) with Ranganathan's construction of faceted classifications (FC) in library and information science. Both struggle with a core problem--the representation of vernacular words and processes, empirically discovered, which will, although ethnographically faithful, be powerful beyond the single…

  9. Dangers of Texting While Driving

    MedlinePlus

    ... it be shared with students and parents. State laws Currently there is no national ban on texting ... driving, but a number of states have passed laws banning texting or wireless phones or requiring hands- ...

  10. Segmental Rescoring in Text Recognition

    DTIC Science & Technology

    2014-02-04

    description relates to rescoring text hypotheses in text recognition based on segmental features. Offline printed text and handwriting recognition (OHR) can... Handwriting , College Park, Md., 2006, which is incorporated by reference here. For the set of training images 202, a character modeler 208 receives

  11. Informational Text and the CCSS

    ERIC Educational Resources Information Center

    Aspen Institute, 2012

    2012-01-01

    What constitutes an informational text covers a broad swath of different types of texts. Biographies & memoirs, speeches, opinion pieces & argumentative essays, and historical, scientific or technical accounts of a non-narrative nature are all included in what the Common Core State Standards (CCSS) envisions as informational text. Also included…

  12. Automatic Text Structuring and Summarization.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1997-01-01

    Discussion of the use of information retrieval techniques for automatic generation of semantic hypertext links focuses on automatic text summarization. Topics include World Wide Web links, text segmentation, and evaluation of text summarization by comparing automatically generated abstracts with manually prepared abstracts. (Author/LRW)

  13. Text Signals Influence Team Artifacts

    ERIC Educational Resources Information Center

    Clariana, Roy B.; Rysavy, Monica D.; Taricani, Ellen

    2015-01-01

    This exploratory quasi-experimental investigation describes the influence of text signals on team visual map artifacts. In two course sections, four-member teams were given one of two print-based text passage versions on the course-related topic "Social influence in groups" downloaded from Wikipedia; this text had two paragraphs, each…

  14. The Only Safe SMS Texting Is No SMS Texting.

    PubMed

    Toth, Cheryl; Sacopulos, Michael J

    2015-01-01

    Many physicians and practice staff use short messaging service (SMS) text messaging to communicate with patients. But SMS text messaging is unencrypted, insecure, and does not meet HIPAA requirements. In addition, the short and abbreviated nature of text messages creates opportunities for misinterpretation, and can negatively impact patient safety and care. Until recently, asking patients to sign a statement that they understand and accept these risks--as well as having policies, device encryption, and cyber insurance in place--would have been enough to mitigate the risk of using SMS text in a medical practice. But new trends and policies have made SMS text messaging unsafe under any circumstance. This article explains these trends and policies, as well as why only secure texting or secure messaging should be used for physician-patient communication.

  15. Text recycling: acceptable or misconduct?

    PubMed

    Harriman, Stephanie; Patel, Jigisha

    2014-08-16

    Text recycling, also referred to as self-plagiarism, is the reproduction of an author's own text from a previous publication in a new publication. Opinions on the acceptability of this practice vary, with some viewing it as acceptable and efficient, and others as misleading and unacceptable. In light of the lack of consensus, journal editors often have difficulty deciding how to act upon the discovery of text recycling. In response to these difficulties, we have created a set of guidelines for journal editors on how to deal with text recycling. In this editorial, we discuss some of the challenges of developing these guidelines, and how authors can avoid undisclosed text recycling.

  16. Facilitating text reading in posterior cortical atrophy

    PubMed Central

    Rajdev, Kishan; Shakespeare, Timothy J.; Leff, Alexander P.; Crutch, Sebastian J.

    2015-01-01

    Objective: We report (1) the quantitative investigation of text reading in posterior cortical atrophy (PCA), and (2) the effects of 2 novel software-based reading aids that result in dramatic improvements in the reading ability of patients with PCA. Methods: Reading performance, eye movements, and fixations were assessed in patients with PCA and typical Alzheimer disease and in healthy controls (experiment 1). Two reading aids (single- and double-word) were evaluated based on the notion that reducing the spatial and oculomotor demands of text reading might support reading in PCA (experiment 2). Results: Mean reading accuracy in patients with PCA was significantly worse (57%) compared with both patients with typical Alzheimer disease (98%) and healthy controls (99%); spatial aspects of passages were the primary determinants of text reading ability in PCA. Both aids led to considerable gains in reading accuracy (PCA mean reading accuracy: single-word reading aid = 96%; individual patient improvement range: 6%–270%) and self-rated measures of reading. Data suggest a greater efficiency of fixations and eye movements under the single-word reading aid in patients with PCA. Conclusions: These findings demonstrate how neurologic characterization of a neurodegenerative syndrome (PCA) and detailed cognitive analysis of an important everyday skill (reading) can combine to yield aids capable of supporting important everyday functional abilities. Classification of evidence: This study provides Class III evidence that for patients with PCA, 2 software-based reading aids (single-word and double-word) improve reading accuracy. PMID:26138948

  17. Pedoinformatics Approach to Soil Text Analytics

    NASA Astrophysics Data System (ADS)

    Furey, J.; Seiter, J.; Davis, A.

    2017-12-01

    The several extant schema for the classification of soils rely on differing criteria, but the major soil science taxonomies, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources systems, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented iPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the lowest taxon level (soil series) paragraphs. Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.

  18. Text against Text: Counterbalancing the Hegemony of Assessment.

    ERIC Educational Resources Information Center

    Cosgrove, Cornelius

    A study examined whether composition specialists can counterbalance the potential privileging of the assessment perspective, or of self-appointed interpreters of that perspective, through the study of assessment discourse as text. Fourteen assessment texts were examined, most of them journal articles and most of them featuring the common…

  19. Classification of movement disorders.

    PubMed

    Fahn, Stanley

    2011-05-01

    The classification of movement disorders has evolved. Even the terminology has shifted, from an anatomical one of extrapyramidal disorders to a phenomenological one of movement disorders. The history of how this shift came about is described. The history of both the definitions and the classifications of the various neurologic conditions is then reviewed. First is a review of movement disorders as a group; then, the evolving classifications for 3 of them--parkinsonism, dystonia, and tremor--are covered in detail. Copyright © 2011 Movement Disorder Society.

  20. Detection of text strings from mixed text/graphics images

    NASA Astrophysics Data System (ADS)

    Tsai, Chien-Hua; Papachristou, Christos A.

    2000-12-01

    A robust system for text strings separation from mixed text/graphics images is presented. Based on a union-find (region growing) strategy the algorithm is thus able to classify the text from graphics and adapts to changes in document type, language category (e.g., English, Chinese and Japanese), text font style and size, and text string orientation within digital images. In addition, it allows for a document skew that usually occurs in documents, without skew correction prior to discrimination while these proposed methods such a projection profile or run length coding are not always suitable for the condition. The method has been tested with a variety of printed documents from different origins with one common set of parameters, and the experimental results of the performance of the algorithm in terms of computational efficiency are demonstrated by using several tested images from the evaluation.

  1. Text analysis devices, articles of manufacture, and text analysis methods

    DOEpatents

    Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C

    2015-03-31

    Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes a display configured to depict visible images, and processing circuitry coupled with the display and wherein the processing circuitry is configured to access a first vector of a text item and which comprises a plurality of components, to access a second vector of the text item and which comprises a plurality of components, to weight the components of the first vector providing a plurality of weighted values, to weight the components of the second vector providing a plurality of weighted values, and to combine the weighted values of the first vector with the weighted values of the second vector to provide a third vector.

  2. Texting while driving: is speech-based text entry less risky than handheld text entry?

    PubMed

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. SparkText: Biomedical Text Mining on Big Data Framework.

    PubMed

    Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

    Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.

  4. SparkText: Biomedical Text Mining on Big Data Framework

    PubMed Central

    He, Karen Y.; Wang, Kai

    2016-01-01

    Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652

  5. Text Recycling in Scientific Writing.

    PubMed

    Moskovitz, Cary

    2018-03-15

    Text recycling, often called "self-plagiarism", is the practice of reusing textual material from one's prior documents in a new work. The practice presents a complex set of ethical and practical challenges to the scientific community, many of which have not been addressed in prior discourse on the subject. This essay identifies and discusses these factors in a systematic fashion, concluding with a new definition of text recycling that takes these factors into account. Topics include terminology, what is not text recycling, factors affecting judgements about the appropriateness of text recycling, and visual materials.

  6. Postprocessing classification images

    NASA Technical Reports Server (NTRS)

    Kan, E. P.

    1979-01-01

    Program cleans up remote-sensing maps. It can be used with existing image-processing software. Remapped images closely resemble familiar resource information maps and can replace or supplement classification images not postprocessed by this program.

  7. Update on diabetes classification.

    PubMed

    Thomas, Celeste C; Philipson, Louis H

    2015-01-01

    This article highlights the difficulties in creating a definitive classification of diabetes mellitus in the absence of a complete understanding of the pathogenesis of the major forms. This brief review shows the evolving nature of the classification of diabetes mellitus. No classification scheme is ideal, and all have some overlap and inconsistencies. The only diabetes in which it is possible to accurately diagnose by DNA sequencing, monogenic diabetes, remains undiagnosed in more than 90% of the individuals who have diabetes caused by one of the known gene mutations. The point of classification, or taxonomy, of disease, should be to give insight into both pathogenesis and treatment. It remains a source of frustration that all schemes of diabetes mellitus continue to fall short of this goal. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Comparing drug classification systems.

    PubMed

    Mahoney, Anne; Evans, Jonathan

    2008-11-06

    An essential quality of drug classification systems is the ability to assign medications to a structured hierarchy for categories such as mechanism of action, physiological effects, and therapeutic indications. No single classification system can meet all of these needs; however, there should be consistency among those that group by the same underlying principals. We discovered discrepancies in how drugs with multiple therapeutic indications are classified among four widely used schemas.

  9. TES: A Text Extraction System.

    ERIC Educational Resources Information Center

    Goh, A.; Hui, S. C.

    1996-01-01

    Describes how TES, a text extraction system, is able to electronically retrieve a set of sentences from a document to form an indicative abstract. Discusses various text abstraction techniques and related work in the area, provides an overview of the TES system, and compares system results against manually produced abstracts. (LAM)

  10. The Case for Multiple Texts

    ERIC Educational Resources Information Center

    Cummins, Sunday

    2017-01-01

    Reading just one text on any topic, Cummins argues, isn't enough if we expect students to learn at deep levels about the topic, synthesize various sources of information, and gain the knowledge they need to write and speak seriously about the topic. Reading a second or third text expands a reader's knowledge on any topic or story--and the why…

  11. Understanding and Teaching Complex Texts

    ERIC Educational Resources Information Center

    Fisher, Douglas; Frey, Nancy

    2014-01-01

    Teachers in today's classrooms struggle every day to design instructional interventions that would build students' reading skills and strategies in order to ensure their comprehension of complex texts. Text complexity can be determined in both qualitative and quantitative ways. In this article, the authors describe various innovative…

  12. Improve Reading with Complex Texts

    ERIC Educational Resources Information Center

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    The Common Core State Standards have cast a renewed light on reading instruction, presenting teachers with the new requirements to teach close reading of complex texts. Teachers and administrators should consider a number of essential features of close reading: They are short, complex texts; rich discussions based on worthy questions; revisiting…

  13. Text Genres in Information Organization

    ERIC Educational Resources Information Center

    Nahotko, Marek

    2016-01-01

    Introduction: Text genres used by so-called information organizers in the processes of information organization in information systems were explored in this research. Method: The research employed text genre socio-functional analysis. Five genre groups in information organization were distinguished. Every genre group used in information…

  14. Semantic Mapping: A Text Perspective.

    ERIC Educational Resources Information Center

    Harste, Jerome C.

    Children's early writing is analyzed in this paper according to different perspectives such as function, grapho-phonemics, syntax, and semantics. Emphasis is given to the semantic perspective of decoding the text and to the study of coherence in text as it is viewed by the reader. Proposition analysis is used to map the coherence of samples of…

  15. Supernova Photometric Lightcurve Classification

    NASA Astrophysics Data System (ADS)

    Zaidi, Tayeb; Narayan, Gautham

    2016-01-01

    This is a preliminary report on photometric supernova classification. We first explore the properties of supernova light curves, and attempt to restructure the unevenly sampled and sparse data from assorted datasets to allow for processing and classification. The data was primarily drawn from the Dark Energy Survey (DES) simulated data, created for the Supernova Photometric Classification Challenge. This poster shows a method for producing a non-parametric representation of the light curve data, and applying a Random Forest classifier algorithm to distinguish between supernovae types. We examine the impact of Principal Component Analysis to reduce the dimensionality of the dataset, for future classification work. The classification code will be used in a stage of the ANTARES pipeline, created for use on the Large Synoptic Survey Telescope alert data and other wide-field surveys. The final figure-of-merit for the DES data in the r band was 60% for binary classification (Type I vs II).Zaidi was supported by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program (AST-1262829).

  16. Individual Profiling Using Text Analysis

    DTIC Science & Technology

    2016-04-15

    Mining a Text for Errors. . . . on Knowledge discovery in data mining , pages 624–628, 2005. [12] Michal Kosinski, David Stillwell, and Thore Graepel...AFRL-AFOSR-UK-TR-2016-0011 Individual Profiling using Text Analysis 140333 Mark Stevenson UNIVERSITY OF SHEFFIELD, DEPARTMENT OF PSYCHOLOGY Final...REPORT TYPE      Final 3.  DATES COVERED (From - To)      15 Sep 2014 to 14 Sep 2015 4.  TITLE AND SUBTITLE Individual Profiling using Text Analysis

  17. Generating Text from Functional Brain Images

    PubMed Central

    Pereira, Francisco; Detre, Greg; Botvinick, Matthew

    2011-01-01

    Recent work has shown that it is possible to take brain images acquired during viewing of a scene and reconstruct an approximation of the scene from those images. Here we show that it is also possible to generate text about the mental content reflected in brain images. We began with images collected as participants read names of concrete items (e.g., “Apartment’’) while also seeing line drawings of the item named. We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. In order to validate this mapping, without accessing information about the items viewed for left-out individual brain images, we were able to generate from each one a collection of semantically pertinent words (e.g., “door,” “window” for “Apartment’’). Furthermore, we show that the ability to generate such words allows us to perform a classification task and thus validate our method quantitatively. PMID:21927602

  18. Text Mining in Biomedical Domain with Emphasis on Document Clustering

    PubMed Central

    2017-01-01

    Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Conclusions Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise. PMID:28875048

  19. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    PubMed

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  20. Why is Light Text Harder to Read Than Dark Text?

    NASA Technical Reports Server (NTRS)

    Scharff, Lauren V.; Ahumada, Albert J.

    2005-01-01

    Scharff and Ahumada (2002, 2003) measured text legibility for light text and dark text. For paragraph readability and letter identification, responses to light text were slower and less accurate for a given contrast. Was this polarity effect (1) an artifact of our apparatus, (2) a physiological difference in the separate pathways for positive and negative contrast or (3) the result of increased experience with dark text on light backgrounds? To rule out the apparatus-artifact hypothesis, all data were collected on one monitor. Its luminance was measured at all levels used, and the spatial effects of the monitor were reduced by pixel doubling and quadrupling (increasing the viewing distance to maintain constant angular size). Luminances of vertical and horizontal square-wave gratings were compared to assess display speed effects. They existed, even for 4-pixel-wide bars. Tests for polarity asymmetries in display speed were negative. Increased experience might develop full letter templates for dark text, while recognition of light letters is based on component features. Earlier, an observer ran all conditions at one polarity and then switched. If dark and light letters were intermixed, the observer might use component features on all trials and do worse on the dark letters, reducing the polarity effect. We varied polarity blocking (completely blocked, alternating smaller blocks, and intermixed blocks). Letter identification responses times showed polarity effects at all contrasts and display resolution levels. Observers were also more accurate with higher contrasts and more pixels per degree. Intermixed blocks increased the polarity effect by reducing performance on the light letters, but only if the randomized block occurred prior to the nonrandomized block. Perhaps observers tried to use poorly developed templates, or they did not work as hard on the more difficult items. The experience hypothesis and the physiological gain hypothesis remain viable explanations.

  1. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In Malay language, there are no conjugations and declensions and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or, a verb, depending on its position in the sentence. Although extensively simple root words are used in informal conversations, it is essential to use the precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in written language of the educated Malay. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties in Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of set of rules is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.

  2. Text messaging during simulated driving.

    PubMed

    Drews, Frank A; Yazdani, Hina; Godfrey, Celeste N; Cooper, Joel M; Strayer, David L

    2009-10-01

    This research aims to identify the impact of text messaging on simulated driving performance. In the past decade, a number of on-road, epidemiological, and simulator-based studies reported the negative impact of talking on a cell phone on driving behavior. However, the impact of text messaging on simulated driving performance is still not fully understood. Forty participants engaged in both a single task (driving) and a dual task (driving and text messaging) in a high-fidelity driving simulator. Analysis of driving performance revealed that participants in the dual-task condition responded more slowly to the onset of braking lights and showed impairments in forward and lateral control compared with a driving-only condition. Moreover, text-messaging drivers were involved in more crashes than drivers not engaged in text messaging. Text messaging while driving has a negative impact on simulated driving performance. This negative impact appears to exceed the impact of conversing on a cell phone while driving. The results increase our understanding of driver distraction and have potential implications for public safety and device development.

  3. Nursing Classification Systems

    PubMed Central

    Henry, Suzanne Bakken; Mead, Charles N.

    1997-01-01

    Abstract Our premise is that from the perspective of maximum flexibility of data usage by computer-based record (CPR) systems, existing nursing classification systems are necessary, but not sufficient, for representing important aspects of “what nurses do.” In particular, we have focused our attention on those classification systems that represent nurses' clinical activities through the abstraction of activities into categories of nursing interventions. In this theoretical paper, we argue that taxonomic, combinatorial vocabularies capable of coding atomic-level nursing activities are required to effectively capture in a reproducible and reversible manner the clinical decisions and actions of nurses, and that, without such vocabularies and associated grammars, potentially important clinical process data is lost during the encoding process. Existing nursing intervention classification systems do not fulfill these criteria. As background to our argument, we first present an overview of the content, methods, and evaluation criteria used in previous studies whose focus has been to evaluate the effectiveness of existing coding and classification systems. Next, using the Ingenerf typology of taxonomic vocabularies, we categorize the formal type and structure of three existing nursing intervention classification systems—Nursing Interventions Classification, Omaha System, and Home Health Care Classification. Third, we use records from home care patients to show examples of lossy data transformation, the loss of potentially significant atomic data, resulting from encoding using each of the three systems. Last, we provide an example of the application of a formal representation methodology (conceptual graphs) which we believe could be used as a model to build the required combinatorial, taxonomic vocabulary for representing nursing interventions. PMID:9147341

  4. Concepts of Classification and Taxonomy Phylogenetic Classification

    NASA Astrophysics Data System (ADS)

    Fraix-Burnet, D.

    2016-05-01

    Phylogenetic approaches to classification have been heavily developed in biology by bioinformaticians. But these techniques have applications in other fields, in particular in linguistics. Their main characteristics is to search for relationships between the objects or species in study, instead of grouping them by similarity. They are thus rather well suited for any kind of evolutionary objects. For nearly fifteen years, astrocladistics has explored the use of Maximum Parsimony (or cladistics) for astronomical objects like galaxies or globular clusters. In this lesson we will learn how it works.

  5. Ageism in undergraduate psychology texts.

    PubMed

    Whitbourne, S K; Hulicka, I M

    1990-10-01

    A sample of 139 texts written over the past 40 years was analyzed for evidence of ageism (i.e., lack of attention to the psychology of later life and stereotyping of older adults). More recent texts cover the topic more comprehensively than in the past, but this coverage is limited in depth. Although textbook authors appear to be trying to communicate a positive message about aging and older persons, their efforts are compromised by ambivalence in the form of contradictory statements about the nature of the aging process. There is an unfortunate condensation of sources in recent texts, which draw heavily from a small cluster of authorities. Implications of these findings for the larger textbook enterprise are discussed.

  6. Finding text in color images

    NASA Astrophysics Data System (ADS)

    Zhou, Jiangying; Lopresti, Daniel P.; Tasdizen, Tolga

    1998-04-01

    In this paper, we consider the problem of locating and extracting text from WWW images. A previous algorithm based on color clustering and connected components analysis works well as long as the color of each character is relatively uniform and the typography is fairly simple. It breaks down quickly, however, when these assumptions are violated. In this paper, we describe more robust techniques for dealing with this challenging problem. We present an improved color clustering algorithm that measures similarity based on both RGB and spatial proximity. Layout analysis is also incorporated to handle more complex typography. THese changes significantly enhance the performance of our text detection procedure.

  7. Health Instruction Packages: Drug Dosage, Classification, and Mixing.

    ERIC Educational Resources Information Center

    Bracchi, Dorothy P.; And Others

    Text, illustrations, and exercises are utilized in a set of seven learning modules to instruct nursing students in the fundamentals of drug classification, dosage, and mixing. The first module, by Dorothy Bracchi, teaches the student to identify six classifications of medication often administered to orthopedic patients: anti-neurospasmolytic…

  8. Commission 45: Spectral Classification

    NASA Astrophysics Data System (ADS)

    Giridhar, Sunetra; Gray, Richard O.; Corbally, Christopher J.; Bailer-Jones, Coryn A. L.; Eyer, Laurent; Irwin, Michael J.; Kirkpatrick, J. Davy; Majewski, Steven; Minniti, Dante; Nordström, Birgitta

    This report gives an update of developments (since the last General Assembly at Prague) in the areas that are of relevance to the commission. In addition to numerous papers, a new monograph entitled Stellar Spectral Classification with Richard Gray and Chris Corbally as leading authors will be published by Princeton University Press as part of their Princeton Series in Astrophysics in April 2009. This book is an up-to-date and encyclopedic review of stellar spectral classification across the H-R diagram, including the traditional MK system in the blue-violet, recent extensions into the ultraviolet and infrared, the newly defined L-type and T-type spectral classes, as well as spectral classification of carbon stars, S-type stars, white dwarfs, novae, supernovae and Wolf-Rayet stars.

  9. Interpreting Texts in Classroom Contexts.

    ERIC Educational Resources Information Center

    Unrau, Norman J.; Ruddell, Robert B.

    1995-01-01

    Describes a series of instructional episodes in an 11th-grade classroom discussing J.D. Salinger's short story "The Laughing Man." Presents and discusses the "Text and Context" model for the negotiation of interpretations in classroom contexts. Offers suggestions for developing interpretive classroom communities. (SR)

  10. Text Processing Differences among Readers.

    ERIC Educational Resources Information Center

    Garner, Ruth

    Explanations for differences in reading proficiency should be constructed around an atlas of reading-related individual differences in cognition. Such an atlas should include well-documented "bottom-up", text-driven reading strategies and less thoroughly investigated "top-down", schema-driven reading strategies. Research…

  11. Controversial Texts and Public Education.

    ERIC Educational Resources Information Center

    Smith, David L.

    Because public schools are designed to serve the widest range of interests and are committed to the ideal of democracy, teachers cannot afford to avoid teaching works or presenting ideas that offend some members of communities. Students need to learn the value of controversy and of the challenges posed by a text. Richard Wright's "Native Son" and…

  12. FTP: Full-Text Publishing?

    ERIC Educational Resources Information Center

    Jul, Erik

    1992-01-01

    Describes the use of file transfer protocol (FTP) on the INTERNET computer network and considers its use as an electronic publishing system. The differing electronic formats of text files are discussed; the preparation and access of documents are described; and problems are addressed, including a lack of consistency. (LRW)

  13. Transformation and Text: Journal Pedagogy.

    ERIC Educational Resources Information Center

    Ellis, Carol

    One intention that an instructor had for her new course called "Writing and Healing: Women's Journal Writing" was to make apparent the power of self-written text to transform the writer. She asked her students--women studying women writing their lives and women writing their own lives--to write three pages a day and to focus on change.…

  14. Policy Discourses in School Texts

    ERIC Educational Resources Information Center

    Maguire, Meg; Hoskins, Kate; Ball, Stephen; Braun, Annette

    2011-01-01

    In this paper, we focus on some of the ways in which schools are both productive of and constituted by sets of "discursive practices, events and texts" that contribute to the process of policy enactment. As Colebatch (2002: 2) says, "policy involves the creation of order--that is, shared understandings about how the various participants will act…

  15. Solar Concepts: A Background Text.

    ERIC Educational Resources Information Center

    Gorham, Jonathan W.

    This text is designed to provide teachers, students, and the general public with an overview of key solar energy concepts. Various energy terms are defined and explained. Basic thermodynamic laws are discussed. Alternative energy production is described in the context of the present energy situation. Described are the principal contemporary solar…

  16. Casemix classification systems.

    PubMed

    Fetter, R B

    1999-01-01

    The idea of using casemix classification to manage hospital services is not new, but has been limited by available technology. It was not until after the introduction of Medicare in the United States in 1965 that serious attempts were made to measure hospital production in order to contain spiralling costs. This resulted in a system of casemix classification known as diagnosis related groups (DRGs). This paper traces the development of DRGs and their evolution from the initial version to the All Patient Refined DRGs developed in 1991.

  17. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    ERIC Educational Resources Information Center

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  18. CNN for breaking text-based CAPTCHA with noise

    NASA Astrophysics Data System (ADS)

    Liu, Kaixuan; Zhang, Rong; Qing, Ke

    2017-07-01

    A CAPTCHA ("Completely Automated Public Turing test to tell Computers and Human Apart") system is a program that most humans can pass but current computer programs could hardly pass. As the most common type of CAPTCHAs , text-based CAPTCHA has been widely used in different websites to defense network bots. In order to breaking textbased CAPTCHA, in this paper, two trained CNN models are connected for the segmentation and classification of CAPTCHA images. Then base on these two models, we apply sliding window segmentation and voting classification methods realize an end-to-end CAPTCHA breaking system with high success rate. The experiment results show that our method is robust and effective in breaking text-based CAPTCHA with noise.

  19. Linguistically informed digital fingerprints for text

    NASA Astrophysics Data System (ADS)

    Uzuner, Özlem

    2006-02-01

    Digital fingerprinting, watermarking, and tracking technologies have gained importance in the recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, use of natural language processing for these purposes has so far been limited. Measuring similarity of literary works for automatic copyright infringement detection requires identifying and comparing creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent in the works even when works are paraphrased. These features can be directly extracted from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.

  20. [On two antique medical texts].

    PubMed

    Rosa, Maria Carlota

    2005-01-01

    The two texts presented here--Regimento proueytoso contra ha pestenença [literally, "useful regime against pestilence"] and Modus curandi cum balsamo ["curing method using balm"]--represent the extent of Portugal's known medical library until circa 1530, produced in gothic letters by foreign printers: Germany's Valentim Fernandes, perhaps the era's most important printer, who worked in Lisbon between 1495 and 1518, and Germdo Galharde, a Frenchman who practiced his trade in Lisbon and Coimbra between 1519 and 1560. Modus curandi, which came to light in 1974 thanks to bibliophile José de Pina Martins, is anonymous. Johannes Jacobi is believed to be the author of Regimento proueytoso, which was translated into Latin (Regimen contra pestilentiam), French, and English. Both texts are presented here in facsimile and in modern Portuguese, while the first has also been reproduced in archaic Portuguese using modern typographical characters. This philological venture into sixteenth-century medicine is supplemented by a scholarly glossary which serves as a valuable tool in interpreting not only Regimento proueytoso but also other texts from the era. Two articles place these documents in historical perspective.

  1. Shark Teeth Classification

    ERIC Educational Resources Information Center

    Brown, Tom; Creel, Sally; Lee, Velda

    2009-01-01

    On a recent autumn afternoon at Harmony Leland Elementary in Mableton, Georgia, students in a fifth-grade science class investigated the essential process of classification--the act of putting things into groups according to some common characteristics or attributes. While they may have honed these skills earlier in the week by grouping their own…

  2. The classification of phocomelia.

    PubMed

    Tytherleigh-Strong, G; Hooper, G

    2003-06-01

    We studied 24 patients with 44 phocomelic upper limbs. Only 11 limbs could be grouped in the classification system of Frantz and O' Rahilly. The non-classifiable limbs were further studied and their characteristics identified. It is confirmed that phocomelia is not an intercalary defect.

  3. Homographs: Classification and Identification.

    ERIC Educational Resources Information Center

    Pacak, M.; Henisz, Bozena

    1968-01-01

    Homographs are defined in this study as sets of word forms which are spelled alike but which have entirely or partially different meanings and which may have different syntactic functions (that is, they belong to more than one form class or to more than one subclass of a form class). This report deals with the classification and identification of…

  4. Classification of cryocoolers

    NASA Technical Reports Server (NTRS)

    Walker, G.

    1985-01-01

    A great diversity of methods and mechanisms were devised to effect cryogenic refrigeration. The basic parameters and considerations affecting the selection of a particular system are reviewed. A classification scheme for mechanical cryocoolers is presented. An important distinguishing feature is the incorporation or not of a regenerative heat exchanger, of valves, and of the method for achieving a pressure variation.

  5. Equivalent Diagnostic Classification Models

    ERIC Educational Resources Information Center

    Maris, Gunter; Bechger, Timo

    2009-01-01

    Rupp and Templin (2008) do a good job at describing the ever expanding landscape of Diagnostic Classification Models (DCM). In many ways, their review article clearly points to some of the questions that need to be answered before DCMs can become part of the psychometric practitioners toolkit. Apart from the issues mentioned in this article that…

  6. An automated cirrus classification

    NASA Astrophysics Data System (ADS)

    Gryspeerdt, Edward; Quaas, Johannes; Sourdeval, Odran; Goren, Tom

    2017-04-01

    Cirrus clouds play an important role in determining the radiation budget of the earth, but our understanding of the lifecycle and controls on cirrus clouds remains incomplete. Cirrus clouds can have very different properties and development depending on their environment, particularly during their formation. However, the relevant factors often cannot be distinguished using commonly retrieved satellite data products (such as cloud optical depth). In particular, the initial cloud phase has been identified as an important factor in cloud development, but although back-trajectory based methods can provide information on the initial cloud phase, they are computationally expensive and depend on the cloud parametrisations used in re-analysis products. In this work, a classification system (Identification and Classification of Cirrus, IC-CIR) is introduced. Using re-analysis and satellite data, cirrus clouds are separated in four main types: frontal, convective, orographic and in-situ. The properties of these classes show that this classification is able to provide useful information on the properties and initial phase of cirrus clouds, information that could not be provided by instantaneous satellite retrieved cloud properties alone. This classification is designed to be easily implemented in global climate models, helping to improve future comparisons between observations and models and reducing the uncertainty in cirrus clouds properties, leading to improved cloud parametrisations.

  7. Ecosystem classification, Chapter 2

    Treesearch

    M.J. Robin-Abbott; L.H. Pardo

    2011-01-01

    The ecosystem classification in this report is based on the ecoregions developed through the Commission for Environmental Cooperation (CEC) for North America (CEC 1997). Only ecosystems that occur in the United States are included. CEC ecoregions are described, with slight modifications, below (CEC 1997) and shown in Figures 2.1 and 2.2. We chose this ecosystem...

  8. Principles for ecological classification

    Treesearch

    Dennis H. Grossman; Patrick Bourgeron; Wolf-Dieter N. Busch; David T. Cleland; William Platts; G. Ray; C. Robins; Gary Roloff

    1999-01-01

    The principal purpose of any classification is to relate common properties among different entities to facilitate understanding of evolutionary and adaptive processes. In the context of this volume, it is to facilitate ecosystem stewardship, i.e., to help support ecosystem conservation and management objectives.

  9. Improving Student Question Classification

    ERIC Educational Resources Information Center

    Heiner, Cecily; Zachary, Joseph L.

    2009-01-01

    Students in introductory programming classes often articulate their questions and information needs incompletely. Consequently, the automatic classification of student questions to provide automated tutorial responses is a challenging problem. This paper analyzes 411 questions from an introductory Java programming course by reducing the natural…

  10. Soil Classification and Treatment.

    ERIC Educational Resources Information Center

    Clemson Univ., SC. Vocational Education Media Center.

    This instructional unit was designed to enable students, primarily at the secondary level, to (1) classify soils according to current capability classifications of the Soil Conservation Service, (2) select treatments needed for a given soil class according to current recommendations provided by the Soil Conservation Service, and (3) interpret a…

  11. Text mining by Tsallis entropy

    NASA Astrophysics Data System (ADS)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. Complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word ranking metric in order to extract keywords of a single document. We carry out an experimental evaluation, which shows capability of the presented method in keyword extraction. We find that, Tsallis entropy has reliable word ranking performance, at the same level of the best previous ranking methods.

  12. Identifying Issue Frames in Text

    PubMed Central

    Sagi, Eyal; Diermeier, Daniel; Kaufmann, Stefan

    2013-01-01

    Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S. PMID:23874909

  13. Towards Automatic Classification of Wikipedia Content

    NASA Astrophysics Data System (ADS)

    Szymański, Julian

    Wikipedia - the Free Encyclopedia encounters the problem of proper classification of new articles everyday. The process of assignment of articles to categories is performed manually and it is a time consuming task. It requires knowledge about Wikipedia structure, which is beyond typical editor competence, which leads to human-caused mistakes - omitting or wrong assignments of articles to categories. The article presents application of SVM classifier for automatic classification of documents from The Free Encyclopedia. The classifier application has been tested while using two text representations: inter-documents connections (hyperlinks) and word content. The results of the performed experiments evaluated on hand crafted data show that the Wikipedia classification process can be partially automated. The proposed approach can be used for building a decision support system which suggests editors the best categories that fit new content entered to Wikipedia.

  14. Efficient Fingercode Classification

    NASA Astrophysics Data System (ADS)

    Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

    In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.

  15. Multiple Sparse Representations Classification

    PubMed Central

    Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provides an enhanced statistic which is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and

  16. 78 FR 68983 - Cotton Futures Classification: Optional Classification Procedure

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-18

    ...-AD33 Cotton Futures Classification: Optional Classification Procedure AGENCY: Agricultural Marketing... regulations to allow for the addition of an optional cotton futures classification procedure--identified and known as ``registration'' by the U.S. cotton industry and the Intercontinental Exchange (ICE). In...

  17. LDA boost classification: boosting by topics

    NASA Astrophysics Data System (ADS)

    Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li

    2012-12-01

    AdaBoost is an efficacious classification algorithm especially in text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, traditional Vector Space Model can easily lead to the curse of dimensionality and feature sparsity problems; so it affects classification performance seriously. This article proposed a novel classification algorithm called LDABoost based on boosting ideology which uses Latent Dirichlet Allocation (LDA) to modeling the feature space. Instead of using words or phrase, LDABoost use latent topics as the features. In this way, the feature dimension is significantly reduced. Improved Naïve Bayes (NB) is designed as the weaker classifier which keeps the efficiency advantage of classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighted method called Cute Integration in this article is proposed for improving the accuracy by integrating weak classifiers into strong classifier in a more rational way. Mutual Information is used as metrics of weights allocation. The voting information and the categorization decision made by basis classifiers are fully utilized for generating the strong classifier. Experimental results reveals LDABoost making categorization in a low-dimensional space, it has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime consumption is lower than different versions of AdaBoost, TC algorithms based on support vector machine and Neural Networks.

  18. 32 CFR 2700.22 - Classification guides.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall... direct derivative classification, shall identify the information to be protected in specific and uniform...

  19. Experiments on Supervised Learning Algorithms for Text Categorization

    NASA Technical Reports Server (NTRS)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volume of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: Reuter's- 21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.

  20. Writing: the Quarterly as text.

    PubMed

    Locke, Lawrence F

    2005-06-01

    The purpose of this essay is to examine how writing has shaped the nature of the Quarterly over 75 years. Here I explore how stylistic elements have changed over time, how form has interacted with function and content, and how well the resulting text has served the several communities within physical education. I make the following assertions. First, the writing style that has become the model for research reports is needlessly dense and daunting for readers. Second, the desire to maintain a journal that serves both as an interdisciplinary resource for a broad audience of physical educators and as an outlet for reports directed to limited audiences of technical specialists has prevented full performance of either function. Those concerns notwithstanding, I find good cause for celebration--as well as for guarded optimism about the future.

  1. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set

  2. Building a common pipeline for rule-based document classification.

    PubMed

    Patterson, Olga V; Ginter, Thomas; DuVall, Scott L

    2013-01-01

    Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.

  3. Machine printed text and handwriting identification in noisy document images.

    PubMed

    Zheng, Yefeng; Li, Huiping; Doermann, David

    2004-03-01

    In this paper, we address the problem of the identification of text in noisy document images. We are especially focused on segmenting and identifying between handwriting and machine printed text because: 1) Handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content and 2) the segmentation and recognition techniques requested for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise and we further exploit context to refine the classification. A Markov Random Field-based (MRF) approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

  4. Audio Steganography with Embedded Text

    NASA Astrophysics Data System (ADS)

    Teck Jian, Chua; Chai Wen, Chuah; Rahman, Nurul Hidayah Binti Ab.; Hamid, Isredza Rahmi Binti A.

    2017-08-01

    Audio steganography is about hiding the secret message into the audio. It is a technique uses to secure the transmission of secret information or hide their existence. It also may provide confidentiality to secret message if the message is encrypted. To date most of the steganography software such as Mp3Stego and DeepSound use block cipher such as Advanced Encryption Standard or Data Encryption Standard to encrypt the secret message. It is a good practice for security. However, the encrypted message may become too long to embed in audio and cause distortion of cover audio if the secret message is too long. Hence, there is a need to encrypt the message with stream cipher before embedding the message into the audio. This is because stream cipher provides bit by bit encryption meanwhile block cipher provide a fixed length of bits encryption which result a longer output compare to stream cipher. Hence, an audio steganography with embedding text with Rivest Cipher 4 encryption cipher is design, develop and test in this project.

  5. Webcam classification using simple features

    NASA Astrophysics Data System (ADS)

    Pramoun, Thitiporn; Choe, Jeehyun; Li, He; Chen, Qingshuang; Amornraksa, Thumrongrat; Lu, Yung-Hsiang; Delp, Edward J.

    2015-03-01

    Thousands of sensors are connected to the Internet and many of these sensors are cameras. The "Internet of Things" will contain many "things" that are image sensors. This vast network of distributed cameras (i.e. web cams) will continue to exponentially grow. In this paper we examine simple methods to classify an image from a web cam as "indoor/outdoor" and having "people/no people" based on simple features. We use four types of image features to classify an image as indoor/outdoor: color, edge, line, and text. To classify an image as having people/no people we use HOG and texture features. The features are weighted based on their significance and combined. A support vector machine is used for classification. Our system with feature weighting and feature combination yields 95.5% accuracy.

  6. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  7. Stretchy binary classification.

    PubMed

    Toh, Kar-Ann; Lin, Zhiping; Sun, Lei; Li, Zhengguo

    2018-01-01

    In this article, we introduce an analytic formulation for compressive binary classification. The formulation seeks to solve the least ℓ p -norm of the parameter vector subject to a classification error constraint. An analytic and stretchable estimation is conjectured where the estimation can be viewed as an extension of the pseudoinverse with left and right constructions. Our variance analysis indicates that the estimation based on the left pseudoinverse is unbiased and the estimation based on the right pseudoinverse is biased. Sparseness can be obtained for the biased estimation under certain mild conditions. The proposed estimation is investigated numerically using both synthetic and real-world data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Automatic document classification of biological literature

    PubMed Central

    Chen, David; Müller, Hans-Michael; Sternberg, Paul W

    2006-01-01

    Background Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusion We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. PMID:16893465

  9. Seismic event classification system

    DOEpatents

    Dowla, F.U.; Jarpe, S.P.; Maurer, W.

    1994-12-13

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities. 21 figures.

  10. An automated cirrus classification

    NASA Astrophysics Data System (ADS)

    Gryspeerdt, Edward; Quaas, Johannes; Goren, Tom; Klocke, Daniel; Brueck, Matthias

    2018-05-01

    Cirrus clouds play an important role in determining the radiation budget of the earth, but many of their properties remain uncertain, particularly their response to aerosol variations and to warming. Part of the reason for this uncertainty is the dependence of cirrus cloud properties on the cloud formation mechanism, which itself is strongly dependent on the local meteorological conditions. In this work, a classification system (Identification and Classification of Cirrus or IC-CIR) is introduced to identify cirrus clouds by the cloud formation mechanism. Using reanalysis and satellite data, cirrus clouds are separated into four main types: orographic, frontal, convective and synoptic. Through a comparison to convection-permitting model simulations and back-trajectory-based analysis, it is shown that these observation-based regimes can provide extra information on the cloud-scale updraughts and the frequency of occurrence of liquid-origin ice, with the convective regime having higher updraughts and a greater occurrence of liquid-origin ice compared to the synoptic regimes. Despite having different cloud formation mechanisms, the radiative properties of the regimes are not distinct, indicating that retrieved cloud properties alone are insufficient to completely describe them. This classification is designed to be easily implemented in GCMs, helping improve future model-observation comparisons and leading to improved parametrisations of cirrus cloud processes.

  11. Classification of mental disorders*

    PubMed Central

    Stengel, E.

    1959-01-01

    One of the fundamental difficulties in devising a classification of mental disorders is the lack of agreement among psychiatrists regarding the concepts upon which it should be based: diagnoses can rarely be verified objectively and the same or similar conditions are described under a confusing variety of names. This situation militates against the ready exchange of ideas and experiences and hampers progress. As a first step towards remedying this state of affairs, the author of the article below has undertaken a critical survey of existing classifications. He shows how some of the difficulties created by lack of knowledge regarding pathology and etiology may be overcome by the use of “operational definitions” and outlines the basic principles on which he believes a generally acceptable international classification might be constructed. If this can be done it should lead to a greater measure of agreement regarding the value of specific treatments for mental disorders and greatly facilitate a broad epidemiological approach to psychiatric research. PMID:13834299

  12. Neuromuscular disease classification system

    NASA Astrophysics Data System (ADS)

    Sáez, Aurora; Acha, Begoña; Montero-Sánchez, Adoración; Rivas, Eloy; Escudero, Luis M.; Serrano, Carmen

    2013-06-01

    Diagnosis of neuromuscular diseases is based on subjective visual assessment of biopsies from patients by the pathologist specialist. A system for objective analysis and classification of muscular dystrophies and neurogenic atrophies through muscle biopsy images of fluorescence microscopy is presented. The procedure starts with an accurate segmentation of the muscle fibers using mathematical morphology and a watershed transform. A feature extraction step is carried out in two parts: 24 features that pathologists take into account to diagnose the diseases and 58 structural features that the human eye cannot see, based on the assumption that the biopsy is considered as a graph, where the nodes are represented by each fiber, and two nodes are connected if two fibers are adjacent. A feature selection using sequential forward selection and sequential backward selection methods, a classification using a Fuzzy ARTMAP neural network, and a study of grading the severity are performed on these two sets of features. A database consisting of 91 images was used: 71 images for the training step and 20 as the test. A classification error of 0% was obtained. It is concluded that the addition of features undetectable by the human visual inspection improves the categorization of atrophic patterns.

  13. Seismic event classification system

    DOEpatents

    Dowla, Farid U.; Jarpe, Stephen P.; Maurer, William

    1994-01-01

    In the computer interpretation of seismic data, the critical first step is to identify the general class of an unknown event. For example, the classification might be: teleseismic, regional, local, vehicular, or noise. Self-organizing neural networks (SONNs) can be used for classifying such events. Both Kohonen and Adaptive Resonance Theory (ART) SONNs are useful for this purpose. Given the detection of a seismic event and the corresponding signal, computation is made of: the time-frequency distribution, its binary representation, and finally a shift-invariant representation, which is the magnitude of the two-dimensional Fourier transform (2-D FFT) of the binary time-frequency distribution. This pre-processed input is fed into the SONNs. These neural networks are able to group events that look similar. The ART SONN has an advantage in classifying the event because the types of cluster groups do not need to be pre-defined. The results from the SONNs together with an expert seismologist's classification are then used to derive event classification probabilities.

  14. Boosting bonsai trees for handwritten/printed text discrimination

    NASA Astrophysics Data System (ADS)

    Ricquebourg, Yann; Raymond, Christian; Poirriez, Baptiste; Lemaitre, Aurélie; Coüasnon, Bertrand

    2013-12-01

    Boosting over decision-stumps proved its efficiency in Natural Language Processing essentially with symbolic features, and its good properties (fast, few and not critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification processing, for the discrimination of handwritten/printed text. Then, we conducted experiments to compare it to usual SVM-based classification revealing convincing results with very close performance, but with faster predictions and behaving far less as a black-box. Those promising results tend to make use of this classifier in more complex recognition tasks like multiclass problems.

  15. Radiographic classifications in Perthes disease

    PubMed Central

    Huhnstock, Stefan; Svenningsen, Svein; Merckoll, Else; Catterall, Anthony; Terjesen, Terje; Wiig, Ola

    2017-01-01

    Background and purpose Different radiographic classifications have been proposed for prediction of outcome in Perthes disease. We assessed whether the modified lateral pillar classification would provide more reliable interobserver agreement and prognostic value compared with the original lateral pillar classification and the Catterall classification. Patients and methods 42 patients (38 boys) with Perthes disease were included in the interobserver study. Their mean age at diagnosis was 6.5 (3–11) years. 5 observers classified the radiographs in 2 separate sessions according to the Catterall classification, the original and the modified lateral pillar classifications. Interobserver agreement was analysed using weighted kappa statistics. We assessed the associations between the classifications and femoral head sphericity at 5-year follow-up in 37 non-operatively treated patients in a crosstable analysis (Gamma statistics for ordinal variables, γ). Results The original lateral pillar and Catterall classifications showed moderate interobserver agreement (kappa 0.49 and 0.43, respectively) while the modified lateral pillar classification had fair agreement (kappa 0.40). The original lateral pillar classification was strongly associated with the 5-year radiographic outcome, with a mean γ correlation coefficient of 0.75 (95% CI: 0.61–0.95) among the 5 observers. The modified lateral pillar and Catterall classifications showed moderate associations (mean γ correlation coefficient 0.55 [95% CI: 0.38–0.66] and 0.64 [95% CI: 0.57–0.72], respectively). Interpretation The Catterall classification and the original lateral pillar classification had sufficient interobserver agreement and association to late radiographic outcome to be suitable for clinical use. Adding the borderline B/C group did not increase the interobserver agreement or prognostic value of the original lateral pillar classification. PMID:28613966

  16. Sur la classification des adverbes en -ment (On the Classification of -ment Adverbs)

    ERIC Educational Resources Information Center

    Mordrup, Ole

    1976-01-01

    Presents a classification of French "-ment" adverbs based on syntactical criteria. The major divisions, consisting of "sentence adverbs" and "adverbs of manner," are further sub-divided into functional sub-groups. (Text is in French.) Available from: Akademisk Forlag, St. Kannikestraede 6-8, DK-1169 Copenhague K Danemark. (AM)

  17. 14 CFR 1203.412 - Classification guides.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 14 Aeronautics and Space 5 2013-01-01 2013-01-01 false Classification guides. 1203.412 Section... PROGRAM Guides for Original Classification § 1203.412 Classification guides. (a) General. A classification guide, based upon classification determinations made by appropriate program and classification...

  18. 14 CFR 1203.412 - Classification guides.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 14 Aeronautics and Space 5 2012-01-01 2012-01-01 false Classification guides. 1203.412 Section... PROGRAM Guides for Original Classification § 1203.412 Classification guides. (a) General. A classification guide, based upon classification determinations made by appropriate program and classification...

  19. 14 CFR 1203.412 - Classification guides.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 14 Aeronautics and Space 5 2011-01-01 2010-01-01 true Classification guides. 1203.412 Section 1203... Guides for Original Classification § 1203.412 Classification guides. (a) General. A classification guide, based upon classification determinations made by appropriate program and classification authorities...

  20. Carbohydrate terminology and classification.

    PubMed

    Cummings, J H; Stephen, A M

    2007-12-01

    Dietary carbohydrates are a group of chemically defined substances with a range of physical and physiological properties and health benefits. As with other macronutrients, the primary classification of dietary carbohydrate is based on chemistry, that is character of individual monomers, degree of polymerization (DP) and type of linkage (alpha or beta), as agreed at the Food and Agriculture Organization/World Health Organization Expert Consultation in 1997. This divides carbohydrates into three main groups, sugars (DP 1-2), oligosaccharides (short-chain carbohydrates) (DP 3-9) and polysaccharides (DP> or =10). Within this classification, a number of terms are used such as mono- and disaccharides, polyols, oligosaccharides, starch, modified starch, non-starch polysaccharides, total carbohydrate, sugars, etc. While effects of carbohydrates are ultimately related to their primary chemistry, they are modified by their physical properties. These include water solubility, hydration, gel formation, crystalline state, association with other molecules such as protein, lipid and divalent cations and aggregation into complex structures in cell walls and other specialized plant tissues. A classification based on chemistry is essential for a system of measurement, predication of properties and estimation of intakes, but does not allow a simple translation into nutritional effects since each class of carbohydrate has overlapping physiological properties and effects on health. This dichotomy has led to the use of a number of terms to describe carbohydrate in foods, for example intrinsic and extrinsic sugars, prebiotic, resistant starch, dietary fibre, available and unavailable carbohydrate, complex carbohydrate, glycaemic and whole grain. This paper reviews these terms and suggests that some are more useful than others. A clearer understanding of what is meant by any particular word used to describe carbohydrate is essential to progress in translating the growing knowledge of the

  1. Classification of hand eczema.

    PubMed

    Agner, T; Aalto-Korte, K; Andersen, K E; Foti, C; Gimenéz-Arnau, A; Goncalo, M; Goossens, A; Le Coz, C; Diepgen, T L

    2015-12-01

    Classification of hand eczema (HE) is mandatory in epidemiological and clinical studies, and also important in clinical work. The aim was to test a recently proposed classification system of HE in clinical practice in a prospective multicentre study. Patients were recruited from nine different tertiary referral centres. All patients underwent examination by specialists in dermatology and were checked using relevant allergy testing. Patients were classified into one of the six diagnostic subgroups of HE: allergic contact dermatitis, irritant contact dermatitis, atopic HE, protein contact dermatitis/contact urticaria, hyperkeratotic endogenous eczema and vesicular endogenous eczema, respectively. An additional diagnosis was given if symptoms indicated that factors additional to the main diagnosis were of importance for the disease. Four hundred and twenty-seven patients were included, 379 (89%) of the patients could be classified directly into one of the six diagnostic subgroups, with irritant and allergic contact dermatitis comprising 249 patients (58%). For 32 (7%) more than one of the six diagnostic subgroups had been formulated as a main diagnosis, and 16 (4%) could not be classified. 38% had one additional diagnosis and 26% had two or more additional diagnoses. Eczema on feet was found in 30% of the patients, statistically significantly more frequently associated with hyperkeratotic and vesicular endogenous eczema. We find that the classification system investigated in the present study was useful, being able to give an appropriate main diagnosis for 89% of HE patients, and for another 7% when using two main diagnoses. The fact that more than half of the patients had one or more additional diagnoses illustrates that HE is a multifactorial disease. © 2015 European Academy of Dermatology and Venereology.

  2. Interactive Classification Technology

    NASA Technical Reports Server (NTRS)

    deBessonet, Cary

    2000-01-01

    The investigators upgraded a knowledge representation language called SL (Symbolic Language) and an automated reasoning system called SMS (Symbolic Manipulation System) to enable the more effective use of the technologies in automated reasoning and interactive classification systems. The overall goals of the project were: 1) the enhancement of the representation language SL to accommodate a wider range of meaning; 2) the development of a default inference scheme to operate over SL notation as it is encoded; and 3) the development of an interpreter for SL that would handle representations of some basic cognitive acts and perspectives.

  3. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.

  4. Classification of Physical Activity

    PubMed Central

    Turksoy, Kamuran; Paulino, Thiago Marques Luz; Zaharieva, Dessi P.; Yavelberg, Loren; Jamnik, Veronica; Riddell, Michael C.; Cinar, Ali

    2015-01-01

    Physical activity has a wide range of effects on glucose concentrations in type 1 diabetes (T1D) depending on the type (ie, aerobic, anaerobic, mixed) and duration of activity performed. This variability in glucose responses to physical activity makes the development of artificial pancreas (AP) systems challenging. Automatic detection of exercise type and intensity, and its classification as aerobic or anaerobic would provide valuable information to AP control algorithms. This can be achieved by using a multivariable AP approach where biometric variables are measured and reported to the AP at high frequency. We developed a classification system that identifies, in real time, the exercise intensity and its reliance on aerobic or anaerobic metabolism and tested this approach using clinical data collected from 5 persons with T1D and 3 individuals without T1D in a controlled laboratory setting using a variety of common types of physical activity. The classifier had an average sensitivity of 98.7% for physiological data collected over a range of exercise modalities and intensities in these subjects. The classifier will be added as a new module to the integrated multivariable adaptive AP system to enable the detection of aerobic and anaerobic exercise for enhancing the accuracy of insulin infusion strategies during and after exercise. PMID:26443291

  5. Classification of Martian deltas

    NASA Technical Reports Server (NTRS)

    Dehon, R. A.

    1993-01-01

    Water-borne sediments in streams are deposited, upon eventual cessation of flow, either as deltas or as alluvial fans or plains. Deltas and alluvial fans share a common characteristic; both may be described as deposition Al plains at the mouth of a river or stream. A delta is formed where a stream or river deposits its sedimentary load into a standing body of water such as an ocean or lake. An alluvial fan is produced where a stream loses capacity by a greatly decreased gradient. A delta has subaerial and subaqueous components, but an alluvial fan is entirely subaerial. In terrestrial conditions, deltas and alluvial fans are reasonably distinct landforms. The juxtaposition of concomitant features composition and internal structure are sufficiently explicit as to avoid any confusion regarding their proper identification on Mars, the recognition of deltas and their distinction from alluvial fans is made difficult by low resolution imaging. Further, although it may be demonstrated that standing bodies of water existed on the surface of Mars, many of these bodies may have existed for extremely short periods of time (a few days to months); hence, distinctive shoreline features were not developed. Thus, in an attempt to derive a Martian classification of deltas, the inclusion of wholly subaerial deposits may be unavoidable. A simple, broad, morphological classification of Martian deltas, primarily on planimetric shape, includes digitate deltas, fan-shaped deltas, and re-entrant deltas. A fourth, somewhat problematical class includes featureless plains at the end of many valley systems.

  6. "Interactive Classification Technology"

    NASA Technical Reports Server (NTRS)

    deBessonet, Cary

    1999-01-01

    The investigators are upgrading a knowledge representation language called SL (Symbolic Language) and an automated reasoning system called SMS (Symbolic Manipulation System) to enable the technologies to be used in automated reasoning and interactive classification systems. The overall goals of the project are: a) the enhancement of the representation language SL to accommodate multiple perspectives and a wider range of meaning; b) the development of a sufficient set of operators to enable the interpreter of SL to handle representations of basic cognitive acts; and c) the development of a default inference scheme to operate over SL notation as it is encoded. As to particular goals the first-year work plan focused on inferencing and.representation issues, including: 1) the development of higher level cognitive/ classification functions and conceptual models for use in inferencing and decision making; 2) the specification of a more detailed scheme of defaults and the enrichment of SL notation to accommodate the scheme; and 3) the adoption of additional perspectives for inferencing.

  7. Supply chain planning classification

    NASA Astrophysics Data System (ADS)

    Hvolby, Hans-Henrik; Trienekens, Jacques; Bonde, Hans

    2001-10-01

    Industry experience a need to shift in focus from internal production planning towards planning in the supply network. In this respect customer oriented thinking becomes almost a common good amongst companies in the supply network. An increase in the use of information technology is needed to enable companies to better tune their production planning with customers and suppliers. Information technology opportunities and supply chain planning systems facilitate companies to monitor and control their supplier network. In spite if these developments, most links in today's supply chains make individual plans, because the real demand information is not available throughout the chain. The current systems and processes of the supply chains are not designed to meet the requirements now placed upon them. For long term relationships with suppliers and customers, an integrated decision-making process is needed in order to obtain a satisfactory result for all parties. Especially when customized production and short lead-time is in focus. An effective value chain makes inventory available and visible among the value chain members, minimizes response time and optimizes total inventory value held throughout the chain. In this paper a supply chain planning classification grid is presented based current manufacturing classifications and supply chain planning initiatives.

  8. Key-phrase based classification of public health web pages.

    PubMed

    Dolamic, Ljiljana; Boyer, Célia

    2013-01-01

    This paper describes and evaluates the public health web pages classification model based on key phrase extraction and matching. Easily extendible both in terms of new classes as well as the new language this method proves to be a good solution for text classification faced with the total lack of training data. To evaluate the proposed solution we have used a small collection of public health related web pages created by a double blind manual classification. Our experiments have shown that by choosing the adequate threshold value the desired value for either precision or recall can be achieved.

  9. Application of text mining for customer evaluations in commercial banking

    NASA Astrophysics Data System (ADS)

    Tan, Jing; Du, Xiaojiang; Hao, Pengpeng; Wang, Yanbo J.

    2015-07-01

    Nowadays customer attrition is increasingly serious in commercial banks. To combat this problem roundly, mining customer evaluation texts is as important as mining customer structured data. In order to extract hidden information from customer evaluations, Textual Feature Selection, Classification and Association Rule Mining are necessary techniques. This paper presents all three techniques by using Chinese Word Segmentation, C5.0 and Apriori, and a set of experiments were run based on a collection of real textual data that includes 823 customer evaluations taken from a Chinese commercial bank. Results, consequent solutions, some advice for the commercial bank are given in this paper.

  10. Automated compound classification using a chemical ontology.

    PubMed

    Bobach, Claudia; Böhme, Timo; Laube, Ulf; Püschel, Anett; Weber, Lutz

    2012-12-29

    Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate chemistry expert knowledge into a

  11. Automated compound classification using a chemical ontology

    PubMed Central

    2012-01-01

    Background Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error prone and time consuming manual classification of compounds. Results In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions A proposal for a rule based definition of chemical classes has been made that allows to define chemical compound classes more precisely than before. The proposed structure-based reasoning logic allows to translate

  12. 78 FR 54970 - Cotton Futures Classification: Optional Classification Procedure

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-09

    ... Service 7 CFR Part 27 [AMS-CN-13-0043] RIN 0581-AD33 Cotton Futures Classification: Optional... optional cotton futures classification procedure--identified and known as ``registration'' by the U.S. cotton industry and the Intercontinental Exchange (ICE). In response to requests from the U.S. cotton...

  13. Dissimilarity representations in lung parenchyma classification

    NASA Astrophysics Data System (ADS)

    Sørensen, Lauge; de Bruijne, Marleen

    2009-02-01

    A good problem representation is important for a pattern recognition system to be successful. The traditional approach to statistical pattern recognition is feature representation. More specifically, objects are represented by a number of features in a feature vector space, and classifiers are built in this representation. This is also the general trend in lung parenchyma classification in computed tomography (CT) images, where the features often are measures on feature histograms. Instead, we propose to build normal density based classifiers in dissimilarity representations for lung parenchyma classification. This allows for the classifiers to work on dissimilarities between objects, which might be a more natural way of representing lung parenchyma. In this context, dissimilarity is defined between CT regions of interest (ROI)s. ROIs are represented by their CT attenuation histogram and ROI dissimilarity is defined as a histogram dissimilarity measure between the attenuation histograms. In this setting, the full histograms are utilized according to the chosen histogram dissimilarity measure. We apply this idea to classification of different emphysema patterns as well as normal, healthy tissue. Two dissimilarity representation approaches as well as different histogram dissimilarity measures are considered. The approaches are evaluated on a set of 168 CT ROIs using normal density based classifiers all showing good performance. Compared to using histogram dissimilarity directly as distance in a emph{k} nearest neighbor classifier, which achieves a classification accuracy of 92.9%, the best dissimilarity representation based classifier is significantly better with a classification accuracy of 97.0% (text{emph{p" border="0" class="imgtopleft"> = 0.046).

  14. A new classification of glaucomas

    PubMed Central

    Bordeianu, Constantin-Dan

    2014-01-01

    Purpose To suggest a new glaucoma classification that is pathogenic, etiologic, and clinical. Methods After discussing the logical pathway used in criteria selection, the paper presents the new classification and compares it with the classification currently in use, that is, the one issued by the European Glaucoma Society in 2008. Results The paper proves that the new classification is clear (being based on a coherent and consistently followed set of criteria), is comprehensive (framing all forms of glaucoma), and helps in understanding the sickness understanding (in that it uses a logical framing system). The great advantage is that it facilitates therapeutic decision making in that it offers direct therapeutic suggestions and avoids errors leading to disasters. Moreover, the scheme remains open to any new development. Conclusion The suggested classification is a pathogenic, etiologic, and clinical classification that fulfills the conditions of an ideal classification. The suggested classification is the first classification in which the main criterion is consistently used for the first 5 to 7 crossings until its differentiation capabilities are exhausted. Then, secondary criteria (etiologic and clinical) pick up the relay until each form finds its logical place in the scheme. In order to avoid unclear aspects, the genetic criterion is no longer used, being replaced by age, one of the clinical criteria. The suggested classification brings only benefits to all categories of ophthalmologists: the beginners will have a tool to better understand the sickness and to ease their decision making, whereas the experienced doctors will have their practice simplified. For all doctors, errors leading to therapeutic disasters will be less likely to happen. Finally, researchers will have the object of their work gathered in the group of glaucoma with unknown or uncertain pathogenesis, whereas the results of their work will easily find a logical place in the scheme, as the

  15. Classification-based reasoning

    NASA Technical Reports Server (NTRS)

    Gomez, Fernando; Segami, Carlos

    1991-01-01

    A representation formalism for N-ary relations, quantification, and definition of concepts is described. Three types of conditions are associated with the concepts: (1) necessary and sufficient properties, (2) contingent properties, and (3) necessary properties. Also explained is how complex chains of inferences can be accomplished by representing existentially quantified sentences, and concepts denoted by restrictive relative clauses as classification hierarchies. The representation structures that make possible the inferences are explained first, followed by the reasoning algorithms that draw the inferences from the knowledge structures. All the ideas explained have been implemented and are part of the information retrieval component of a program called Snowy. An appendix contains a brief session with the program.

  16. Bayesian classification theory

    NASA Technical Reports Server (NTRS)

    Hanson, Robin; Stutz, John; Cheeseman, Peter

    1991-01-01

    The task of inferring a set of classes and class descriptions most likely to explain a given data set can be placed on a firm theoretical foundation using Bayesian statistics. Within this framework and using various mathematical and algorithmic approximations, the AutoClass system searches for the most probable classifications, automatically choosing the number of classes and complexity of class descriptions. A simpler version of AutoClass has been applied to many large real data sets, has discovered new independently-verified phenomena, and has been released as a robust software package. Recent extensions allow attributes to be selectively correlated within particular classes, and allow classes to inherit or share model parameters though a class hierarchy. We summarize the mathematical foundations of AutoClass.

  17. Automatic diabetic retinopathy classification

    NASA Astrophysics Data System (ADS)

    Bravo, María. A.; Arbeláez, Pablo A.

    2017-11-01

    Diabetic retinopathy (DR) is a disease in which the retina is damaged due to augmentation in the blood pressure of small vessels. DR is the major cause of blindness for diabetics. It has been shown that early diagnosis can play a major role in prevention of visual loss and blindness. This work proposes a computer based approach for the detection of DR in back-of-the-eye images based on the use of convolutional neural networks (CNNs). Our CNN uses deep architectures to classify Back-of-the-eye Retinal Photographs (BRP) in 5 stages of DR. Our method combines several preprocessing images of BRP to obtain an ACA score of 50.5%. Furthermore, we explore subproblems by training a larger CNN of our main classification task.

  18. [Biologics - nomenclature and classification].

    PubMed

    Eichbaum, Christine; Haefeli, Walter E

    2011-11-01

    Biological medicines are a heterogeneous group of drugs that are produced by living organisms using genetic or biological technology. Unlike chemically derived small molecules biologics are structurally complex making characterization and manufacturing difficult. Moreover, biological medicines show a great variety concerning their clinical use. To appropriately consider these particularities, there are other standards and guidelines for approval of similar derivatives of biologics, the so-called biosimilars or follow-on biologics. In contrast to a generic medicinal product containing a chemically identical active ingredient, a biosimilar is only expected to be similar to the innovator drug. Nowadays, monoclonal antibodies, fragments of antibodies, and fusion proteins manufactured by recombinant procedures play an important role. They have been used in many specialties for diagnostic and therapeutic purposes and are subject to continuous further development and improvement. Their nomenclature is based on a classification by the WHO which allows drawing conclusions for class of substance, origin, and pharmacological target.

  19. Classification of extraterrestrial civilizations

    NASA Astrophysics Data System (ADS)

    Tang, Tong B.; Chang, Grace

    1991-06-01

    A scheme of classification of extraterrestrial intelligence (ETI) communities based on the scope of energy accessible to the civilization in question is proposed as an alternative to the Kardeshev (1964) scheme that includes three types of civilization, as determined by their levels of energy expenditure. The proposed scheme includes six classes: (1) a civilization that runs essentially on energy exerted by individual beings or by domesticated lower life forms, (2) harnessing of natural sources on planetary surface with artificial constructions, like water wheels and wind sails, (3) energy from fossils and fissionable isotopes, mined beneath the planet surface, (4) exploitation of nuclear fusion on a large scale, whether on the planet, in space, or from primary solar energy, (5) extensive use of antimatter for energy storage, and (6) energy from spacetime, perhaps via the action of naked singularities.

  20. Measurement of prompt and nonprompt [Formula: see text] production in [Formula: see text] and [Formula: see text] collisions at [Formula: see text].

    PubMed

    Sirunyan, A M; Tumasyan, A; Adam, W; Asilar, E; Bergauer, T; Brandstetter, J; Brondolin, E; Dragicevic, M; Erö, J; Flechl, M; Friedl, M; Frühwirth, R; Ghete, V M; Hartl, C; Hörmann, N; Hrubec, J; Jeitler, M; König, A; Krätschmer, I; Liko, D; Matsushita, T; Mikulec, I; Rabady, D; Rad, N; Rahbaran, B; Rohringer, H; Schieck, J; Strauss, J; Waltenberger, W; Wulz, C-E; Dvornikov, O; Makarenko, V; Mossolov, V; Suarez Gonzalez, J; Zykunov, V; Shumeiko, N; Alderweireldt, S; De Wolf, E A; Janssen, X; Lauwers, J; Van De Klundert, M; Van Haevermaet, H; Van Mechelen, P; Van Remortel, N; Van Spilbeeck, A; Abu Zeid, S; Blekman, F; D'Hondt, J; Daci, N; De Bruyn, I; Deroover, K; Lowette, S; Moortgat, S; Moreels, L; Olbrechts, A; Python, Q; Skovpen, K; Tavernier, S; Van Doninck, W; Van Mulders, P; Van Parijs, I; Brun, H; Clerbaux, B; De Lentdecker, G; Delannoy, H; Fasanella, G; Favart, L; Goldouzian, R; Grebenyuk, A; Karapostoli, G; Lenzi, T; Léonard, A; Luetic, J; Maerschalk, T; Marinov, A; Randle-Conde, A; Seva, T; Vander Velde, C; Vanlaer, P; Vannerom, D; Yonamine, R; Zenoni, F; Zhang, F; Cimmino, A; Cornelis, T; Dobur, D; Fagot, A; Gul, M; Khvastunov, I; Poyraz, D; Salva, S; Schöfbeck, R; Tytgat, M; Van Driessche, W; Yazgan, E; Zaganidis, N; Bakhshiansohi, H; Beluffi, C; Bondu, O; Brochet, S; Bruno, G; Caudron, A; De Visscher, S; Delaere, C; Delcourt, M; Francois, B; Giammanco, A; Jafari, A; Komm, M; Krintiras, G; Lemaitre, V; Magitteri, A; Mertens, A; Musich, M; Piotrzkowski, K; Quertenmont, L; Selvaggi, M; Vidal Marono, M; Wertz, S; Beliy, N; Aldá Júnior, W L; Alves, F L; Alves, G A; Brito, L; Hensel, C; Moraes, A; Pol, M E; Rebello Teles, P; Belchior Batista Das Chagas, E; Carvalho, W; Chinellato, J; Custódio, A; Da Costa, E M; Da Silveira, G G; De Jesus Damiao, D; De Oliveira Martins, C; Fonseca De Souza, S; Huertas Guativa, L M; Malbouisson, H; Matos Figueiredo, D; Mora Herrera, C; Mundim, L; Nogima, H; Prado Da Silva, W L; Santoro, A; Sznajder, A; Tonelli Manganote, E J; Torres Da Silva De Araujo, F; Vilela Pereira, A; Ahuja, S; Bernardes, C A; Dogra, S; Fernandez Perez Tomei, T R; Gregores, E M; Mercadante, P G; Moon, C S; Novaes, S F; Padula, Sandra S; Romero Abad, D; Ruiz Vargas, J C; Aleksandrov, A; Hadjiiska, R; Iaydjiev, P; Rodozov, M; Stoykova, S; Sultanov, G; Vutova, M; Dimitrov, A; Glushkov, I; Litov, L; Pavlov, B; Petkov, P; Fang, W; Ahmad, M; Bian, J G; Chen, G M; Chen, H S; Chen, M; Chen, Y; Cheng, T; Jiang, C H; Leggat, D; Liu, Z; Romeo, F; Ruan, M; Shaheen, S M; Spiezia, A; Tao, J; Wang, C; Wang, Z; Zhang, H; Zhao, J; Ban, Y; Chen, G; Li, Q; Liu, S; Mao, Y; Qian, S J; Wang, D; Xu, Z; Avila, C; Cabrera, A; Chaparro Sierra, L F; Florez, C; Gomez, J P; González Hernández, C F; Ruiz Alvarez, J D; Sanabria, J C; Godinovic, N; Lelas, D; Puljak, I; Ribeiro Cipriano, P M; Sculac, T; Antunovic, Z; Kovac, M; Brigljevic, V; Ferencek, D; Kadija, K; Mesic, B; Susa, T; Attikis, A; Mavromanolakis, G; Mousa, J; Nicolaou, C; Ptochos, F; Razis, P A; Rykaczewski, H; Tsiakkouri, D; Finger, M; Finger, M; Carrera Jarrin, E; Assran, Y; Elkafrawy, T; Mahrous, A; Kadastik, M; Perrini, L; Raidal, M; Tiko, A; Veelken, C; Eerola, P; Pekkanen, J; Voutilainen, M; Härkönen, J; Järvinen, T; Karimäki, V; Kinnunen, R; Lampén, T; Lassila-Perini, K; Lehti, S; Lindén, T; Luukka, P; Tuominiemi, J; Tuovinen, E; Wendland, L; Talvitie, J; Tuuva, T; Besancon, M; Couderc, F; Dejardin, M; Denegri, D; Fabbro, B; Faure, J L; Favaro, C; Ferri, F; Ganjour, S; Ghosh, S; Givernaud, A; Gras, P; Hamel de Monchenault, G; Jarry, P; Kucher, I; Locci, E; Machet, M; Malcles, J; Rander, J; Rosowsky, A; Titov, M; Abdulsalam, A; Antropov, I; Arleo, F; Baffioni, S; Beaudette, F; Busson, P; Cadamuro, L; Chapon, E; Charlot, C; Davignon, O; Granier de Cassagnac, R; Jo, M; Lisniak, S; Miné, P; Nguyen, M; Ochando, C; Ortona, G; Paganini, P; Pigard, P; Regnard, S; Salerno, R; Sirois, Y; Strebler, T; Yilmaz, Y; Zabi, A; Zghiche, A; Agram, J-L; Andrea, J; Aubin, A; Bloch, D; Brom, J-M; Buttignol, M; Chabert, E C; Chanon, N; Collard, C; Conte, E; Coubez, X; Fontaine, J-C; Gelé, D; Goerlach, U; Le Bihan, A-C; Van Hove, P; Gadrat, S; Beauceron, S; Bernet, C; Boudoul, G; Carrillo Montoya, C A; Chierici, R; Contardo, D; Courbon, B; Depasse, P; El Mamouni, H; Fay, J; Gascon, S; Gouzevitch, M; Grenier, G; Ille, B; Lagarde, F; Laktineh, I B; Lethuillier, M; Mirabito, L; Pequegnot, A L; Perries, S; Popov, A; Sabes, D; Sordini, V; Vander Donckt, M; Verdier, P; Viret, S; Khvedelidze, A; Tsamalaidze, Z; Autermann, C; Beranek, S; Feld, L; Kiesel, M K; Klein, K; Lipinski, M; Preuten, M; Schomakers, C; Schulz, J; Verlage, T; Albert, A; Brodski, M; Dietz-Laursonn, E; Duchardt, D; Endres, M; Erdmann, M; Erdweg, S; Esch, T; Fischer, R; Güth, A; Hamer, M; Hebbeker, T; Heidemann, C; Hoepfner, K; Knutzen, S; Merschmeyer, M; Meyer, A; Millet, P; Mukherjee, S; Olschewski, M; Padeken, K; Pook, T; Radziej, M; Reithler, H; Rieger, M; Scheuch, F; Sonnenschein, L; Teyssier, D; Thüer, S; Cherepanov, V; Flügge, G; Kargoll, B; Kress, T; Künsken, A; Lingemann, J; Müller, T; Nehrkorn, A; Nowack, A; Pistone, C; Pooth, O; Stahl, A; Aldaya Martin, M; Arndt, T; Asawatangtrakuldee, C; Beernaert, K; Behnke, O; Behrens, U; Bin Anuar, A A; Borras, K; Campbell, A; Connor, P; Contreras-Campana, C; Costanza, F; Diez Pardos, C; Dolinska, G; Eckerlin, G; Eckstein, D; Eichhorn, T; Eren, E; Gallo, E; Garay Garcia, J; Geiser, A; Gizhko, A; Grados Luyando, J M; Grohsjean, A; Gunnellini, P; Harb, A; Hauk, J; Hempel, M; Jung, H; Kalogeropoulos, A; Karacheban, O; Kasemann, M; Keaveney, J; Kleinwort, C; Korol, I; Krücker, D; Lange, W; Lelek, A; Lenz, T; Leonard, J; Lipka, K; Lobanov, A; Lohmann, W; Mankel, R; Melzer-Pellmann, I-A; Meyer, A B; Mittag, G; Mnich, J; Mussgiller, A; Pitzl, D; Placakyte, R; Raspereza, A; Roland, B; Sahin, M Ö; Saxena, P; Schoerner-Sadenius, T; Spannagel, S; Stefaniuk, N; Van Onsem, G P; Walsh, R; Wissing, C; Blobel, V; Centis Vignali, M; Draeger, A R; Dreyer, T; Garutti, E; Gonzalez, D; Haller, J; Hoffmann, M; Junkes, A; Klanner, R; Kogler, R; Kovalchuk, N; Lapsien, T; Marchesini, I; Marconi, D; Meyer, M; Niedziela, M; Nowatschin, D; Pantaleo, F; Peiffer, T; Perieanu, A; Poehlsen, J; Scharf, C; Schleper, P; Schmidt, A; Schumann, S; Schwandt, J; Stadie, H; Steinbrück, G; Stober, F M; Stöver, M; Tholen, H; Troendle, D; Usai, E; Vanelderen, L; Vanhoefer, A; Vormwald, B; Akbiyik, M; Barth, C; Baur, S; Baus, C; Berger, J; Butz, E; Caspart, R; Chwalek, T; Colombo, F; De Boer, W; Dierlamm, A; Fink, S; Freund, B; Friese, R; Giffels, M; Gilbert, A; Goldenzweig, P; Haitz, D; Hartmann, F; Heindl, S M; Husemann, U; Katkov, I; Kudella, S; Mildner, H; Mozer, M U; Müller, Th; Plagge, M; Quast, G; Rabbertz, K; Röcker, S; Roscher, F; Schröder, M; Shvetsov, I; Sieber, G; Simonis, H J; Ulrich, R; Wayand, S; Weber, M; Weiler, T; Williamson, S; Wöhrmann, C; Wolf, R; Anagnostou, G; Daskalakis, G; Geralis, T; Giakoumopoulou, V A; Kyriakis, A; Loukas, D; Topsis-Giotis, I; Kesisoglou, S; Panagiotou, A; Saoulidou, N; Tziaferi, E; Evangelou, I; Flouris, G; Foudas, C; Kokkas, P; Loukas, N; Manthos, N; Papadopoulos, I; Paradas, E; Filipovic, N; Pasztor, G; Bencze, G; Hajdu, C; Horvath, D; Sikler, F; Veszpremi, V; Vesztergombi, G; Zsigmond, A J; Beni, N; Czellar, S; Karancsi, J; Makovec, A; Molnar, J; Szillasi, Z; Bartók, M; Raics, P; Trocsanyi, Z L; Ujvari, B; Komaragiri, J R; Bahinipati, S; Bhowmik, S; Choudhury, S; Mal, P; Mandal, K; Nayak, A; Sahoo, D K; Sahoo, N; Swain, S K; Bansal, S; Beri, S B; Bhatnagar, V; Chawla, R; Bhawandeep, U; Kalsi, A K; Kaur, A; Kaur, M; Kumar, R; Kumari, P; Mehta, A; Mittal, M; Singh, J B; Walia, G; Kumar, Ashok; Bhardwaj, A; Choudhary, B C; Garg, R B; Keshri, S; Malhotra, S; Naimuddin, M; Ranjan, K; Sharma, R; Sharma, V; Bhattacharya, R; Bhattacharya, S; Chatterjee, K; Dey, S; Dutt, S; Dutta, S; Ghosh, S; Majumdar, N; Modak, A; Mondal, K; Mukhopadhyay, S; Nandan, S; Purohit, A; Roy, A; Roy, D; Roy Chowdhury, S; Sarkar, S; Sharan, M; Thakur, S; Behera, P K; Chudasama, R; Dutta, D; Jha, V; Kumar, V; Mohanty, A K; Netrakanti, P K; Pant, L M; Shukla, P; Topkar, A; Aziz, T; Dugad, S; Kole, G; Mahakud, B; Mitra, S; Mohanty, G B; Parida, B; Sur, N; Sutar, B; Banerjee, S; Dewanjee, R K; Ganguly, S; Guchait, M; Jain, Sa; Kumar, S; Maity, M; Majumder, G; Mazumdar, K; Sarkar, T; Wickramage, N; Chauhan, S; Dube, S; Hegde, V; Kapoor, A; Kothekar, K; Pandey, S; Rane, A; Sharma, S; Chenarani, S; Eskandari Tadavani, E; Etesami, S M; Khakzad, M; Mohammadi Najafabadi, M; Naseri, M; Paktinat Mehdiabadi, S; Rezaei Hosseinabadi, F; Safarzadeh, B; Zeinali, M; Felcini, M; Grunewald, M; Abbrescia, M; Calabria, C; Caputo, C; Colaleo, A; Creanza, D; Cristella, L; De Filippis, N; De Palma, M; Fiore, L; Iaselli, G; Maggi, G; Maggi, M; Miniello, G; My, S; Nuzzo, S; Pompili, A; Pugliese, G; Radogna, R; Ranieri, A; Selvaggi, G; Sharma, A; Silvestris, L; Venditti, R; Verwilligen, P; Abbiendi, G; Battilana, C; Bonacorsi, D; Braibant-Giacomelli, S; Brigliadori, L; Campanini, R; Capiluppi, P; Castro, A; Cavallo, F R; Chhibra, S S; Codispoti, G; Cuffiani, M; Dallavalle, G M; Fabbri, F; Fanfani, A; Fasanella, D; Giacomelli, P; Grandi, C; Guiducci, L; Marcellini, S; Masetti, G; Montanari, A; Navarria, F L; Perrotta, A; Rossi, A M; Rovelli, T; Siroli, G P; Tosi, N; Albergo, S; Costa, S; Di Mattia, A; Giordano, F; Potenza, R; Tricomi, A; Tuve, C; Barbagli, G; Ciulli, V; Civinini, C; D'Alessandro, R; Focardi, E; Lenzi, P; Meschini, M; Paoletti, S; Russo, L; Sguazzoni, G; Strom, D; Viliani, L; Benussi, L; Bianco, S; Fabbri, F; Piccolo, D; Primavera, F; Calvelli, V; Ferro, F; Monge, M R; Robutti, E; Tosi, S; Brianza, L; Brivio, F; Ciriolo, V; Dinardo, M E; Fiorendi, S; Gennai, S; Ghezzi, A; Govoni, P; Malberti, M; Malvezzi, S; Manzoni, R A; Menasce, D; Moroni, L; Paganoni, M; Pedrini, D; Pigazzini, S; Ragazzi, S; Tabarelli de Fatis, T; Buontempo, S; Cavallo, N; De Nardo, G; Di Guida, S; Esposito, M; Fabozzi, F; Fienga, F; Iorio, A O M; Lanza, G; Lista, L; Meola, S; Paolucci, P; Sciacca, C; Thyssen, F; Azzi, P; Bacchetta, N; Benato, L; Boletti, A; Carlin, R; Checchia, P; Dall'Osso, M; De Castro Manzano, P; Dorigo, T; Dosselli, U; Gasparini, F; Gasparini, U; Gozzelino, A; Lacaprara, S; Margoni, M; Meneguzzo, A T; Pazzini, J; Pegoraro, M; Pozzobon, N; Ronchese, P; Sgaravatto, M; Simonetto, F; Torassa, E; Ventura, S; Zanetti, M; Zotto, P; Braghieri, A; Fallavollita, F; Magnani, A; Montagna, P; Ratti, S P; Re, V; Riccardi, C; Salvini, P; Vai, I; Vitulo, P; Alunni Solestizi, L; Bilei, G M; Ciangottini, D; Fanò, L; Lariccia, P; Leonardi, R; Mantovani, G; Menichelli, M; Saha, A; Santocchia, A; Androsov, K; Azzurri, P; Bagliesi, G; Bernardini, J; Boccali, T; Castaldi, R; Ciocci, M A; Dell'Orso, R; Donato, S; Fedi, G; Giassi, A; Grippo, M T; Ligabue, F; Lomtadze, T; Martini, L; Messineo, A; Palla, F; Rizzi, A; Savoy-Navarro, A; Spagnolo, P; Tenchini, R; Tonelli, G; Venturi, A; Verdini, P G; Barone, L; Cavallari, F; Cipriani, M; Del Re, D; Diemoz, M; Gelli, S; Longo, E; Margaroli, F; Marzocchi, B; Meridiani, P; Organtini, G; Paramatti, R; Preiato, F; Rahatlou, S; Rovelli, C; Santanastasio, F; Amapane, N; Arcidiacono, R; Argiro, S; Arneodo, M; Bartosik, N; Bellan, R; Biino, C; Cartiglia, N; Cenna, F; Costa, M; Covarelli, R; Degano, A; Demaria, N; Finco, L; Kiani, B; Mariotti, C; Maselli, S; Migliore, E; Monaco, V; Monteil, E; Monteno, M; Obertino, M M; Pacher, L; Pastrone, N; Pelliccioni, M; Pinna Angioni, G L; Ravera, F; Romero, A; Ruspa, M; Sacchi, R; Shchelina, K; Sola, V; Solano, A; Staiano, A; Traczyk, P; Belforte, S; Casarsa, M; Cossutti, F; Della Ricca, G; Zanetti, A; Kim, D H; Kim, G N; Kim, M S; Lee, S; Lee, S W; Oh, Y D; Sekmen, S; Son, D C; Yang, Y C; Lee, A; Kim, H; Brochero Cifuentes, J A; Kim, T J; Cho, S; Choi, S; Go, Y; Gyun, D; Ha, S; Hong, B; Jo, Y; Kim, Y; Lee, K; Lee, K S; Lee, S; Lim, J; Park, S K; Roh, Y; Almond, J; Kim, J; Lee, H; Oh, S B; Radburn-Smith, B C; Seo, S H; Yang, U K; Yoo, H D; Yu, G B; Choi, M; Kim, H; Kim, J H; Lee, J S H; Park, I C; Ryu, G; Ryu, M S; Choi, Y; Goh, J; Hwang, C; Lee, J; Yu, I; Dudenas, V; Juodagalvis, A; Vaitkus, J; Ahmed, I; Ibrahim, Z A; Md Ali, M A B; Mohamad Idris, F; Wan Abdullah, W A T; Yusli, M N; Zolkapli, Z; Castilla-Valdez, H; De La Cruz-Burelo, E; Heredia-De La Cruz, I; Hernandez-Almada, A; Lopez-Fernandez, R; Magaña Villalba, R; Mejia Guisao, J; Sanchez-Hernandez, A; Carrillo Moreno, S; Oropeza Barrera, C; Vazquez Valencia, F; Carpinteyro, S; Pedraza, I; Salazar Ibarguen, H A; Uribe Estrada, C; Morelos Pineda, A; Krofcheck, D; Butler, P H; Ahmad, A; Ahmad, M; Hassan, Q; Hoorani, H R; Khan, W A; Saddique, A; Shah, M A; Shoaib, M; Waqas, M; Bialkowska, H; Bluj, M; Boimska, B; Frueboes, T; Górski, M; Kazana, M; Nawrocki, K; Romanowska-Rybinska, K; Szleper, M; Zalewski, P; Bunkowski, K; Byszuk, A; Doroba, K; Kalinowski, A; Konecki, M; Krolikowski, J; Misiura, M; Olszewski, M; Walczak, M; Bargassa, P; Beirão Da Cruz E Silva, C; Calpas, B; Di Francesco, A; Faccioli, P; Ferreira Parracho, P G; Gallinaro, M; Hollar, J; Leonardo, N; Lloret Iglesias, L; Nemallapudi, M V; Rodrigues Antunes, J; Seixas, J; Toldaiev, O; Vadruccio, D; Varela, J; Vischia, P; Afanasiev, S; Bunin, P; Gavrilenko, M; Golutvin, I; Gorbunov, I; Kamenev, A; Karjavin, V; Lanev, A; Malakhov, A; Matveev, V; Palichik, V; Perelygin, V; Shmatov, S; Shulha, S; Skatchkov, N; Smirnov, V; Voytishin, N; Zarubin, A; Chtchipounov, L; Golovtsov, V; Ivanov, Y; Kim, V; Kuznetsova, E; Murzin, V; Oreshkin, V; Sulimov, V; Vorobyev, A; Andreev, Yu; Dermenev, A; Gninenko, S; Golubev, N; Karneyeu, A; Kirsanov, M; Krasnikov, N; Pashenkov, A; Tlisov, D; Toropin, A; Epshteyn, V; Gavrilov, V; Lychkovskaya, N; Popov, V; Pozdnyakov, I; Safronov, G; Spiridonov, A; Toms, M; Vlasov, E; Zhokin, A; Aushev, T; Bylinkin, A; Chadeeva, M; Chistov, R; Polikarpov, S; Andreev, V; Azarkin, M; Dremin, I; Kirakosyan, M; Leonidov, A; Terkulov, A; Baskakov, A; Belyaev, A; Boos, E; Ershov, A; Gribushin, A; Kaminskiy, A; Kodolova, O; Korotkikh, V; Lokhtin, I; Miagkov, I; Obraztsov, S; Petrushanko, S; Savrin, V; Snigirev, A; Vardanyan, I; Blinov, V; Skovpen, Y; Shtol, D; Azhgirey, I; Bayshev, I; Bitioukov, S; Elumakhov, D; Kachanov, V; Kalinin, A; Konstantinov, D; Krychkine, V; Petrov, V; Ryutin, R; Sobol, A; Troshin, S; Tyurin, N; Uzunian, A; Volkov, A; Adzic, P; Cirkovic, P; Devetak, D; Dordevic, M; Milosevic, J; Rekovic, V; Alcaraz Maestre, J; Barrio Luna, M; Calvo, E; Cerrada, M; Chamizo Llatas, M; Colino, N; De La Cruz, B; Delgado Peris, A; Escalante Del Valle, A; Fernandez Bedoya, C; Fernández Ramos, J P; Flix, J; Fouz, M C; Garcia-Abia, P; Gonzalez Lopez, O; Goy Lopez, S; Hernandez, J M; Josa, M I; Navarro De Martino, E; Pérez-Calero Yzquierdo, A; Puerta Pelayo, J; Quintario Olmeda, A; Redondo, I; Romero, L; Soares, M S; de Trocóniz, J F; Missiroli, M; Moran, D; Cuevas, J; Fernandez Menendez, J; Gonzalez Caballero, I; González Fernández, J R; Palencia Cortezon, E; Sanchez Cruz, S; Suárez Andrés, I; Vizan Garcia, J M; Cabrillo, I J; Calderon, A; Curras, E; Fernandez, M; Garcia-Ferrero, J; Gomez, G; Lopez Virto, A; Marco, J; Martinez Rivero, C; Matorras, F; Piedra Gomez, J; Rodrigo, T; Ruiz-Jimeno, A; Scodellaro, L; Trevisani, N; Vila, I; Vilar Cortabitarte, R; Abbaneo, D; Auffray, E; Auzinger, G; Baillon, P; Ball, A H; Barney, D; Bloch, P; Bocci, A; Botta, C; Camporesi, T; Castello, R; Cepeda, M; Cerminara, G; Chen, Y; d'Enterria, D; Dabrowski, A; Daponte, V; David, A; De Gruttola, M; De Roeck, A; Di Marco, E; Dobson, M; Dorney, B; du Pree, T; Duggan, D; Dünser, M; Dupont, N; Elliott-Peisert, A; Everaerts, P; Fartoukh, S; Franzoni, G; Fulcher, J; Funk, W; Gigi, D; Gill, K; Girone, M; Glege, F; Gulhan, D; Gundacker, S; Guthoff, M; Harris, P; Hegeman, J; Innocente, V; Janot, P; Kieseler, J; Kirschenmann, H; Knünz, V; Kornmayer, A; Kortelainen, M J; Kousouris, K; Krammer, M; Lange, C; Lecoq, P; Lourenço, C; Lucchini, M T; Malgeri, L; Mannelli, M; Martelli, A; Meijers, F; Merlin, J A; Mersi, S; Meschi, E; Milenovic, P; Moortgat, F; Morovic, S; Mulders, M; Neugebauer, H; Orfanelli, S; Orsini, L; Pape, L; Perez, E; Peruzzi, M; Petrilli, A; Petrucciani, G; Pfeiffer, A; Pierini, M; Racz, A; Reis, T; Rolandi, G; Rovere, M; Sakulin, H; Sauvan, J B; Schäfer, C; Schwick, C; Seidel, M; Sharma, A; Silva, P; Sphicas, P; Steggemann, J; Stoye, M; Takahashi, Y; Tosi, M; Treille, D; Triossi, A; Tsirou, A; Veckalns, V; Veres, G I; Verweij, M; Wardle, N; Wöhri, H K; Zagozdzinska, A; Zeuner, W D; Bertl, W; Deiters, K; Erdmann, W; Horisberger, R; Ingram, Q; Kaestli, H C; Kotlinski, D; Langenegger, U; Rohe, T; Wiederkehr, S A; Bachmair, F; Bäni, L; Bianchini, L; Casal, B; Dissertori, G; Dittmar, M; Donegà, M; Grab, C; Heidegger, C; Hits, D; Hoss, J; Kasieczka, G; Lustermann, W; Mangano, B; Marionneau, M; Martinez Ruiz Del Arbol, P; Masciovecchio, M; Meinhard, M T; Meister, D; Micheli, F; Musella, P; Nessi-Tedaldi, F; Pandolfi, F; Pata, J; Pauss, F; Perrin, G; Perrozzi, L; Quittnat, M; Rossini, M; Schönenberger, M; Starodumov, A; Tavolaro, V R; Theofilatos, K; Wallny, R; Aarrestad, T K; Amsler, C; Caminada, L; Canelli, M F; De Cosa, A; Galloni, C; Hinzmann, A; Hreus, T; Kilminster, B; Ngadiuba, J; Pinna, D; Rauco, G; Robmann, P; Salerno, D; Seitz, C; Yang, Y; Zucchetta, A; Candelise, V; Doan, T H; Jain, Sh; Khurana, R; Konyushikhin, M; Kuo, C M; Lin, W; Pozdnyakov, A; Yu, S S; Kumar, Arun; Chang, P; Chang, Y H; Chao, Y; Chen, K F; Chen, P H; Fiori, F; Hou, W-S; Hsiung, Y; Liu, Y F; Lu, R-S; Miñano Moya, M; Paganis, E; Psallidas, A; Tsai, J F; Asavapibhop, B; Singh, G; Srimanobhas, N; Suwonjandee, N; Adiguzel, A; Cerci, S; Damarseckin, S; Demiroglu, Z S; Dozen, C; Dumanoglu, I; Girgis, S; Gokbulut, G; Guler, Y; Hos, I; Kangal, E E; Kara, O; Kayis Topaksu, A; Kiminsu, U; Oglakci, M; Onengut, G; Ozdemir, K; Sunar Cerci, D; Topakli, H; Turkcapar, S; Zorbakir, I S; Zorbilmez, C; Bilin, B; Bilmis, S; Isildak, B; Karapinar, G; Yalvac, M; Zeyrek, M; Gülmez, E; Kaya, M; Kaya, O; Yetkin, E A; Yetkin, T; Cakir, A; Cankocak, K; Sen, S; Grynyov, B; Levchuk, L; Sorokin, P; Aggleton, R; Ball, F; Beck, L; Brooke, J J; Burns, D; Clement, E; Cussans, D; Flacher, H; Goldstein, J; Grimes, M; Heath, G P; Heath, H F; Jacob, J; Kreczko, L; Lucas, C; Newbold, D M; Paramesvaran, S; Poll, A; Sakuma, T; Seif El Nasr-Storey, S; Smith, D; Smith, V J; Belyaev, A; Brew, C; Brown, R M; Calligaris, L; Cieri, D; Cockerill, D J A; Coughlan, J A; Harder, K; Harper, S; Olaiya, E; Petyt, D; Shepherd-Themistocleous, C H; Thea, A; Tomalin, I R; Williams, T; Baber, M; Bainbridge, R; Buchmuller, O; Bundock, A; Burton, D; Casasso, S; Citron, M; Colling, D; Corpe, L; Dauncey, P; Davies, G; De Wit, A; Della Negra, M; Di Maria, R; Dunne, P; Elwood, A; Futyan, D; Haddad, Y; Hall, G; Iles, G; James, T; Lane, R; Laner, C; Lucas, R; Lyons, L; Magnan, A-M; Malik, S; Mastrolorenzo, L; Nash, J; Nikitenko, A; Pela, J; Penning, B; Pesaresi, M; Raymond, D M; Richards, A; Rose, A; Scott, E; Seez, C; Summers, S; Tapper, A; Uchida, K; Vazquez Acosta, M; Virdee, T; Wright, J; Zenz, S C; Cole, J E; Hobson, P R; Khan, A; Kyberd, P; Reid, I D; Symonds, P; Teodorescu, L; Turner, M; Borzou, A; Call, K; Dittmann, J; Hatakeyama, K; Liu, H; Pastika, N; Bartek, R; Dominguez, A; Buccilli, A; Cooper, S I; Henderson, C; Rumerio, P; West, C; Arcaro, D; Avetisyan, A; Bose, T; Gastler, D; Rankin, D; Richardson, C; Rohlf, J; Sulak, L; Zou, D; Benelli, G; Cutts, D; Garabedian, A; Hakala, J; Heintz, U; Hogan, J M; Jesus, O; Kwok, K H M; Laird, E; Landsberg, G; Mao, Z; Narain, M; Piperov, S; Sagir, S; Spencer, E; Syarif, R; Breedon, R; Burns, D; Calderon De La Barca Sanchez, M; Chauhan, S; Chertok, M; Conway, J; Conway, R; Cox, P T; Erbacher, R; Flores, C; Funk, G; Gardner, M; Ko, W; Lander, R; Mclean, C; Mulhearn, M; Pellett, D; Pilot, J; Shalhout, S; Shi, M; Smith, J; Squires, M; Stolp, D; Tos, K; Tripathi, M; Bachtis, M; Bravo, C; Cousins, R; Dasgupta, A; Florent, A; Hauser, J; Ignatenko, M; Mccoll, N; Saltzberg, D; Schnaible, C; Valuev, V; Weber, M; Bouvier, E; Burt, K; Clare, R; Ellison, J; Gary, J W; Ghiasi Shirazi, S M A; Hanson, G; Heilman, J; Jandir, P; Kennedy, E; Lacroix, F; Long, O R; Olmedo Negrete, M; Paneva, M I; Shrinivas, A; Si, W; Wei, H; Wimpenny, S; Yates, B R; Branson, J G; Cerati, G B; Cittolin, S; Derdzinski, M; Gerosa, R; Holzner, A; Klein, D; Krutelyov, V; Letts, J; Macneill, I; Olivito, D; Padhi, S; Pieri, M; Sani, M; Sharma, V; Simon, S; Tadel, M; Vartak, A; Wasserbaech, S; Welke, C; Wood, J; Würthwein, F; Yagil, A; Zevi Della Porta, G; Amin, N; Bhandari, R; Bradmiller-Feld, J; Campagnari, C; Dishaw, A; Dutta, V; Franco Sevilla, M; George, C; Golf, F; Gouskos, L; Gran, J; Heller, R; Incandela, J; Mullin, S D; Ovcharova, A; Qu, H; Richman, J; Stuart, D; Suarez, I; Yoo, J; Anderson, D; Bendavid, J; Bornheim, A; Bunn, J; Duarte, J; Lawhorn, J M; Mott, A; Newman, H B; Pena, C; Spiropulu, M; Vlimant, J R; Xie, S; Zhu, R Y; Andrews, M B; Ferguson, T; Paulini, M; Russ, J; Sun, M; Vogel, H; Vorobiev, I; Weinberg, M; Cumalat, J P; Ford, W T; Jensen, F; Johnson, A; Krohn, M; Leontsinis, S; Mulholland, T; Stenson, K; Wagner, S R; Alexander, J; Chaves, J; Chu, J; Dittmer, S; Mcdermott, K; Mirman, N; Nicolas Kaufman, G; Patterson, J R; Rinkevicius, A; Ryd, A; Skinnari, L; Soffi, L; Tan, S M; Tao, Z; Thom, J; Tucker, J; Wittich, P; Zientek, M; Winn, D; Abdullin, S; Albrow, M; Apollinari, G; Apresyan, A; Banerjee, S; Bauerdick, L A T; Beretvas, A; Berryhill, J; Bhat, P C; Bolla, G; Burkett, K; Butler, J N; Cheung, H W K; Chlebana, F; Cihangir, S; Cremonesi, M; Elvira, V D; Fisk, I; Freeman, J; Gottschalk, E; Gray, L; Green, D; Grünendahl, S; Gutsche, O; Hare, D; Harris, R M; Hasegawa, S; Hirschauer, J; Hu, Z; Jayatilaka, B; Jindariani, S; Johnson, M; Joshi, U; Klima, B; Kreis, B; Lammel, S; Linacre, J; Lincoln, D; Lipton, R; Liu, M; Liu, T; Lopes De Sá, R; Lykken, J; Maeshima, K; Magini, N; Marraffino, J M; Maruyama, S; Mason, D; McBride, P; Merkel, P; Mrenna, S; Nahn, S; O'Dell, V; Pedro, K; Prokofyev, O; Rakness, G; Ristori, L; Sexton-Kennedy, E; Soha, A; Spalding, W J; Spiegel, L; Stoynev, S; Strait, J; Strobbe, N; Taylor, L; Tkaczyk, S; Tran, N V; Uplegger, L; Vaandering, E W; Vernieri, C; Verzocchi, M; Vidal, R; Wang, M; Weber, H A; Whitbeck, A; Wu, Y; Acosta, D; Avery, P; Bortignon, P; Bourilkov, D; Brinkerhoff, A; Carnes, A; Carver, M; Curry, D; Das, S; Field, R D; Furic, I K; Konigsberg, J; Korytov, A; Low, J F; Ma, P; Matchev, K; Mei, H; Mitselmakher, G; Rank, D; Shchutska, L; Sperka, D; Thomas, L; Wang, J; Wang, S; Yelton, J; Linn, S; Markowitz, P; Martinez, G; Rodriguez, J L; Ackert, A; Adams, T; Askew, A; Bein, S; Hagopian, S; Hagopian, V; Johnson, K F; Prosper, H; Santra, A; Yohay, R; Baarmand, M M; Bhopatkar, V; Colafranceschi, S; Hohlmann, M; Noonan, D; Roy, T; Yumiceva, F; Adams, M R; Apanasevich, L; Berry, D; Betts, R R; Bucinskaite, I; Cavanaugh, R; Evdokimov, O; Gauthier, L; Gerber, C E; Hofman, D J; Jung, K; Sandoval Gonzalez, I D; Varelas, N; Wang, H; Wu, Z; Zakaria, M; Zhang, J; Bilki, B; Clarida, W; Dilsiz, K; Durgut, S; Gandrajula, R P; Haytmyradov, M; Khristenko, V; Merlo, J-P; Mermerkaya, H; Mestvirishvili, A; Moeller, A; Nachtman, J; Ogul, H; Onel, Y; Ozok, F; Penzo, A; Snyder, C; Tiras, E; Wetzel, J; Yi, K; Anderson, I; Blumenfeld, B; Cocoros, A; Eminizer, N; Fehling, D; Feng, L; Gritsan, A V; Maksimovic, P; Roskes, J; Sarica, U; Swartz, M; Xiao, M; Xin, Y; You, C; Al-Bataineh, A; Baringer, P; Bean, A; Boren, S; Bowen, J; Castle, J; Forthomme, L; Kenny, R P; Khalil, S; Kropivnitskaya, A; Majumder, D; Mcbrayer, W; Murray, M; Sanders, S; Stringer, R; Tapia Takaki, J D; Wang, Q; Ivanov, A; Kaadze, K; Maravin, Y; Mohammadi, A; Saini, L K; Skhirtladze, N; Toda, S; Rebassoo, F; Wright, D; Anelli, C; Baden, A; Baron, O; Belloni, A; Calvert, B; Eno, S C; Ferraioli, C; Gomez, J A; Hadley, N J; Jabeen, S; Jeng, G Y; Kellogg, R G; Kolberg, T; Kunkle, J; Mignerey, A C; Ricci-Tam, F; Shin, Y H; Skuja, A; Tonjes, M B; Tonwar, S C; Abercrombie, D; Allen, B; Apyan, A; Azzolini, V; Barbieri, R; Baty, A; Bi, R; Bierwagen, K; Brandt, S; Busza, W; Cali, I A; D'Alfonso, M; Demiragli, Z; Di Matteo, L; Gomez Ceballos, G; Goncharov, M; Hsu, D; Iiyama, Y; Innocenti, G M; Klute, M; Kovalskyi, D; Krajczar, K; Lai, Y S; Lee, Y-J; Levin, A; Luckey, P D; Maier, B; Marini, A C; Mcginn, C; Mironov, C; Narayanan, S; Niu, X; Paus, C; Roland, C; Roland, G; Salfeld-Nebgen, J; Stephans, G S F; Tatar, K; Varma, M; Velicanu, D; Veverka, J; Wang, J; Wang, T W; Wyslouch, B; Yang, M; Benvenuti, A C; Chatterjee, R M; Evans, A; Hansen, P; Kalafut, S; Kao, S C; Kubota, Y; Lesko, Z; Mans, J; Nourbakhsh, S; Ruckstuhl, N; Rusack, R; Tambe, N; Turkewitz, J; Acosta, J G; Oliveros, S; Avdeeva, E; Bloom, K; Claes, D R; Fangmeier, C; Gonzalez Suarez, R; Kamalieddin, R; Kravchenko, I; Malta Rodrigues, A; Monroy, J; Siado, J E; Snow, G R; Stieger, B; Alyari, M; Dolen, J; Godshalk, A; Harrington, C; Iashvili, I; Kaisen, J; Nguyen, D; Parker, A; Rappoccio, S; Roozbahani, B; Alverson, G; Barberis, E; Hortiangtham, A; Massironi, A; Morse, D M; Nash, D; Orimoto, T; Teixeira De Lima, R; Trocino, D; Wang, R-J; Wood, D; Bhattacharya, S; Charaf, O; Hahn, K A; Kumar, A; Mucia, N; Odell, N; Pollack, B; Schmitt, M H; Sung, K; Trovato, M; Velasco, M; Dev, N; Hildreth, M; Hurtado Anampa, K; Jessop, C; Karmgard, D J; Kellams, N; Lannon, K; Marinelli, N; Meng, F; Mueller, C; Musienko, Y; Planer, M; Reinsvold, A; Ruchti, R; Rupprecht, N; Smith, G; Taroni, S; Wayne, M; Wolf, M; Woodard, A; Alimena, J; Antonelli, L; Bylsma, B; Durkin, L S; Flowers, S; Francis, B; Hart, A; Hill, C; Hughes, R; Ji, W; Liu, B; Luo, W; Puigh, D; Winer, B L; Wulsin, H W; Cooperstein, S; Driga, O; Elmer, P; Hardenbrook, J; Hebda, P; Lange, D; Luo, J; Marlow, D; Medvedeva, T; Mei, K; Ojalvo, I; Olsen, J; Palmer, C; Piroué, P; Stickland, D; Svyatkovskiy, A; Tully, C; Malik, S; Barker, A; Barnes, V E; Folgueras, S; Gutay, L; Jha, M K; Jones, M; Jung, A W; Khatiwada, A; Miller, D H; Neumeister, N; Schulte, J F; Shi, X; Sun, J; Wang, F; Xie, W; Parashar, N; Stupak, J; Adair, A; Akgun, B; Chen, Z; Ecklund, K M; Geurts, F J M; Guilbaud, M; Li, W; Michlin, B; Northup, M; Padley, B P; Roberts, J; Rorie, J; Tu, Z; Zabel, J; Betchart, B; Bodek, A; de Barbaro, P; Demina, R; Duh, Y T; Ferbel, T; Galanti, M; Garcia-Bellido, A; Han, J; Hindrichs, O; Khukhunaishvili, A; Lo, K H; Tan, P; Verzetti, M; Agapitos, A; Chou, J P; Gershtein, Y; Gómez Espinosa, T A; Halkiadakis, E; Heindl, M; Hughes, E; Kaplan, S; Kunnawalkam Elayavalli, R; Kyriacou, S; Lath, A; Nash, K; Osherson, M; Saka, H; Salur, S; Schnetzer, S; Sheffield, D; Somalwar, S; Stone, R; Thomas, S; Thomassen, P; Walker, M; Delannoy, A G; Foerster, M; Heideman, J; Riley, G; Rose, K; Spanier, S; Thapa, K; Bouhali, O; Celik, A; Dalchenko, M; De Mattia, M; Delgado, A; Dildick, S; Eusebi, R; Gilmore, J; Huang, T; Juska, E; Kamon, T; Mueller, R; Pakhotin, Y; Patel, R; Perloff, A; Perniè, L; Rathjens, D; Safonov, A; Tatarinov, A; Ulmer, K A; Akchurin, N; Cowden, C; Damgov, J; De Guio, F; Dragoiu, C; Dudero, P R; Faulkner, J; Gurpinar, E; Kunori, S; Lamichhane, K; Lee, S W; Libeiro, T; Peltola, T; Undleeb, S; Volobouev, I; Wang, Z; Greene, S; Gurrola, A; Janjam, R; Johns, W; Maguire, C; Melo, A; Ni, H; Sheldon, P; Tuo, S; Velkovska, J; Xu, Q; Arenton, M W; Barria, P; Cox, B; Goodell, J; Hirosky, R; Ledovskoy, A; Li, H; Neu, C; Sinthuprasith, T; Sun, X; Wang, Y; Wolfe, E; Xia, F; Clarke, C; Harr, R; Karchin, P E; Sturdy, J; Belknap, D A; Buchanan, J; Caillol, C; Dasu, S; Dodd, L; Duric, S; Gomber, B; Grothe, M; Herndon, M; Hervé, A; Klabbers, P; Lanaro, A; Levine, A; Long, K; Loveless, R; Perry, T; Pierro, G A; Polese, G; Ruggles, T; Savin, A; Smith, N; Smith, W H; Taylor, D; Woods, N

    2017-01-01

    This paper reports the measurement of [Formula: see text] meson production in proton-proton ([Formula: see text]) and proton-lead ([Formula: see text]) collisions at a center-of-mass energy per nucleon pair of [Formula: see text] by the CMS experiment at the LHC. The data samples used in the analysis correspond to integrated luminosities of 28[Formula: see text] and 35[Formula: see text] for [Formula: see text] and [Formula: see text] collisions, respectively. Prompt and nonprompt [Formula: see text] mesons, the latter produced in the decay of [Formula: see text] hadrons, are measured in their dimuon decay channels. Differential cross sections are measured in the transverse momentum range of [Formula: see text], and center-of-mass rapidity ranges of [Formula: see text] ([Formula: see text]) and [Formula: see text] ([Formula: see text]). The nuclear modification factor, [Formula: see text], is measured as a function of both [Formula: see text] and [Formula: see text]. Small modifications to the [Formula: see text] cross sections are observed in [Formula: see text] relative to [Formula: see text] collisions. The ratio of [Formula: see text] production cross sections in [Formula: see text]-going and Pb-going directions, [Formula: see text], studied as functions of [Formula: see text] and [Formula: see text], shows a significant decrease for increasing transverse energy deposited at large pseudorapidities. These results, which cover a wide kinematic range, provide new insight on the role of cold nuclear matter effects on prompt and nonprompt [Formula: see text] production.

  1. Classification of Rainbows

    NASA Astrophysics Data System (ADS)

    Ricard, J. L.; Peter, A. L.; Barckicke, J.

    2015-12-01

    CLASSIFICATION OF RAINBOWS Jean Louis Ricard,1,2,* Peter Adams ,2 and Jean Barckicke 2,3 1CNRM, Météo-France,42 Avenue Gaspard Coriolis, 31057 Toulouse, France 2CEPAL, 148 Himley Road, Dudley, West Midlands DY1 2QH, United Kingdom 3DP/Compas,Météo-France,42 Avenue Gaspard Coriolis, 31057 Toulouse, France *Corresponding author: Dr_Jean_Ricard@yahoo,co,ukRainbows are the most beautiful and most spectacular optical atmospheric phenomenon. Humphreys (1964) pointedly noted that "the "explanations" generally given of the rainbow [ in textbooks] may well be said to explain beautifully that which does not occur, and to leave unexplained which does" . . . "The records of close observations of rainbows soon show that not even the colors are always the same". Textbooks stress that the main factor affecting the aspect of the rainbow is the radius of the water droplets. In his well-known textbook entitled "the nature of light & colour in the open air", Minnaert (1954) gives the chief features of the rainbow depending on the diameter of the drops producing it. For this study, we have gathered hundreds of pictures of primary bows. We sort out the pictures into classes. The classes are defined in a such way that rainbows belonging to the same class look similar. Our results are surprising and do not confirm Minnaert's classification. In practice, the size of the water droplets is only a minor factor controlling the overall aspect of the rainbow. The main factor appears to be the height of the sun above the horizon. At sunset, the width of the red band increases, while the width of the other bands of colours decreases. The orange, the violet, the blue and the green bands disappear completely in this order. At the end, the primary bow is mainly red and slightly yellow. Picture = Contrast-enhanced photograph of a primary bow picture (prepared by Andrew Dunn).

  2. Robotic Rock Classification

    NASA Technical Reports Server (NTRS)

    Hebert, Martial

    1999-01-01

    This report describes a three-month research program undertook jointly by the Robotics Institute at Carnegie Mellon University and Ames Research Center as part of the Ames' Joint Research Initiative (JRI.) The work was conducted at the Ames Research Center by Mr. Liam Pedersen, a graduate student in the CMU Ph.D. program in Robotics under the supervision Dr. Ted Roush at the Space Science Division of the Ames Research Center from May 15 1999 to August 15, 1999. Dr. Martial Hebert is Mr. Pedersen's research adviser at CMU and is Principal Investigator of this Grant. The goal of this project is to investigate and implement methods suitable for a robotic rover to autonomously identify rocks and minerals in its vicinity, and to statistically characterize the local geological environment. Although primary sensors for these tasks are a reflection spectrometer and color camera, the goal is to create a framework under which data from multiple sensors, and multiple readings on the same object, can be combined in a principled manner. Furthermore, it is envisioned that knowledge of the local area, either a priori or gathered by the robot, will be used to improve classification accuracy. The key results obtained during this project are: The continuation of the development of a rock classifier; development of theoretical statistical methods; development of methods for evaluating and selecting sensors; and experimentation with data mining techniques on the Ames spectral library. The results of this work are being applied at CMU, in particular in the context of the Winter 99 Antarctica expedition in which the classification techniques will be used on the Nomad robot. Conversely, the software developed based on those techniques will continue to be made available to NASA Ames and the data collected from the Nomad experiments will also be made available.

  3. A Functional Classification of Occupations.

    ERIC Educational Resources Information Center

    McKinlay, Donald Bruce

    The need for more and better manpower information is hampered by the lack of adequate occupational data classification systems. The diversity of interests in occupations probably accounts for the absence of consensus regarding either the general outlines or the specific details of a standardized occupational classification system which would…

  4. Dewey Decimal Classification: A Quagmire.

    ERIC Educational Resources Information Center

    Gamaluddin, Ahmad Fouad

    1980-01-01

    A survey of 660 Pennsylvania school librarians indicates that, though there is limited professional interest in the Library of Congress Classification system, Dewey Decimal Classification (DDC) appears to be firmly entrenched. This article also discusses the relative merits of DDC, the need for a uniform system, librarianship preparation, and…

  5. 43 CFR 2461.1 - Proposed classifications.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Proposed classifications. 2461.1 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.1 Proposed classifications. (a) Proposed classifications will...

  6. 43 CFR 2461.4 - Changing classifications.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Changing classifications. 2461.4 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.4 Changing classifications. Classifications may be changed...

  7. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 1 2014-04-01 2014-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  8. 40 CFR 152.164 - Classification procedures.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 25 2013-07-01 2013-07-01 false Classification procedures. 152.164... PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.164 Classification procedures. (a) Grouping of products for classification purposes. In its discretion, the Agency may identify...

  9. 43 CFR 2461.1 - Proposed classifications.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false Proposed classifications. 2461.1 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.1 Proposed classifications. (a) Proposed classifications will...

  10. 40 CFR 152.164 - Classification procedures.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 24 2014-07-01 2014-07-01 false Classification procedures. 152.164... PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.164 Classification procedures. (a) Grouping of products for classification purposes. In its discretion, the Agency may identify...

  11. 7 CFR 27.34 - Classification procedure.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Classification procedure. 27.34 Section 27.34... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Classification and Micronaire Determinations § 27.34 Classification procedure. Classification shall proceed as rapidly as possible, but not...

  12. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 1 2013-04-01 2013-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  13. 32 CFR 2001.15 - Classification guides.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of classification...

  14. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 1 2012-04-01 2012-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  15. 32 CFR 2001.14 - Classification challenges.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Classification challenges. 2001.14 Section 2001... Classification § 2001.14 Classification challenges. (a) Challenging classification. Authorized holders, including authorized holders outside the classifying agency, who want to challenge the classification status of...

  16. 43 CFR 2461.4 - Changing classifications.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false Changing classifications. 2461.4 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.4 Changing classifications. Classifications may be changed...

  17. 32 CFR 2001.14 - Classification challenges.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Classification challenges. 2001.14 Section 2001... Classification § 2001.14 Classification challenges. (a) Challenging classification. Authorized holders, including authorized holders outside the classifying agency, who want to challenge the classification status of...

  18. 32 CFR 2001.15 - Classification guides.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of classification...

  19. 40 CFR 152.164 - Classification procedures.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 25 2012-07-01 2012-07-01 false Classification procedures. 152.164... PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.164 Classification procedures. (a) Grouping of products for classification purposes. In its discretion, the Agency may identify...

  20. 7 CFR 27.34 - Classification procedure.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Classification procedure. 27.34 Section 27.34... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Classification and Micronaire Determinations § 27.34 Classification procedure. Classification shall proceed as rapidly as possible, but not...

  1. 32 CFR 2001.14 - Classification challenges.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Classification challenges. 2001.14 Section 2001... Classification § 2001.14 Classification challenges. (a) Challenging classification. Authorized holders, including authorized holders outside the classifying agency, who want to challenge the classification status of...

  2. 43 CFR 2461.1 - Proposed classifications.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false Proposed classifications. 2461.1 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.1 Proposed classifications. (a) Proposed classifications will...

  3. 43 CFR 2461.4 - Changing classifications.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false Changing classifications. 2461.4 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.4 Changing classifications. Classifications may be changed...

  4. 43 CFR 2461.1 - Proposed classifications.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false Proposed classifications. 2461.1 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.1 Proposed classifications. (a) Proposed classifications will...

  5. 43 CFR 2461.4 - Changing classifications.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false Changing classifications. 2461.4 Section... MANAGEMENT, DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.4 Changing classifications. Classifications may be changed...

  6. 7 CFR 27.34 - Classification procedure.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Classification procedure. 27.34 Section 27.34... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Classification and Micronaire Determinations § 27.34 Classification procedure. Classification shall proceed as rapidly as possible, but not...

  7. 32 CFR 2001.15 - Classification guides.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of classification...

  8. 7 CFR 28.911 - Review classification.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Review classification. 28.911 Section 28.911... Producers Classification § 28.911 Review classification. (a) A producer may request one review classification for each bale of eligible cotton. The fee for review classification is $2.20 per bale. (b) Samples...

  9. 12 CFR 403.4 - Derivative classification.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... SAFEGUARDING OF NATIONAL SECURITY INFORMATION § 403.4 Derivative classification. (a) Use of derivative classification. (1) Unlike original classification which is an initial determination, derivative classification... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Derivative classification. 403.4 Section 403.4...

  10. 22 CFR 9.4 - Original classification.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Original classification. 9.4 Section 9.4... classification. (a) Definition. Original classification is the initial determination that certain information... classification. (b) Classification levels. (1) Top Secret shall be applied to information the unauthorized...

  11. 32 CFR 2001.14 - Classification challenges.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Classification challenges. 2001.14 Section 2001... Classification § 2001.14 Classification challenges. (a) Challenging classification. Authorized holders, including authorized holders outside the classifying agency, who want to challenge the classification status of...

  12. 32 CFR 2001.15 - Classification guides.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of classification...

  13. 40 CFR 152.164 - Classification procedures.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 24 2011-07-01 2011-07-01 false Classification procedures. 152.164... PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.164 Classification procedures. (a) Grouping of products for classification purposes. In its discretion, the Agency may identify...

  14. 40 CFR 152.164 - Classification procedures.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 23 2010-07-01 2010-07-01 false Classification procedures. 152.164... PESTICIDE REGISTRATION AND CLASSIFICATION PROCEDURES Classification of Pesticides § 152.164 Classification procedures. (a) Grouping of products for classification purposes. In its discretion, the Agency may identify...

  15. 7 CFR 27.34 - Classification procedure.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Classification procedure. 27.34 Section 27.34... REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Classification and Micronaire Determinations § 27.34 Classification procedure. Classification shall proceed as rapidly as possible, but not...

  16. 32 CFR 2001.15 - Classification guides.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Classification guides. 2001.15 Section 2001.15..., NATIONAL ARCHIVES AND RECORDS ADMINISTRATION CLASSIFIED NATIONAL SECURITY INFORMATION Classification § 2001.15 Classification guides. (a) Preparation of classification guides. Originators of classification...

  17. Doing Mathematics with Purpose: Mathematical Text Types

    ERIC Educational Resources Information Center

    Dostal, Hannah M.; Robinson, Richard

    2018-01-01

    Mathematical literacy includes learning to read and write different types of mathematical texts as part of purposeful mathematical meaning making. Thus in this article, we describe how learning to read and write mathematical texts (proof text, algorithmic text, algebraic/symbolic text, and visual text) supports the development of students'…

  18. What's so Simple about Simplified Texts? A Computational and Psycholinguistic Investigation of Text Comprehension and Text Processing

    ERIC Educational Resources Information Center

    Crossley, Scott A.; Yang, Hae Sung; McNamara, Danielle S.

    2014-01-01

    This study uses a moving windows self-paced reading task to assess both text comprehension and processing time of authentic texts and these same texts simplified to beginning and intermediate levels. Forty-eight second language learners each read 9 texts (3 different authentic, beginning, and intermediate level texts). Repeated measures ANOVAs…

  19. Science and Technology Text Mining: Nonlinear Dynamics

    DTIC Science & Technology

    2004-02-01

    journal/ institution publication and citation data. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES...systems whose time evolution has a sensitive dependence on initial conditions. An approximately 100 term query was developed for accessing records from the...SCI papers by a factor of ~ 2. Appendix 4 contains a co-occurrence matrix of the top 15 countries. In terms of absolute numbers of co-authored papers

  20. 14 CFR 1203.501 - Applying derivative classification markings.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... INFORMATION SECURITY PROGRAM Derivative Classification § 1203.501 Applying derivative classification markings... classification decisions: (b) Verify the information's current level of classification so far as practicable...

  1. 14 CFR 1203.501 - Applying derivative classification markings.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... INFORMATION SECURITY PROGRAM Derivative Classification § 1203.501 Applying derivative classification markings... classification decisions: (b) Verify the information's current level of classification so far as practicable...

  2. Text Mining of UU-ITE Implementation in Indonesia

    NASA Astrophysics Data System (ADS)

    Hakim, Lukmanul; Kusumasari, Tien F.; Lubis, Muharman

    2018-04-01

    At present, social media and networks act as one of the main platforms for sharing information, idea, thought and opinions. Many people share their knowledge and express their views on the specific topics or current hot issues that interest them. The social media texts have rich information about the complaints, comments, recommendation and suggestion as the automatic reaction or respond to government initiative or policy in order to overcome certain issues.This study examines the sentiment from netizensas part of citizen who has vocal sound about the implementation of UU ITE as the first cyberlaw in Indonesia as a means to identify the current tendency of citizen perception. To perform text mining techniques, this study used Twitter Rest API while R programming was utilized for the purpose of classification analysis based on hierarchical cluster.

  3. Fever detection from free-text clinical records for biosurveillance.

    PubMed

    Chapman, Wendy W; Dowling, John N; Wagner, Michael M

    2004-04-01

    Automatic detection of cases of febrile illness may have potential for early detection of outbreaks of infectious disease either by identification of anomalous numbers of febrile illness or in concert with other information in diagnosing specific syndromes, such as febrile respiratory syndrome. At most institutions, febrile information is contained only in free-text clinical records. We compared the sensitivity and specificity of three fever detection algorithms for detecting fever from free-text. Keyword CC and CoCo classified patients based on triage chief complaints; Keyword HP classified patients based on dictated emergency department reports. Keyword HP was the most sensitive (sensitivity 0.98, specificity 0.89), and Keyword CC was the most specific (sensitivity 0.61, specificity 1.0). Because chief complaints are available sooner than emergency department reports, we suggest a combined application that classifies patients based on their chief complaint followed by classification based on their emergency department report, once the report becomes available.

  4. 45 CFR 601.2 - Classification authority.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.2 Classification authority. The... a Foundation employee develops information that appears to warrant classification because of its...

  5. 45 CFR 601.2 - Classification authority.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.2 Classification authority. The... a Foundation employee develops information that appears to warrant classification because of its...

  6. 45 CFR 601.2 - Classification authority.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.2 Classification authority. The... a Foundation employee develops information that appears to warrant classification because of its...

  7. 45 CFR 601.2 - Classification authority.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.2 Classification authority. The... a Foundation employee develops information that appears to warrant classification because of its...

  8. 45 CFR 601.2 - Classification authority.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Public Welfare Regulations Relating to Public Welfare (Continued) NATIONAL SCIENCE FOUNDATION CLASSIFICATION AND DECLASSIFICATION OF NATIONAL SECURITY INFORMATION § 601.2 Classification authority. The... a Foundation employee develops information that appears to warrant classification because of its...

  9. 6 CFR 7.26 - Derivative classification.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... already classified, and marking the newly developed material consistent with the classification markings... classification decisions and carry forward to any newly created documents the pertinent classification markings. (d) Information classified derivatively from other classified information shall be classified and...

  10. 5 CFR 9901.221 - Classification requirements.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Section 9901.221 Administrative Personnel DEPARTMENT OF DEFENSE HUMAN RESOURCES MANAGEMENT AND LABOR RELATIONS SYSTEMS (DEPARTMENT OF DEFENSE-OFFICE OF PERSONNEL MANAGEMENT) DEPARTMENT OF DEFENSE NATIONAL SECURITY PERSONNEL SYSTEM (NSPS) Classification Classification Process § 9901.221 Classification...

  11. 5 CFR 9701.221 - Classification requirements.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... Section 9701.221 Administrative Personnel DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM (DEPARTMENT OF HOMELAND SECURITY-OFFICE OF PERSONNEL MANAGEMENT) DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM Classification Classification Process § 9701.221 Classification...

  12. 5 CFR 9701.221 - Classification requirements.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... Section 9701.221 Administrative Personnel DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM (DEPARTMENT OF HOMELAND SECURITY-OFFICE OF PERSONNEL MANAGEMENT) DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM Classification Classification Process § 9701.221 Classification...

  13. 5 CFR 9701.221 - Classification requirements.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... Section 9701.221 Administrative Personnel DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM (DEPARTMENT OF HOMELAND SECURITY-OFFICE OF PERSONNEL MANAGEMENT) DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM Classification Classification Process § 9701.221 Classification...

  14. 5 CFR 9701.221 - Classification requirements.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Section 9701.221 Administrative Personnel DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM (DEPARTMENT OF HOMELAND SECURITY-OFFICE OF PERSONNEL MANAGEMENT) DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT SYSTEM Classification Classification Process § 9701.221 Classification...

  15. Classification of close binary systems by Svechnikov

    NASA Astrophysics Data System (ADS)

    Dryomova, G. N.

    The paper presents the historical overview of classification schemes of eclipsing variable stars with the foreground of advantages of the classification scheme by Svechnikov being widely appreciated for Close Binary Systems due to simplicity of classification criteria and brevity.

  16. 75 FR 70754 - Postal Classification Changes

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-18

    ... POSTAL REGULATORY COMMISSION [Docket No. MC2011-5; Order No. 583] Postal Classification Changes...-filed Postal Service request announcing a classification change affecting bundle and container charges... Commission announcing a classification change [[Page 70755

  17. The neuron classification problem

    PubMed Central

    Bota, Mihail; Swanson, Larry W.

    2007-01-01

    A systematic account of neuron cell types is a basic prerequisite for determining the vertebrate nervous system global wiring diagram. With comprehensive lineage and phylogenetic information unavailable, a general ontology based on structure-function taxonomy is proposed and implemented in a knowledge management system, and a prototype analysis of select regions (including retina, cerebellum, and hypothalamus) presented. The supporting Brain Architecture Knowledge Management System (BAMS) Neuron ontology is online and its user interface allows queries about terms and their definitions, classification criteria based on the original literature and “Petilla Convention” guidelines, hierarchies, and relations—with annotations documenting each ontology entry. Combined with three BAMS modules for neural regions, connections between regions and neuron types, and molecules, the Neuron ontology provides a general framework for physical descriptions and computational modeling of neural systems. The knowledge management system interacts with other web resources, is accessible in both XML and RDF/OWL, is extendible to the whole body, and awaits large-scale data population requiring community participation for timely implementation. PMID:17582506

  18. Aircraft Operations Classification System

    NASA Technical Reports Server (NTRS)

    Harlow, Charles; Zhu, Weihong

    2001-01-01

    Accurate data is important in the aviation planning process. In this project we consider systems for measuring aircraft activity at airports. This would include determining the type of aircraft such as jet, helicopter, single engine, and multiengine propeller. Some of the issues involved in deploying technologies for monitoring aircraft operations are cost, reliability, and accuracy. In addition, the system must be field portable and acceptable at airports. A comparison of technologies was conducted and it was decided that an aircraft monitoring system should be based upon acoustic technology. A multimedia relational database was established for the study. The information contained in the database consists of airport information, runway information, acoustic records, photographic records, a description of the event (takeoff, landing), aircraft type, and environmental information. We extracted features from the time signal and the frequency content of the signal. A multi-layer feed-forward neural network was chosen as the classifier. Training and testing results were obtained. We were able to obtain classification results of over 90 percent for training and testing for takeoff events.

  19. Stellar Classification Online - Public Exploration

    NASA Astrophysics Data System (ADS)

    Castelaz, Michael W.; Bedell, W.; Barker, T.; Cline, J.; Owen, L.

    2009-01-01

    The Michigan Objective Prism Blue Survey (e.g. Sowell et al 2007, AJ, 134, 1089) photographic plates located in the Astronomical Photographic Data Archive at the Pisgah Astronomical Research Institute hold hundreds of thousands of stellar spectra, many of which have not been classified before. The public is invited to participate in a distributed computing online environment to classify the stars on the objective prism plates. The online environment is called Stellar Classification Online - Public Exploration (SCOPE). Through a website, SCOPE participants are given a tutorial on stellar spectra and their classification, and given the chance to practice their skills at classification. After practice, participants register, login, and select stars for classification from scans of the objective prism plates. Their classifications are recorded in a database where the accumulation of classifications of the same star by many users will be statistically analyzed. The project includes stars with known spectral types to help test the reliability of classifications. The SCOPE webpage and the use of results will be described.

  20. Nonlinear programming for classification problems in machine learning

    NASA Astrophysics Data System (ADS)

    Astorino, Annabella; Fuduli, Antonio; Gaudioso, Manlio

    2016-10-01

    We survey some nonlinear models for classification problems arising in machine learning. In the last years this field has become more and more relevant due to a lot of practical applications, such as text and web classification, object recognition in machine vision, gene expression profile analysis, DNA and protein analysis, medical diagnosis, customer profiling etc. Classification deals with separation of sets by means of appropriate separation surfaces, which is generally obtained by solving a numerical optimization model. While linear separability is the basis of the most popular approach to classification, the Support Vector Machine (SVM), in the recent years using nonlinear separating surfaces has received some attention. The objective of this work is to recall some of such proposals, mainly in terms of the numerical optimization models. In particular we tackle the polyhedral, ellipsoidal, spherical and conical separation approaches and, for some of them, we also consider the semisupervised versions.

  1. Angle classification revisited 2: a modified Angle classification.

    PubMed

    Katz, M I

    1992-09-01

    Edward Angle, in his classification of malocclusions, appears to have made Class I a range of abnormality, not a point of ideal occlusion. Current goals of orthodontic treatment, however, strive for the designation "Class I occlusion" to be synonymous with the point of ideal intermeshing and not a broad range. If contemporary orthodontists are to continue to use Class I as a goal, then it is appropriate that Dr. Angle's century-old classification, be modified to be more precise.

  2. Text Analysis: Critical Component of Planning for Text-Based Discussion Focused on Comprehension of Informational Texts

    ERIC Educational Resources Information Center

    Kucan, Linda; Palincsar, Annemarie Sullivan

    2018-01-01

    This investigation focuses on a tool used in a reading methods course to introduce reading specialist candidates to text analysis as a critical component of planning for text-based discussions. Unlike planning that focuses mainly on important text content or information, a text analysis approach focuses both on content and how that content is…

  3. Sentiment analysis of Arabic tweets using text mining techniques

    NASA Astrophysics Data System (ADS)

    Al-Horaibi, Lamia; Khan, Muhammad Badruddin

    2016-07-01

    Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether the text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment by using machine learning approach, Naïve Bayes and Decision Tree for classification algorithms. The datasets used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two algorithms classifiers using different combinations of text-processing functions. We found that available facilities for Arabic text processing need to be made from scratch or improved to develop accurate classifiers. The small functionalities developed by us in a Python language environment helped improve the results and proved that sentiment analysis in the Arabic domain needs lot of work on the lexicon side.

  4. Text, photo, and line extraction in scanned documents

    NASA Astrophysics Data System (ADS)

    Erkilinc, M. Sezer; Jaber, Mustafa; Saber, Eli; Bauer, Peter; Depalov, Dejan

    2012-07-01

    We propose a page layout analysis algorithm to classify a scanned document into different regions such as text, photo, or strong lines. The proposed scheme consists of five modules. The first module performs several image preprocessing techniques such as image scaling, filtering, color space conversion, and gamma correction to enhance the scanned image quality and reduce the computation time in later stages. Text detection is applied in the second module wherein wavelet transform and run-length encoding are employed to generate and validate text regions, respectively. The third module uses a Markov random field based block-wise segmentation that employs a basis vector projection technique with maximum a posteriori probability optimization to detect photo regions. In the fourth module, methods for edge detection, edge linking, line-segment fitting, and Hough transform are utilized to detect strong edges and lines. In the last module, the resultant text, photo, and edge maps are combined to generate a page layout map using K-Means clustering. The proposed algorithm has been tested on several hundred documents that contain simple and complex page layout structures and contents such as articles, magazines, business cards, dictionaries, and newsletters, and compared against state-of-the-art page-segmentation techniques with benchmark performance. The results indicate that our methodology achieves an average of ˜89% classification accuracy in text, photo, and background regions.

  5. Learning the Structure of Biomedical Relationships from Unstructured Text

    PubMed Central

    Percha, Bethany; Altman, Russ B.

    2015-01-01

    The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. PMID:26219079

  6. The classification of phobic disorders.

    PubMed

    Sheehan, D V; Sheehan, K H

    The history of classification of phobic disorders is reviewed. Problems in the ability of current classification schemes to predict, control and describe the relationship between the symptoms and other phenomena are outlined. A new classification of phobic disorders is proposed based on the presence or absence of an endogenous anxiety syndrome with the phobias. The two categories of phobic disorder have a different clinical presentation and course, a different mean age of onset, distribution of age of onset, sex distribution, response to treatment modalities, GSR testing and habituation response. Empirical evidence supporting this proposal is cited. This classification has heuristic merit in guiding research efforts and discussions and in directing the clinician to a simple and practical solution of his patient's phobic disorder.

  7. CLASSIFICATION FRAMEWORK FOR COASTAL SYSTEMS

    EPA Science Inventory

    U.S. Environmental Protection Agency. Classification Framework for Coastal Systems. EPA/600/R-04/061. U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Atlantic Ecology Division, Narragansett, RI, Gulf Ecology Division, Gulf Bree...

  8. Vehicle classification system : FHWA perspective

    DOT National Transportation Integrated Search

    2000-08-01

    Issues related to vehicle classification in light of the Federal Highway Administration (FHWA) Traffic Monitoring Guide (TMG) revision will be discussed. The discussion will look at the expressed needs of the data users and the associated impacts on ...

  9. Handwriting Classification in Forensic Science.

    ERIC Educational Resources Information Center

    Ansell, Michael

    1979-01-01

    Considers systems for the classification of handwriting features, discusses computer storage of information about handwriting features, and summarizes recent studies that give an idea of the range of forensic handwriting research. (GT)

  10. Deep Learning for ECG Classification

    NASA Astrophysics Data System (ADS)

    Pyakillya, B.; Kazachenko, N.; Mikhailovsky, N.

    2017-10-01

    The importance of ECG classification is very high now due to many current medical applications where this problem can be stated. Currently, there are many machine learning (ML) solutions which can be used for analyzing and classifying ECG data. However, the main disadvantages of these ML results is use of heuristic hand-crafted or engineered features with shallow feature learning architectures. The problem relies in the possibility not to find most appropriate features which will give high classification accuracy in this ECG problem. One of the proposing solution is to use deep learning architectures where first layers of convolutional neurons behave as feature extractors and in the end some fully-connected (FCN) layers are used for making final decision about ECG classes. In this work the deep learning architecture with 1D convolutional layers and FCN layers for ECG classification is presented and some classification results are showed.

  11. Critical Evaluation of Headache Classifications

    PubMed Central

    ÖZGE, Aynur

    2013-01-01

    Transforming a subjective sense like headache into an objective state and establishing a common language for this complaint which can be both a symptom and a disease all by itself have kept the investigators busy for years. Each recommendation proposed has brought along a set of patients who do not meet the criteria. While almost the most ideal and most comprehensive classification studies continued at this point, this time criticisims about withdrawing from daily practice came to the fore. In this article, the classification adventure of scientists who work in the area of headache will be summarized. More specifically, 2 classifications made by the International Headache Society (IHS) and the point reached in relation with the 3rd classification which is still being worked on will be discussed together with headache subtypes. It has been presented with the wish and belief that it will contribute to the readers and young investigators who are interested in this subject. PMID:28360577

  12. Critical Evaluation of Headache Classifications.

    PubMed

    Özge, Aynur

    2013-08-01

    Transforming a subjective sense like headache into an objective state and establishing a common language for this complaint which can be both a symptom and a disease all by itself have kept the investigators busy for years. Each recommendation proposed has brought along a set of patients who do not meet the criteria. While almost the most ideal and most comprehensive classification studies continued at this point, this time criticisims about withdrawing from daily practice came to the fore. In this article, the classification adventure of scientists who work in the area of headache will be summarized. More specifically, 2 classifications made by the International Headache Society (IHS) and the point reached in relation with the 3rd classification which is still being worked on will be discussed together with headache subtypes. It has been presented with the wish and belief that it will contribute to the readers and young investigators who are interested in this subject.

  13. Vehicle classification using mobile sensors.

    DOT National Transportation Integrated Search

    2013-04-01

    In this research, the feasibility of using mobile traffic sensors for binary vehicle classification on arterial roads is investigated. Features (e.g. : speed related, acceleration/deceleration related, etc.) are extracted from vehicle traces (passeng...

  14. Text Extraction from Scene Images by Character Appearance and Structure Modeling

    PubMed Central

    Yi, Chucai; Tian, Yingli

    2012-01-01

    In this paper, we propose a novel algorithm to detect text information from natural scene images. Scene text classification and detection are still open research topics. Our proposed algorithm is able to model both character appearance and structure to generate representative and discriminative text descriptors. The contributions of this paper include three aspects: 1) a new character appearance model by a structure correlation algorithm which extracts discriminative appearance features from detected interest points of character samples; 2) a new text descriptor based on structons and correlatons, which model character structure by structure differences among character samples and structure component co-occurrence; and 3) a new text region localization method by combining color decomposition, character contour refinement, and string line alignment to localize character candidates and refine detected text regions. We perform three groups of experiments to evaluate the effectiveness of our proposed algorithm, including text classification, text detection, and character identification. The evaluation results on benchmark datasets demonstrate that our algorithm achieves the state-of-the-art performance on scene text classification and detection, and significantly outperforms the existing algorithms for character identification. PMID:23316111

  15. Using machine learning to disentangle homonyms in large text corpora.

    PubMed

    Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

    2018-06-01

    Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.

  16. Phylogenetic classification of bony fishes.

    PubMed

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, as those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny ( www.deepfin.org ). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases were morphological support exist for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution

  17. Examining Text Complexity in the Early Grades

    ERIC Educational Resources Information Center

    Fitzgerald, Jill; Elmore, Jeff; Hiebert, Elfrieda H.; Koons, Heather H.; Bowen, Kimberly; Sanford-Moore, Eleanor E.; Stenner, A. Jackson

    2016-01-01

    The Common Core raises the stature of texts to new heights, creating a hubbub. The fuss is especially messy at the early grades, where children are expected to read more complex texts than in the past. But early-grades teachers have been given little actionable guidance about text complexity. The authors recently examined early-grades texts to…

  18. Characterizing Reading Comprehension of Mathematical Texts

    ERIC Educational Resources Information Center

    Osterholm, Magnus

    2006-01-01

    This study compares reading comprehension of three different texts: two mathematical texts and one historical text. The two mathematical texts both present basic concepts of group theory, but one does it using mathematical symbols and the other only uses natural language. A total of 95 upper secondary and university students read one of the…

  19. The Text Marking Patterns of College Students.

    ERIC Educational Resources Information Center

    Nist, Sherrie L.; Kirby, Katie

    1989-01-01

    Examines patterns in college students' text markings using texts from three content areas: history, political science, and sociology. Indicates little differential marking between various text-types. Concludes that students seem to have little idea how to mark text efficiently. (MG)

  20. Full-Text Databases in Medicine.

    ERIC Educational Resources Information Center

    Sievert, MaryEllen C.; And Others

    1995-01-01

    Describes types of full-text databases in medicine; discusses features for searching full-text journal databases available through online vendors; reviews research on full-text databases in medicine; and describes the MEDLINE/Full-Text Research Project at the University of Missouri (Columbia) which investigated precision, recall, and relevancy.…

  1. Ototoxicity (cochleotoxicity) classifications: A review.

    PubMed

    Crundwell, Gemma; Gomersall, Phil; Baguley, David M

    2016-01-01

    Drug-mediated ototoxicity, specifically cochleotoxicity, is a concern for patients receiving medications for the treatment of serious illness. A number of classification schemes exist, most of which are based on pure-tone audiometry, in order to assist non-audiological/non-otological specialists in the identification and monitoring of iatrogenic hearing loss. This review identifies the primary classification systems used in cochleototoxicity monitoring. By bringing together classifications published in discipline-specific literature, the paper aims to increase awareness of their relative strengths and limitations in the assessment and monitoring of ototoxic hearing loss and to indicate how future classification systems may improve upon the status-quo. Literature review. PubMed identified 4878 articles containing the search term ototox*. A systematic search identified 13 key classification systems. Cochleotoxicity classification systems can be divided into those which focus on hearing change from a baseline audiogram and those that focus on the functional impact of the hearing loss. Common weaknesses of these grading scales included a lack of sensitivity to small adverse changes in hearing thresholds, a lack of high-frequency audiometry (>8 kHz), and lack of indication of which changes are likely to be clinically significant for communication and quality of life.

  2. Acoustic classification of zooplankton

    NASA Astrophysics Data System (ADS)

    Martin Traykovski, Linda V.

    1998-11-01

    Work on the forward problem in zooplankton bioacoustics has resulted in the identification of three categories of acoustic scatterers: elastic-shelled (e.g. pteropods), fluid-like (e.g. euphausiids), and gas-bearing (e.g. siphonophores). The relationship between backscattered energy and animal biomass has been shown to vary by a factor of ~19,000 across these categories, so that to make accurate estimates of zooplankton biomass from acoustic backscatter measurements of the ocean, the acoustic characteristics of the species of interest must be well-understood. This thesis describes the development of both feature based and model based classification techniques to invert broadband acoustic echoes from individual zooplankton for scatterer type, as well as for particular parameters such as animal orientation. The feature based Empirical Orthogonal Function Classifier (EOFC) discriminates scatterer types by identifying characteristic modes of variability in the echo spectra, exploiting only the inherent characteristic structure of the acoustic signatures. The model based Model Parameterisation Classifier (MPC) classifies based on correlation of observed echo spectra with simplified parameterisations of theoretical scattering models for the three classes. The Covariance Mean Variance Classifiers (CMVC) are a set of advanced model based techniques which exploit the full complexity of the theoretical models by searching the entire physical model parameter space without employing simplifying parameterisations. Three different CMVC algorithms were developed: the Integrated Score Classifier (ISC), the Pairwise Score Classifier (PSC) and the Bayesian Probability Classifier (BPC); these classifiers assign observations to a class based on similarities in covariance, mean, and variance, while accounting for model ambiguity and validity. These feature based and model based inversion techniques were successfully applied to several thousand echoes acquired from broadband (~350 k

  3. Imitating manual curation of text-mined facts in biomedicine.

    PubMed

    Rodriguez-Esteban, Raul; Iossifov, Ivan; Rzhetsky, Andrey

    2006-09-08

    Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on use of text-mined data, it is critical to assess the quality (the probability that the message is correctly extracted) of individual facts--to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were independently reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.

  4. 49 CFR 8.17 - Classification challenges.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 1 2011-10-01 2011-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a) Authorized...

  5. 43 CFR 2410.1 - All classifications.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false All classifications. 2410.1 Section 2410.1..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) CRITERIA FOR ALL LAND CLASSIFICATIONS General Criteria § 2410.1 All classifications. All classifications under the regulations of this part will give due...

  6. 43 CFR 2461.2 - Classifications.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 43 Public Lands: Interior 2 2011-10-01 2011-10-01 false Classifications. 2461.2 Section 2461.2..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.2 Classifications. Not less than 60 days after publication of the...

  7. 32 CFR 2001.21 - Original classification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Original classification. 2001.21 Section 2001.21... Markings § 2001.21 Original classification. (a) Primary markings. At the time of original classification... authority. The name and position, or personal identifier, of the original classification authority shall...

  8. Classification of wheat: Badhwar profile similarity technique

    NASA Technical Reports Server (NTRS)

    Austin, W. W.

    1980-01-01

    The Badwar profile similarity classification technique used successfully for classification of corn was applied to spring wheat classifications. The software programs and the procedures used to generate full-scene classifications are presented, and numerical results of the acreage estimations are given.

  9. 7 CFR 51.3198 - Size classifications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Size classifications. 51.3198 Section 51.3198... Onions Size Classifications § 51.3198 Size classifications. Size shall be specified in connection with... certain size or larger, or in accordance with one of the size classifications listed below: Provided, that...

  10. 43 CFR 2410.1 - All classifications.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false All classifications. 2410.1 Section 2410.1..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) CRITERIA FOR ALL LAND CLASSIFICATIONS General Criteria § 2410.1 All classifications. All classifications under the regulations of this part will give due...

  11. 49 CFR 8.17 - Classification challenges.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 49 Transportation 1 2013-10-01 2013-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a) Authorized...

  12. 43 CFR 2410.1 - All classifications.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false All classifications. 2410.1 Section 2410.1..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) CRITERIA FOR ALL LAND CLASSIFICATIONS General Criteria § 2410.1 All classifications. All classifications under the regulations of this part will give due...

  13. 7 CFR 28.911 - Review classification.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Review classification. 28.911 Section 28.911... REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Cotton Classification and Market News Service for Producers Classification § 28.911 Review classification. (a) A producer may request one review...

  14. 28 CFR 345.20 - Position classification.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 28 Judicial Administration 2 2012-07-01 2012-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new FPI...

  15. 49 CFR 8.17 - Classification challenges.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 49 Transportation 1 2014-10-01 2014-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a) Authorized...

  16. 7 CFR 51.2836 - Size classifications.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Size classifications. 51.2836 Section 51.2836...) Size Classifications § 51.2836 Size classifications. The size of onions may be specified in accordance with one of the following classifications. Size designation Minimum diameter Inches Millimeters Maximum...

  17. 49 CFR 8.17 - Classification challenges.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 49 Transportation 1 2012-10-01 2012-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a) Authorized...

  18. 6 CFR 7.26 - Derivative classification.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 6 Domestic Security 1 2012-01-01 2012-01-01 false Derivative classification. 7.26 Section 7.26... INFORMATION Classified Information § 7.26 Derivative classification. (a) Derivative classification is defined... already classified, and marking the newly developed material consistent with the classification markings...

  19. 32 CFR 2700.22 - Classification guides.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Classification guides. 2700.22 Section 2700.22... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall issue classification guides pursuant to section 2-2 of E.O. 12065. These guides, which shall be used to...

  20. 32 CFR 2001.10 - Classification standards.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Classification standards. 2001.10 Section 2001... Classification § 2001.10 Classification standards. Identifying or describing damage to the national security. Section 1.1(a) of the Order specifies the conditions that must be met when making classification decisions...

  1. 7 CFR 51.2836 - Size classifications.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Size classifications. 51.2836 Section 51.2836...-Granex-Grano and Creole Types) Size Classifications § 51.2836 Size classifications. The size of onions may be specified in accordance with one of the following classifications. Size designation Minimum...

  2. 32 CFR 2001.10 - Classification standards.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Classification standards. 2001.10 Section 2001... Classification § 2001.10 Classification standards. Identifying or describing damage to the national security. Section 1.1(a) of the Order specifies the conditions that must be met when making classification decisions...

  3. 7 CFR 51.1860 - Color classification.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Color classification. 51.1860 Section 51.1860... STANDARDS) United States Standards for Fresh Tomatoes 1 Color Classification § 51.1860 Color classification... illustrating the color classification requirements, as set forth in this section. This visual aid may be...

  4. 32 CFR 2700.22 - Classification guides.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Classification guides. 2700.22 Section 2700.22... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall issue classification guides pursuant to section 2-2 of E.O. 12065. These guides, which shall be used to...

  5. 43 CFR 2461.2 - Classifications.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false Classifications. 2461.2 Section 2461.2..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.2 Classifications. Not less than 60 days after publication of the...

  6. 32 CFR 2700.22 - Classification guides.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Classification guides. 2700.22 Section 2700.22... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall issue classification guides pursuant to section 2-2 of E.O. 12065. These guides, which shall be used to...

  7. 28 CFR 345.20 - Position classification.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 28 Judicial Administration 2 2014-07-01 2014-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new FPI...

  8. 22 CFR 9.6 - Derivative classification.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 1 2012-04-01 2012-04-01 false Derivative classification. 9.6 Section 9.6... classification. (a) Definition. Derivative classification is the incorporating, paraphrasing, restating or... with the classification of the source material. Duplication or reproduction of existing classified...

  9. 12 CFR 1777.20 - Capital classifications.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 12 Banks and Banking 9 2013-01-01 2013-01-01 false Capital classifications. 1777.20 Section 1777... DEVELOPMENT SAFETY AND SOUNDNESS PROMPT CORRECTIVE ACTION Capital Classifications and Orders Under Section 1366 of the 1992 Act § 1777.20 Capital classifications. (a) Capital classifications after the effective...

  10. 12 CFR 1777.20 - Capital classifications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 12 Banks and Banking 10 2014-01-01 2014-01-01 false Capital classifications. 1777.20 Section 1777... DEVELOPMENT SAFETY AND SOUNDNESS PROMPT CORRECTIVE ACTION Capital Classifications and Orders Under Section 1366 of the 1992 Act § 1777.20 Capital classifications. (a) Capital classifications after the effective...

  11. 22 CFR 9.8 - Classification challenges.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 22 Foreign Relations 1 2012-04-01 2012-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of the...

  12. 22 CFR 9.6 - Derivative classification.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 1 2013-04-01 2013-04-01 false Derivative classification. 9.6 Section 9.6... classification. (a) Definition. Derivative classification is the incorporating, paraphrasing, restating or... with the classification of the source material. Duplication or reproduction of existing classified...

  13. 7 CFR 51.2284 - Size classification.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Size classification. 51.2284 Section 51.2284... Size classification. The following classifications are provided to describe the size of any lot... shall conform to the requirements of the specified classification as defined below: (a) Halves. Lot...

  14. 7 CFR 51.2284 - Size classification.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Size classification. 51.2284 Section 51.2284...) Size Requirements § 51.2284 Size classification. The following classifications are provided to describe... of kernels in the lot shall conform to the requirements of the specified classification as defined...

  15. 28 CFR 524.73 - Classification procedures.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 28 Judicial Administration 2 2014-07-01 2014-07-01 false Classification procedures. 524.73 Section..., CLASSIFICATION, AND TRANSFER CLASSIFICATION OF INMATES Central Inmate Monitoring (CIM) System § 524.73 Classification procedures. (a) Initial assignment. Except as provided for in paragraphs (a) (1) through (4) of...

  16. 32 CFR 2001.10 - Classification standards.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Classification standards. 2001.10 Section 2001... Classification § 2001.10 Classification standards. Identifying or describing damage to the national security. Section 1.1(a) of the Order specifies the conditions that must be met when making classification decisions...

  17. 7 CFR 28.911 - Review classification.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Review classification. 28.911 Section 28.911... REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Cotton Classification and Market News Service for Producers Classification § 28.911 Review classification. (a) A producer may request one review...

  18. 7 CFR 51.1402 - Size classification.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Size classification. 51.1402 Section 51.1402... Classification § 51.1402 Size classification. Size of pecans may be specified in connection with the grade in accordance with one of the following classifications. To meet the requirements for any one of these...

  19. 28 CFR 345.20 - Position classification.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new FPI...

  20. 6 CFR 7.26 - Derivative classification.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 6 Domestic Security 1 2013-01-01 2013-01-01 false Derivative classification. 7.26 Section 7.26... INFORMATION Classified Information § 7.26 Derivative classification. (a) Derivative classification is defined... already classified, and marking the newly developed material consistent with the classification markings...

  1. 37 CFR 2.85 - Classification schedules.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the international...

  2. 7 CFR 51.2284 - Size classification.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Size classification. 51.2284 Section 51.2284...) Size Requirements § 51.2284 Size classification. The following classifications are provided to describe... of kernels in the lot shall conform to the requirements of the specified classification as defined...

  3. 43 CFR 2461.2 - Classifications.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 43 Public Lands: Interior 2 2012-10-01 2012-10-01 false Classifications. 2461.2 Section 2461.2..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.2 Classifications. Not less than 60 days after publication of the...

  4. 43 CFR 2461.2 - Classifications.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 43 Public Lands: Interior 2 2013-10-01 2013-10-01 false Classifications. 2461.2 Section 2461.2..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) BUREAU INITIATED CLASSIFICATION SYSTEM Multiple-Use Classification Procedures § 2461.2 Classifications. Not less than 60 days after publication of the...

  5. 12 CFR 1777.20 - Capital classifications.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 12 Banks and Banking 9 2012-01-01 2012-01-01 false Capital classifications. 1777.20 Section 1777... DEVELOPMENT SAFETY AND SOUNDNESS PROMPT CORRECTIVE ACTION Capital Classifications and Orders Under Section 1366 of the 1992 Act § 1777.20 Capital classifications. (a) Capital classifications after the effective...

  6. 28 CFR 524.73 - Classification procedures.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 28 Judicial Administration 2 2012-07-01 2012-07-01 false Classification procedures. 524.73 Section..., CLASSIFICATION, AND TRANSFER CLASSIFICATION OF INMATES Central Inmate Monitoring (CIM) System § 524.73 Classification procedures. (a) Initial assignment. Except as provided for in paragraphs (a) (1) through (4) of...

  7. 7 CFR 51.1402 - Size classification.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Size classification. 51.1402 Section 51.1402... Classification § 51.1402 Size classification. Size of pecans may be specified in connection with the grade in accordance with one of the following classifications. To meet the requirements for any one of these...

  8. 28 CFR 524.73 - Classification procedures.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 28 Judicial Administration 2 2013-07-01 2013-07-01 false Classification procedures. 524.73 Section..., CLASSIFICATION, AND TRANSFER CLASSIFICATION OF INMATES Central Inmate Monitoring (CIM) System § 524.73 Classification procedures. (a) Initial assignment. Except as provided for in paragraphs (a) (1) through (4) of...

  9. 7 CFR 51.1436 - Color classifications.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Color classifications. 51.1436 Section 51.1436... Classifications § 51.1436 Color classifications. (a) The skin color of pecan kernels may be described in terms of the color classifications provided in this section. When the color of kernels in a lot generally...

  10. 7 CFR 51.3198 - Size classifications.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Size classifications. 51.3198 Section 51.3198... Onions Size Classifications § 51.3198 Size classifications. Size shall be specified in connection with... certain size or larger, or in accordance with one of the size classifications listed below: Provided, that...

  11. 37 CFR 2.85 - Classification schedules.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the international...

  12. 7 CFR 51.1436 - Color classifications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Color classifications. 51.1436 Section 51.1436... Classifications § 51.1436 Color classifications. (a) The skin color of pecan kernels may be described in terms of the color classifications provided in this section. When the color of kernels in a lot generally...

  13. 7 CFR 51.2281 - Color classifications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Color classifications. 51.2281 Section 51.2281...) Color Requirements § 51.2281 Color classifications. The following classifications are provided to... the lot shall not be darker than the darkest color permitted in the specified classification as shown...

  14. 6 CFR 7.26 - Derivative classification.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 6 Domestic Security 1 2014-01-01 2014-01-01 false Derivative classification. 7.26 Section 7.26... INFORMATION Classified Information § 7.26 Derivative classification. (a) Derivative classification is defined... already classified, and marking the newly developed material consistent with the classification markings...

  15. 37 CFR 2.85 - Classification schedules.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the international...

  16. 43 CFR 2410.1 - All classifications.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... 43 Public Lands: Interior 2 2014-10-01 2014-10-01 false All classifications. 2410.1 Section 2410.1..., DEPARTMENT OF THE INTERIOR LAND RESOURCE MANAGEMENT (2000) CRITERIA FOR ALL LAND CLASSIFICATIONS General Criteria § 2410.1 All classifications. All classifications under the regulations of this part will give due...

  17. 22 CFR 9.8 - Classification challenges.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 22 Foreign Relations 1 2013-04-01 2013-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of the...

  18. 32 CFR 2001.21 - Original classification.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 6 2013-07-01 2013-07-01 false Original classification. 2001.21 Section 2001.21... Markings § 2001.21 Original classification. (a) Primary markings. At the time of original classification, the following shall be indicated in a manner that is immediately apparent: (1) Classification...

  19. 7 CFR 51.2281 - Color classifications.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 2 2013-01-01 2013-01-01 false Color classifications. 51.2281 Section 51.2281...) Color Requirements § 51.2281 Color classifications. The following classifications are provided to... the lot shall not be darker than the darkest color permitted in the specified classification as shown...

  20. 7 CFR 28.911 - Review classification.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Review classification. 28.911 Section 28.911... REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Cotton Classification and Market News Service for Producers Classification § 28.911 Review classification. (a) A producer may request one review...

  1. 7 CFR 51.2836 - Size classifications.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 2 2014-01-01 2014-01-01 false Size classifications. 51.2836 Section 51.2836...-Granex-Grano and Creole Types) Size Classifications § 51.2836 Size classifications. The size of onions may be specified in accordance with one of the following classifications. Size designation Minimum...

  2. 22 CFR 9.8 - Classification challenges.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 1 2014-04-01 2014-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of the...

  3. 32 CFR 2001.21 - Original classification.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 32 National Defense 6 2014-07-01 2014-07-01 false Original classification. 2001.21 Section 2001.21... Markings § 2001.21 Original classification. (a) Primary markings. At the time of original classification, the following shall be indicated in a manner that is immediately apparent: (1) Classification...

  4. 7 CFR 51.1436 - Color classifications.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Color classifications. 51.1436 Section 51.1436... STANDARDS) United States Standards for Grades of Shelled Pecans Color Classifications § 51.1436 Color classifications. (a) The skin color of pecan kernels may be described in terms of the color classifications...

  5. 7 CFR 51.2281 - Color classifications.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 2 2012-01-01 2012-01-01 false Color classifications. 51.2281 Section 51.2281... Color classifications. The following classifications are provided to describe the color of any lot... than the darkest color permitted in the specified classification as shown on the color chart. ...

  6. 22 CFR 9.6 - Derivative classification.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 22 Foreign Relations 1 2014-04-01 2014-04-01 false Derivative classification. 9.6 Section 9.6... classification. (a) Definition. Derivative classification is the incorporating, paraphrasing, restating or... with the classification of the source material. Duplication or reproduction of existing classified...

  7. 32 CFR 2001.21 - Original classification.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 32 National Defense 6 2012-07-01 2012-07-01 false Original classification. 2001.21 Section 2001.21... Markings § 2001.21 Original classification. (a) Primary markings. At the time of original classification, the following shall be indicated in a manner that is immediately apparent: (1) Classification...

  8. Prototype Expert System for Climate Classification.

    ERIC Educational Resources Information Center

    Harris, Clay

    Many students find climate classification laborious and time-consuming, and through their lack of repetition fail to grasp the details of classification. This paper describes an expert system for climate classification that is being developed at Middle Tennessee State University. Topics include: (1) an introduction to the nature of classification,…

  9. 7 CFR 51.1436 - Color classifications.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Color classifications. 51.1436 Section 51.1436... STANDARDS) United States Standards for Grades of Shelled Pecans Color Classifications § 51.1436 Color classifications. (a) The skin color of pecan kernels may be described in terms of the color classifications...

  10. 5 CFR 1312.7 - Derivative classification.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ..., DOWNGRADING, DECLASSIFICATION AND SAFEGUARDING OF NATIONAL SECURITY INFORMATION Classification and Declassification of National Security Information § 1312.7 Derivative classification. A derivative classification... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Derivative classification. 1312.7 Section...

  11. 37 CFR 2.85 - Classification schedules.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the international...

  12. 12 CFR 1777.20 - Capital classifications.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 12 Banks and Banking 7 2011-01-01 2011-01-01 false Capital classifications. 1777.20 Section 1777... DEVELOPMENT SAFETY AND SOUNDNESS PROMPT CORRECTIVE ACTION Capital Classifications and Orders Under Section 1366 of the 1992 Act § 1777.20 Capital classifications. (a) Capital classifications after the effective...

  13. 37 CFR 2.85 - Classification schedules.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Classification schedules. 2..., DEPARTMENT OF COMMERCE RULES OF PRACTICE IN TRADEMARK CASES Classification § 2.85 Classification schedules. (a) International classification system. Section 6.1 of this chapter sets forth the international...

  14. 22 CFR 9.8 - Classification challenges.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of the...

  15. 32 CFR 2001.10 - Classification standards.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Classification standards. 2001.10 Section 2001... Classification § 2001.10 Classification standards. Identifying or describing damage to the national security. Section 1.1(a) of the Order specifies the conditions that must be met when making classification decisions...

  16. 32 CFR 2001.21 - Original classification.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Original classification. 2001.21 Section 2001.21... Markings § 2001.21 Original classification. (a) Primary markings. At the time of original classification, the following shall be indicated in a manner that is immediately apparent: (1) Classification...

  17. 49 CFR 8.17 - Classification challenges.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 1 2010-10-01 2010-10-01 false Classification challenges. 8.17 Section 8.17 Transportation Office of the Secretary of Transportation CLASSIFIED INFORMATION: CLASSIFICATION/DECLASSIFICATION/ACCESS Classification/Declassification of Information § 8.17 Classification challenges. (a) Authorized...

  18. 28 CFR 345.20 - Position classification.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new FPI...

  19. 7 CFR 51.1436 - Color classifications.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Color classifications. 51.1436 Section 51.1436... STANDARDS) United States Standards for Grades of Shelled Pecans Color Classifications § 51.1436 Color classifications. (a) The skin color of pecan kernels may be described in terms of the color classifications...

  20. 22 CFR 9.8 - Classification challenges.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Classification challenges. 9.8 Section 9.8 Foreign Relations DEPARTMENT OF STATE GENERAL SECURITY INFORMATION REGULATIONS § 9.8 Classification... classification status is improper are expected and encouraged to challenge the classification status of the...

  1. 32 CFR 2700.22 - Classification guides.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Classification guides. 2700.22 Section 2700.22... SECURITY INFORMATION REGULATIONS Derivative Classification § 2700.22 Classification guides. OMSN shall issue classification guides pursuant to section 2-2 of E.O. 12065. These guides, which shall be used to...

  2. 28 CFR 345.20 - Position classification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Position classification. 345.20 Section... INDUSTRIES (FPI) INMATE WORK PROGRAMS Position Classification § 345.20 Position classification. (a) Inmate... the objectives and principles of pay classification as a part of the routine orientation of new FPI...

  3. 7 CFR 51.2284 - Size classification.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Size classification. 51.2284 Section 51.2284... Size classification. The following classifications are provided to describe the size of any lot... shall conform to the requirements of the specified classification as defined below: (a) Halves. Lot...

  4. 7 CFR 51.2836 - Size classifications.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Size classifications. 51.2836 Section 51.2836...) Size Classifications § 51.2836 Size classifications. The size of onions may be specified in accordance with one of the following classifications. Size designation Minimum diameter Inches Millimeters Maximum...

  5. 7 CFR 28.911 - Review classification.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Review classification. 28.911 Section 28.911... REGULATIONS COTTON CLASSING, TESTING, AND STANDARDS Cotton Classification and Market News Service for Producers Classification § 28.911 Review classification. (a) A producer may request one review...

  6. 22 CFR 9.6 - Derivative classification.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Derivative classification. 9.6 Section 9.6... classification. (a) Definition. Derivative classification is the incorporating, paraphrasing, restating or... with the classification of the source material. Duplication or reproduction of existing classified...

  7. 7 CFR 51.2836 - Size classifications.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Size classifications. 51.2836 Section 51.2836...) Size Classifications § 51.2836 Size classifications. The size of onions may be specified in accordance with one of the following classifications. Size designation Minimum diameter Inches Millimeters Maximum...

  8. 7 CFR 51.2281 - Color classifications.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Color classifications. 51.2281 Section 51.2281... Color classifications. The following classifications are provided to describe the color of any lot... than the darkest color permitted in the specified classification as shown on the color chart. ...

  9. 32 CFR 2001.10 - Classification standards.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 6 2011-07-01 2011-07-01 false Classification standards. 2001.10 Section 2001... Classification § 2001.10 Classification standards. Identifying or describing damage to the national security. Section 1.1(a) of the Order specifies the conditions that must be met when making classification decisions...

  10. 28 CFR 524.73 - Classification procedures.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Classification procedures. 524.73 Section..., CLASSIFICATION, AND TRANSFER CLASSIFICATION OF INMATES Central Inmate Monitoring (CIM) System § 524.73 Classification procedures. (a) Initial assignment. Except as provided for in paragraphs (a) (1) through (4) of...

  11. 5 CFR 9701.221 - Classification requirements.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 5 Administrative Personnel 3 2010-01-01 2010-01-01 false Classification requirements. 9701.221 Section 9701.221 Administrative Personnel DEPARTMENT OF HOMELAND SECURITY HUMAN RESOURCES MANAGEMENT... HUMAN RESOURCES MANAGEMENT SYSTEM Classification Classification Process § 9701.221 Classification...

  12. Extracting biomedical events from pairs of text entities

    PubMed Central

    2015-01-01

    Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478

  13. Ensemble methods with simple features for document zone classification

    NASA Astrophysics Data System (ADS)

    Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing

    2012-01-01

    Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.

  14. Analysis of Nature of Science Included in Recent Popular Writing Using Text Mining Techniques

    ERIC Educational Resources Information Center

    Jiang, Feng; McComas, William F.

    2014-01-01

    This study examined the inclusion of nature of science (NOS) in popular science writing to determine whether it could serve supplementary resource for teaching NOS and to evaluate the accuracy of text mining and classification as a viable research tool in science education research. Four groups of documents published from 2001 to 2010 were…

  15. Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension

    ERIC Educational Resources Information Center

    Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.

    2017-01-01

    This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…

  16. Improving Physics Texts: Students Speak Out.

    ERIC Educational Resources Information Center

    Guzzetti, Barbara J.; And Others

    1995-01-01

    Finds that high school physical science students prefer textbooks with expository text that not only gives a correct concept but refutes common incorrect ideas. Finds that writing a comprehensible text is more difficult than the researchers had imagined. (SR)

  17. Partition of Ni between olivine and sulfide: the effect of temperature, f_{{text{O}}_{text{2}} } and f_{{text{S}}_{text{2}} }

    NASA Astrophysics Data System (ADS)

    Fleet, M. E.; Macrae, N. D.

    1987-03-01

    The experimental distribution coefficient for Ni/ Fe exchange between olivine and monosulfide (KD3) is 35.6±1.1 at 1385° C, f_{{text{O}}_{text{2}} } = 10^{ - 8.87} ,f_{{text{S}}_{text{2}} } = 10^{ - 1.02} , and olivine of composition Fo96 to Fo92. These are the physicochemical conditions appropriate to hypothesized sulfur-saturated komatiite magma. The present experiments equilibrated natural olivine grains with sulfide-oxide liquid in the presence of a (Mg, Fe)-alumino-silicate melt. By a variety of different experimental procedures, K D3 is shown to be essentially constant at about 30 to 35 in the temperature range 900 to 1400° C, for olivine of composition Fo97 to FoO, monosulfide composition with up to 70 mol. % NiS, and a wide range of f_{{text{O}}_{text{2}} } and f_{{text{S}}_{text{2}} }.

  18. TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    PubMed Central

    2011-01-01

    Background Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. Results TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. Conclusions TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at http://apollo11.isto.unibo.it/software/, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes. PMID:21333005

  19. Detection and Prevention of Insider Threats in Database Driven Web Services

    NASA Astrophysics Data System (ADS)

    Chumash, Tzvi; Yao, Danfeng

    In this paper, we take the first step to address the gap between the security needs in outsourced hosting services and the protection provided in the current practice. We consider both insider and outsider attacks in the third-party web hosting scenarios. We present SafeWS, a modular solution that is inserted between server side scripts and databases in order to prevent and detect website hijacking and unauthorized access to stored data. To achieve the required security, SafeWS utilizes a combination of lightweight cryptographic integrity and encryption tools, software engineering techniques, and security data management principles. We also describe our implementation of SafeWS and its evaluation. The performance analysis of our prototype shows the overhead introduced by security verification is small. SafeWS will allow business owners to significantly reduce the security risks and vulnerabilities of outsourcing their sensitive customer data to third-party providers.

  20. University Real Estate Development Database: A Database-Driven Internet Research Tool

    ERIC Educational Resources Information Center

    Wiewel, Wim; Kunst, Kara

    2008-01-01

    The University Real Estate Development Database is an Internet resource developed by the University of Baltimore for the Lincoln Institute of Land Policy, containing over six hundred cases of university expansion outside of traditional campus boundaries. The University Real Estate Development database is a searchable collection of real estate…

  1. General, database-driven fast-feedback system for the Stanford Linear Collider

    SciTech Connect

    Rouse, F.; Allison, S.; Castillo, S.

    A new feedback system has been developed for stabilizing the SLC beams at many locations. The feedback loops are designed to sample and correct at the 60 Hz repetition rate of the accelerator. Each loop can be distributed across several of the standard 80386 microprocessors which control the SLC hardware. A new communications system, KISNet, has been implemented to pass signals between the microprocessors at this rate. The software is written in a general fashion using the state space formalism of digital control theory. This allows a new loop to be implemented by just setting up the online database andmore » perhaps installing a communications link. 3 refs., 4 figs.« less

  2. Computerized Design Synthesis (CDS), A database-driven multidisciplinary design tool

    NASA Technical Reports Server (NTRS)

    Anderson, D. M.; Bolukbasi, A. O.

    1989-01-01

    The Computerized Design Synthesis (CDS) system under development at McDonnell Douglas Helicopter Company (MDHC) is targeted to make revolutionary improvements in both response time and resource efficiency in the conceptual and preliminary design of rotorcraft systems. It makes the accumulated design database and supporting technology analysis results readily available to designers and analysts of technology, systems, and production, and makes powerful design synthesis software available in a user friendly format.

  3. Assessing Text Readability Using Cognitively Based Indices

    ERIC Educational Resources Information Center

    Crossley, Scott A.; Greenfield, Jerry; McNamara, Danielle S.

    2008-01-01

    Many programs designed to compute the readability of texts are narrowly based on surface-level linguistic features and take too little account of the processes which a reader brings to the text. This study is an exploratory examination of the use of Coh-Metrix, a computational tool that measures cohesion and text difficulty at various levels of…

  4. Interdisciplinary Approach to Understanding Literary Texts

    ERIC Educational Resources Information Center

    Dossanova, Altynay Zh.; Ismakova, Bibissara S.; Tapanova, Saule E.; Ayupova, Gulbagira K.; Gotting, Valentina V.; Kaltayeva, Gulnar K.

    2016-01-01

    The primary purpose is the implementation of the interdisciplinary approach to understanding and the construction of integrative models of understanding literary texts. The interdisciplinary methodological paradigm of studying text understanding, based on the principles of various sciences facilitating the identification of the text understanding…

  5. Validating Automated Measures of Text Complexity

    ERIC Educational Resources Information Center

    Sheehan, Kathleen M.

    2017-01-01

    Automated text complexity measurement tools (also called readability metrics) have been proposed as a way to help teachers, textbook publishers, and assessment developers select texts that are closely aligned with the new, more demanding text complexity expectations specified in the Common Core State Standards. This article examines a critical…

  6. Text-Frame Relationships and ESL.

    ERIC Educational Resources Information Center

    Burquest, Donald A.; Henry, Floreen Barger

    The relationship of contextual background to comprehension of written texts is discussed with reference to instruction in English as a second language (ESL). A theory advanced by Kerry Stewart Robichaux proposes six possible relationships between a text and its cultural "frame." Applications of the six text-frame relationships to foreign…

  7. Teacher Modeling Using Complex Informational Texts

    ERIC Educational Resources Information Center

    Fisher, Douglas; Frey, Nancy

    2015-01-01

    Modeling in complex texts requires that teachers analyze the text for factors of qualitative complexity and then design lessons that introduce students to that complexity. In addition, teachers can model the disciplinary nature of content area texts as well as word solving and comprehension strategies. Included is a planning guide for think aloud.

  8. Creating and Using Culturally Sustaining Informational Texts

    ERIC Educational Resources Information Center

    Watanabe Kganetso, Lynne M.

    2017-01-01

    Current standards and assessments emphasize the importance of a variety of genres in students' literacy diets, which has placed increased attention on informational texts. Unfortunately, young students' current exposure to and experiences with informational texts are often limited by the texts' availability, quality, and relevance to children's…

  9. Does Writing Summaries Improve Memory for Text?

    ERIC Educational Resources Information Center

    Spirgel, Arie S.; Delaney, Peter F.

    2016-01-01

    In five experiments, we consistently found that items included in summaries were better remembered than items omitted from summaries. We did not, however, find evidence that summary writing was better than merely restudying the text. These patterns held with shorter and longer texts, when the text was present or absent during the summary writing,…

  10. Text Complexity: Primary Teachers' Views

    ERIC Educational Resources Information Center

    Fitzgerald, Jill; Hiebert, Elfrieda H.; Bowen, Kimberly; Relyea-Kim, E. Jackie; Kung, Melody; Elmore, Jeff

    2015-01-01

    The research question was, "What text characteristics do primary teachers think are most important for early grades text complexity?" Teachers from across the United States accomplished a two-part task. First, to stimulate teachers' thinking about important text characteristics, primary teachers completed an online paired-text…

  11. Text Dependent Questions and the CCSS

    ERIC Educational Resources Information Center

    Aspen Institute, 2012

    2012-01-01

    An effective text dependent question first and foremost embraces the key principle of close reading embedded in the Common Core State Standards (CCSS) Anchor Reading Standards by asking students to provide evidence from complex text and draw inferences based on what the text explicitly says (Standards 1 and 10). A close look at the intervening…

  12. Refutation Texts for Effective Climate Change Education

    ERIC Educational Resources Information Center

    Nussbaum, E. Michael; Cordova, Jacqueline R.; Rehmat, Abeera P.

    2017-01-01

    Refutation texts, which are texts that rebut scientific misconceptions and explain the normative concept, can be effective devices for addressing misconceptions and affecting conceptual change. However, few, if any, refutation texts specifically related to climate change have been validated for effectiveness. In this project, we developed and…

  13. The Costs of Texting in the Classroom

    ERIC Educational Resources Information Center

    Lawson, Dakota; Henderson, Bruce B.

    2015-01-01

    Many college students seem to find it impossible to resist the temptation to text on electronic devices during class lectures and discussions. One common response of college professors is to yield to the inevitable and try to ignore student texting. However, research indicates that because of limited cognitive capacities, even simple texting can…

  14. Evaluation Methods of The Text Entities

    ERIC Educational Resources Information Center

    Popa, Marius

    2006-01-01

    The paper highlights some evaluation methods to assess the quality characteristics of the text entities. The main concepts used in building and evaluation processes of the text entities are presented. Also, some aggregated metrics for orthogonality measurements are presented. The evaluation process for automatic evaluation of the text entities is…

  15. The Ecological Approach to Text Visualization.

    ERIC Educational Resources Information Center

    Wise, James A.

    1999-01-01

    Presents both theoretical and technical bases on which to build a "science of text visualization." The Spatial Paradigm for Information Retrieval and Exploration (SPIRE) text-visualization system, which images information from free-text documents as natural terrains, serves as an example of the "ecological approach" in its visual metaphor, its…

  16. Youth Texting: Help or Hindrance to Literacy?

    ERIC Educational Resources Information Center

    Zebroff, Dmitri

    2018-01-01

    An extensive amount of research has been performed in recent years into the widespread practice of text messaging in youth. As part of this broad area of research, the associations between youth texting and literacy have been investigated in a variety of contexts. A comprehensive, semi-systematic review of the literature into texting and literacy…

  17. Text Messaging for Student Communication and Voting

    ERIC Educational Resources Information Center

    McClean, Stephen; Hagan, Paul; Morgan, Jason

    2010-01-01

    Text messaging has gained widespread popularity in higher education as a communication tool and as a means of engaging students in the learning process. In this study we report on the use of text messaging in a large, year-one introductory chemistry module where students were encouraged to send questions and queries to a dedicated text number both…

  18. The Use of Summaries in Studying Texts.

    ERIC Educational Resources Information Center

    Duchastel, Philippe C.

    1983-01-01

    Presents a scheme for comparing the text-learning outcomes derivable from study of either text or a summary of the text and considers some practical study strategies students might adopt when summaries are available and when they are not. The value of summaries in instructional situations is discussed. (MBR)

  19. Summarization as the base for text assessment

    NASA Astrophysics Data System (ADS)

    Karanikolas, Nikitas N.

    2015-02-01

    We present a model that apply shallow text summarization as a cheap (in resources needed) process for Automatic (machine based) free text answer Assessment (AA). The evaluation of the proposed method induces the inference that the Conventional Assessment (CA, man made assessment of free text answers) does not have an obvious mechanical replacement. However, this is a research challenge.

  20. Academic Journal Embargoes and Full Text Databases.

    ERIC Educational Resources Information Center

    Brooks, Sam

    2003-01-01

    Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…

  1. A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

    DTIC Science & Technology

    2016-07-01

    financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document

  2. Entanglement classification with algebraic geometry

    NASA Astrophysics Data System (ADS)

    Sanz, M.; Braak, D.; Solano, E.; Egusquiza, I. L.

    2017-05-01

    We approach multipartite entanglement classification in the symmetric subspace in terms of algebraic geometry, its natural language. We show that the class of symmetric separable states has the structure of a Veronese variety and that its k-secant varieties are SLOCC invariants. Thus SLOCC classes gather naturally into families. This classification presents useful properties such as a linear growth of the number of families with the number of particles, and nesting, i.e. upward consistency of the classification. We attach physical meaning to this classification through the required interaction length of parent Hamiltonians. We show that the states W N and GHZ N are in the same secant family and that, effectively, the former can be obtained in a limit from the latter. This limit is understood in terms of tangents, leading to a refinement of the previous families. We compute explicitly the classification of symmetric states with N≤slant4 qubits in terms of both secant families and its refinement using tangents. This paves the way to further use of projective varieties in algebraic geometry to solve open problems in entanglement theory.

  3. New Features for Neuron Classification.

    PubMed

    Hernández-Pérez, Leonardo A; Delgado-Castillo, Duniel; Martín-Pérez, Rainer; Orozco-Morales, Rubén; Lorenzo-Ginori, Juan V

    2018-04-28

    This paper addresses the problem of obtaining new neuron features capable of improving results of neuron classification. Most studies on neuron classification using morphological features have been based on Euclidean geometry. Here three one-dimensional (1D) time series are derived from the three-dimensional (3D) structure of neuron instead, and afterwards a spatial time series is finally constructed from which the features are calculated. Digitally reconstructed neurons were separated into control and pathological sets, which are related to three categories of alterations caused by epilepsy, Alzheimer's disease (long and local projections), and ischemia. These neuron sets were then subjected to supervised classification and the results were compared considering three sets of features: morphological, features obtained from the time series and a combination of both. The best results were obtained using features from the time series, which outperformed the classification using only morphological features, showing higher correct classification rates with differences of 5.15, 3.75, 5.33% for epilepsy and Alzheimer's disease (long and local projections) respectively. The morphological features were better for the ischemia set with a difference of 3.05%. Features like variance, Spearman auto-correlation, partial auto-correlation, mutual information, local minima and maxima, all related to the time series, exhibited the best performance. Also we compared different evaluators, among which ReliefF was the best ranked.

  4. Featureless classification of light curves

    NASA Astrophysics Data System (ADS)

    Kügler, S. D.; Gianniotis, N.; Polsterer, K. L.

    2015-08-01

    In the era of rapidly increasing amounts of time series data, classification of variable objects has become the main objective of time-domain astronomy. Classification of irregularly sampled time series is particularly difficult because the data cannot be represented naturally as a vector which can be directly fed into a classifier. In the literature, various statistical features serve as vector representations. In this work, we represent time series by a density model. The density model captures all the information available, including measurement errors. Hence, we view this model as a generalization to the static features which directly can be derived, e.g. as moments from the density. Similarity between each pair of time series is quantified by the distance between their respective models. Classification is performed on the obtained distance matrix. In the numerical experiments, we use data from the OGLE (Optical Gravitational Lensing Experiment) and ASAS (All Sky Automated Survey) surveys and demonstrate that the proposed representation performs up to par with the best currently used feature-based approaches. The density representation preserves all static information present in the observational data, in contrast to a less-complete description by features. The density representation is an upper boundary in terms of information made available to the classifier. Consequently, the predictive power of the proposed classification depends on the choice of similarity measure and classifier, only. Due to its principled nature, we advocate that this new approach of representing time series has potential in tasks beyond classification, e.g. unsupervised learning.

  5. Improving text recall with multiple summaries.

    PubMed

    van der Meij, Hans; van der Meij, Jan

    2012-06-01

    QuikScan (QS) is an innovative design that aims to improve accessibility, comprehensibility, and subsequent recall of expository text by means of frequent within-document summaries that are formatted as numbered list items. The numbers in the QS summaries correspond to numbers placed in the body of the document where the summarized ideas are discussed in full. To examine the influence of QS summaries on participants' perceptions of text quality (i.e., comprehensibility, structure, and interest) and recall, an experimental - control group design compared the effects of a QS text with a structured abstract (SA) text. Forty psychology students participated voluntarily or received course credits. Students first read a control (SA) or experimental (QS) text on flashbulb memory (FBM). Next, their perceptions of text quality were measured through a questionnaire. Recall was assessed with an open answer test with items for facts, comprehension and higher order information. Perceptions of text quality did not vary across conditions. But QS did lead to significantly and substantially (d= 1.57) higher overall recall scores. Participants with the QS text performed significantly better on all item types than participants with the SA text. Studying a QS text led to a substantial improvement in recall compared to an SA text. Further research is needed to examine how readers study QS texts and whether a text model hypothesis or a repetition effect hypothesis accounts for the effectiveness. The first hypothesis posits that the QS summaries support the reader in constructing a text schema. The second attributes the effects of these summaries to their repetition of text topics. ©2011 The British Psychological Society.

  6. Content Abstract Classification Using Naive Bayes

    NASA Astrophysics Data System (ADS)

    Latif, Syukriyanto; Suwardoyo, Untung; Aldrin Wihelmus Sanadi, Edwin

    2018-03-01

    This study aims to classify abstract content based on the use of the highest number of words in an abstract content of the English language journals. This research uses a system of text mining technology that extracts text data to search information from a set of documents. Abstract content of 120 data downloaded at www.computer.org. Data grouping consists of three categories: DM (Data Mining), ITS (Intelligent Transport System) and MM (Multimedia). Systems built using naive bayes algorithms to classify abstract journals and feature selection processes using term weighting to give weight to each word. Dimensional reduction techniques to reduce the dimensions of word counts rarely appear in each document based on dimensional reduction test parameters of 10% -90% of 5.344 words. The performance of the classification system is tested by using the Confusion Matrix based on comparative test data and test data. The results showed that the best classification results were obtained during the 75% training data test and 25% test data from the total data. Accuracy rates for categories of DM, ITS and MM were 100%, 100%, 86%. respectively with dimension reduction parameters of 30% and the value of learning rate between 0.1-0.5.

  7. Automating document classification for the Immune Epitope Database

    PubMed Central

    Wang, Peng; Morgan, Alexander A; Zhang, Qing; Sette, Alessandro; Peters, Bjoern

    2007-01-01

    Background The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose. Results We here report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. Improvements on the basic classifier performance were made by a) utilizing information stored in PubMed beyond the abstract itself b) applying standard feature selection criteria and c) extracting domain specific feature patterns that e.g. identify peptides sequences. We have implemented the classifier into the curation process determining if abstracts are clearly relevant, clearly irrelevant, or if no certain classification can be made, in which case the abstracts are manually classified. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified. Conclusion By implementing text classification, we have sped up the reference selection process without sacrificing sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools, as well as a large dataset which can serve as a benchmark for tool developers. PMID:17655769

  8. Figure Text Extraction in Biomedical Literature

    PubMed Central

    Kim, Daehyun; Yu, Hong

    2011-01-01

    Background Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. Methodology We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. Results/Conclusions The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19

  9. Figure text extraction in biomedical literature.

    PubMed

    Kim, Daehyun; Yu, Hong

    2011-01-13

    Figures are ubiquitous in biomedical full-text articles, and they represent important biomedical knowledge. However, the sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Therefore, we are developing the Biomedical Figure Search engine (http://figuresearch.askHERMES.org) to allow bioscientists to access figures efficiently. Since text frequently appears in figures, automatically extracting such text may assist the task of mining information from figures. Little research, however, has been conducted exploring text extraction from biomedical figures. We first evaluated an off-the-shelf Optical Character Recognition (OCR) tool on its ability to extract text from figures appearing in biomedical full-text articles. We then developed a Figure Text Extraction Tool (FigTExT) to improve the performance of the OCR tool for figure text extraction through the use of three innovative components: image preprocessing, character recognition, and text correction. We first developed image preprocessing to enhance image quality and to improve text localization. Then we adapted the off-the-shelf OCR tool on the improved text localization for character recognition. Finally, we developed and evaluated a novel text correction framework by taking advantage of figure-specific lexicons. The evaluation on 382 figures (9,643 figure texts in total) randomly selected from PubMed Central full-text articles shows that FigTExT performed with 84% precision, 98% recall, and 90% F1-score for text localization and with 62.5% precision, 51.0% recall and 56.2% F1-score for figure text extraction. When limiting figure texts to those judged by domain experts to be important content, FigTExT performed with 87.3% precision, 68.8% recall, and 77% F1-score. FigTExT significantly improved the performance of the off-the-shelf OCR tool we used, which on its own performed with 36.6% precision, 19.3% recall, and 25.3% F1-score for text

  10. Bilingual Text Messaging Translation: Translating Text Messages From English Into Spanish for the Text4Walking Program

    PubMed Central

    Sandi, Giselle; Ingram, Diana; Welch, Mary Jane; Ocampo, Edith V

    2015-01-01

    Background Hispanic adults in the United States are at particular risk for diabetes and inadequate blood pressure control. Physical activity improves these health problems; however Hispanic adults also have a low rate of recommended aerobic physical activity. To address improving physical inactivity, one area of rapidly growing technology that can be utilized is text messaging (short message service, SMS). A physical activity research team, Text4Walking, had previously developed an initial database of motivational physical activity text messages in English that could be used for physical activity text messaging interventions. However, the team needed to translate these existing English physical activity text messages into Spanish in order to have culturally meaningful and useful text messages for those adults within the Hispanic population who would prefer to receive text messages in Spanish. Objective The aim of this study was to translate a database of English motivational physical activity messages into Spanish and review these text messages with a group of Spanish speaking adults to inform the use of these text messages in an intervention study. Methods The consent form and study documents, including the existing English physical activity text messages, were translated from English into Spanish, and received translation certification as well as Institutional Review Board approval. The translated text messages were placed into PowerPoint, accompanied by a set of culturally appropriate photos depicting barriers to walking, as well as walking scenarios. At the focus group, eligibility criteria for this study included being an adult between 30 to 65 years old who spoke Spanish as their primary language. After a general group introduction, participants were placed into smaller groups of two or three. Each small group was asked to review a segment of the translated text messages for accuracy and meaningfulness. After the break out, the group was brought back together

  11. Text-speak processing impairs tactile location.

    PubMed

    Head, James; Helton, William; Russell, Paul; Neumann, Ewald

    2012-09-01

    Dual task experiments have highlighted that driving while having a conversation on a cell phone can have negative impacts on driving (Strayer & Drews, 2007). It has also been noted that this negative impact is greater when reading a text-message (Lee, 2007). Commonly used in text-messaging are shortening devices collectively known as text-speak (e.g.,Ys I wll ttyl 2nite, Yes I will talk to you later tonight). To the authors' knowledge, there has been no investigation into the potential negative impacts of reading text-speak on concurrent performance on other tasks. Forty participants read a correctly spelled story and a story presented in text-speak while concurrently monitoring for a vibration around their waist. Slower reaction times and fewer correct vibration detections occurred while reading text-speak than while reading a correctly spelled story. The results suggest that reading text-speak imposes greater cognitive load than reading correctly spelled text. These findings suggest that the negative impact of text messaging on driving may be compounded by the messages being in text-speak, instead of orthographically correct text. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. CRIE: An automated analyzer for Chinese texts.

    PubMed

    Sung, Yao-Ting; Chang, Tao-Hsing; Lin, Wei-Chun; Hsieh, Kuan-Sheng; Chang, Kuo-En

    2016-12-01

    Textual analysis has been applied to various fields, such as discourse analysis, corpus studies, text leveling, and automated essay evaluation. Several tools have been developed for analyzing texts written in alphabetic languages such as English and Spanish. However, currently there is no tool available for analyzing Chinese-language texts. This article introduces a tool for the automated analysis of simplified and traditional Chinese texts, called the Chinese Readability Index Explorer (CRIE). Composed of four subsystems and incorporating 82 multilevel linguistic features, CRIE is able to conduct the major tasks of segmentation, syntactic parsing, and feature extraction. Furthermore, the integration of linguistic features with machine learning models enables CRIE to provide leveling and diagnostic information for texts in language arts, texts for learning Chinese as a foreign language, and texts with domain knowledge. The usage and validation of the functions provided by CRIE are also introduced.

  13. How accurate are lexile text measures?

    PubMed

    Stenner, A Jackson; Burdick, Hal; Sanford, Eleanor E; Burdick, Donald S

    2006-01-01

    The Lexile Framework for Reading models comprehension as the difference between a reader measure and a text measure. Uncertainty in comprehension rates results from unreliability in reader measures and inaccuracy in text readability measures. Whole-text processing eliminates sampling error in text measures. However, Lexile text measures are imperfect due to misspecification of the Lexile theory. The standard deviation component associated with theory misspecification is estimated at 64L for a standard-length passage (approximately 125 words). A consequence is that standard errors for longer texts (2,500 to 150,000 words) are measured on the Lexile scale with uncertainties in the single digits. Uncertainties in expected comprehension rates are largely due to imprecision in reader ability and not inaccuracies in text readabilities.

  14. 48 CFR 19.303 - Determining North American Industry Classification System (NAICS) codes and size standards.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Industry Classification System (NAICS) codes and size standards. 19.303 Section 19.303 Federal Acquisition... of Small Business Status for Small Business Programs 19.303 Determining North American Industry... user, the added text is set forth as follows: 19.303 Determining North American Industry Classification...

  15. The Classification of Protein Domains.

    PubMed

    Dawson, Natalie; Sillitoe, Ian; Marsden, Russell L; Orengo, Christine A

    2017-01-01

    The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has lead to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships.In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.

  16. Machine learning in soil classification.

    PubMed

    Bhattacharya, B; Solomatine, D P

    2006-03-01

    In a number of engineering problems, e.g. in geotechnics, petroleum engineering, etc. intervals of measured series data (signals) are to be attributed a class maintaining the constraint of contiguity and standard classification methods could be inadequate. Classification in this case needs involvement of an expert who observes the magnitude and trends of the signals in addition to any a priori information that might be available. In this paper, an approach for automating this classification procedure is presented. Firstly, a segmentation algorithm is developed and applied to segment the measured signals. Secondly, the salient features of these segments are extracted using boundary energy method. Based on the measured data and extracted features to assign classes to the segments classifiers are built; they employ Decision Trees, ANN and Support Vector Machines. The methodology was tested in classifying sub-surface soil using measured data from Cone Penetration Testing and satisfactory results were obtained.

  17. Classification of hyperlipidaemias and hyperlipoproteinaemias*

    PubMed Central

    1970-01-01

    Many studies of atherosclerosis have indicated hyperlipidaemia as a predisposing factor to vascular disease. The relationship holds even for mild degrees of hyperlipidaemia, a fact that underlines the importance of this category of disorders. Both primary and secondary hyperlipidaemias represent such a variety of abnormalities that an internationally acceptable provisional classification is highly desirable in order to facilitate communication between scientists with different backgrounds. The present memorandum presents such a classification; it briefly describes the criteria for diagnosis of the main types of hyperlipidaemia as well as the methods of their determination. Because lipoproteins offer more information than analysis of plasma lipids (most of the plasma lipids being bound to various proteins), the classification is based on lipoprotein analyses by electrophoresis and ultracentrifugation. Simpler methods, however, such as the observation of plasma and measurements of cholesterol and triglycerides, are used to the fullest possible extent in determining the lipoprotein patterns. PMID:4930042

  18. Automated Classification of Pathology Reports.

    PubMed

    Oleynik, Michel; Finger, Marcelo; Patrão, Diogo F C

    2015-01-01

    This work develops an automated classifier of pathology reports which infers the topography and the morphology classes of a tumor using codes from the International Classification of Diseases for Oncology (ICD-O). Data from 94,980 patients of the A.C. Camargo Cancer Center was used for training and validation of Naive Bayes classifiers, evaluated by the F1-score. Measures greater than 74% in the topographic group and 61% in the morphologic group are reported. Our work provides a successful baseline for future research for the classification of medical documents written in Portuguese and in other domains.

  19. A classification of ecological boundaries

    USGS Publications Warehouse

    Strayer, D.L.; Power, M.E.; Fagan, W.F.; Pickett, S.T.A.; Belnap, J.

    2003-01-01

    Ecologists use the term boundary to refer to a wide range of real and conceptual structures. Because imprecise terminology may impede the search for general patterns and theories about ecological boundaries, we present a classification of the attributes of ecological boundaries to aid in communication and theory development. Ecological boundaries may differ in their origin and maintenance, their spatial structure, their function, and their temporal dynamics. A classification system based on these attributes should help ecologists determine whether boundaries are truly comparable. This system can be applied when comparing empirical studies, comparing theories, and testing theoretical predictions against empirical results.

  20. Project implementation : classification of organic soils and classification of marls - training of INDOT personnel.

    DOT National Transportation Integrated Search

    2012-09-01

    This is an implementation project for the research completed as part of the following projects: SPR3005 Classification of Organic Soils : and SPR3227 Classification of Marl Soils. The methods developed for the classification of both soi...