Using phrases and document metadata to improve topic modeling of clinical reports.
Speier, William; Ong, Michael K; Arnold, Corey W
2016-06-01
Probabilistic topic models provide an unsupervised method for analyzing unstructured text and have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient's medical history and frequently contain multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data, and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of the resulting topics illustrate the output of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
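As a rough illustration of the prior-weighting idea described above (not the authors' implementation; the weight tables and values below are invented for the example), the following Python sketch scales a symmetric Dirichlet hyperparameter by document-level (report type) and patient-level (history) factors before topic inference:

```python
# A minimal sketch, assuming hypothetical weight tables, of building a
# per-document Dirichlet prior alpha_d from document and patient metadata.
import numpy as np

def metadata_weighted_alpha(base_alpha, n_topics, report_type, patient_history,
                            report_type_weights, history_weights):
    """Scale a symmetric base prior by document-level (report type) and
    patient-level (prior diagnosis) weights, then rescale to keep total mass."""
    alpha = np.full(n_topics, base_alpha)
    alpha *= report_type_weights.get(report_type, np.ones(n_topics))
    for code in patient_history:
        alpha *= history_weights.get(code, np.ones(n_topics))
    return alpha / alpha.sum() * (base_alpha * n_topics)  # keep total mass comparable

# toy usage with invented weights
n_topics = 4
report_type_weights = {"radiology": np.array([2.0, 1.0, 1.0, 0.5])}
history_weights = {"I50.9": np.array([1.0, 3.0, 1.0, 1.0])}  # heart failure code boosts topic 1
alpha_d = metadata_weighted_alpha(0.1, n_topics, "radiology", ["I50.9"],
                                  report_type_weights, history_weights)
print(alpha_d)
```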
BoB, a best-of-breed automated text de-identification system for VHA clinical documents.
Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M
2013-01-01
De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
Can physicians recognize their own patients in de-identified notes?
Meystre, Stéphane; Shen, Shuying; Hofmann, Deborah; Gundlapalli, Adi
2014-01-01
The adoption of Electronic Health Records is growing at a fast pace, and this growth results in very large quantities of patient clinical information becoming available in electronic format, with tremendous potential, but also equally growing concern for patient confidentiality breaches. De-identification of patient information has been proposed as a solution both to facilitate secondary uses of clinical information and to protect patient information confidentiality. Automated approaches based on Natural Language Processing have been implemented and evaluated, allowing for much faster text de-identification than manual approaches. A U.S. Veterans Affairs clinical text de-identification project focused on investigating the current state of the art of automatic clinical text de-identification, on developing a best-of-breed de-identification application for clinical documents, and on evaluating its impact on subsequent text uses and the risk for re-identification. To evaluate this risk, we de-identified discharge summaries from 86 patients using our 'best-of-breed' text de-identification application with resynthesis of the identifiers detected. We then asked physicians working in the ward the patients were hospitalized in if they could recognize these patients when reading the de-identified documents. Each document was examined by at least one resident and one attending physician. For 4.65% of the documents, physicians thought they recognized the patient because of specific clinical information, but after verification, none of the patients was correctly re-identified.
Electronic Documentation Support Tools and Text Duplication in the Electronic Medical Record
ERIC Educational Resources Information Center
Wrenn, Jesse
2010-01-01
In order to ease the burden of electronic note entry on physicians, electronic documentation support tools have been developed to assist in note authoring. There is little evidence of the effects of these tools on attributes of clinical documentation, including document quality. Furthermore, the resultant abundance of duplicated text and…
Semantic retrieval and navigation in clinical document collections.
Kreuzthaler, Markus; Daumke, Philipp; Schulz, Stefan
2015-01-01
Patients with chronic diseases undergo numerous in- and outpatient treatment periods, and therefore many documents accumulate in their electronic records. We report on an on-going project focussing on the semantic enrichment of medical texts, in order to support recall-oriented navigation across a patient's complete documentation. A document pool of 1,696 de-identified discharge summaries was used for prototyping. A natural language processing toolset for document annotation (based on the text-mining framework UIMA) and indexing (Solr) was used to support a browser-based platform for document import, search and navigation. The integrated search engine combines free text and concept-based querying, supported by dynamically generated facets (diagnoses, procedures, medications, lab values, and body parts). The prototype demonstrates the feasibility of semantic document enrichment within document collections of a single patient. Originally conceived as an add-on for the clinical workplace, this technology could also be adapted to support personalised health record platforms, as well as cross-patient search for cohort building and other secondary use scenarios.
Data from clinical notes: a perspective on the tension between structure and flexible documentation
Denny, Joshua C; Xu, Hua; Lorenzi, Nancy; Stead, William W; Johnson, Kevin B
2011-01-01
Clinical documentation is central to patient care. The success of electronic health record system adoption may depend on how well such systems support clinical documentation. A major goal of integrating clinical documentation into electronic heath record systems is to generate reusable data. As a result, there has been an emphasis on deploying computer-based documentation systems that prioritize direct structured documentation. Research has demonstrated that healthcare providers value different factors when writing clinical notes, such as narrative expressivity, amenability to the existing workflow, and usability. The authors explore the tension between expressivity and structured clinical documentation, review methods for obtaining reusable data from clinical notes, and recommend that healthcare providers be able to choose how to document patient care based on workflow and note content needs. When reusable data are needed from notes, providers can use structured documentation or rely on post-hoc text processing to produce structured data, as appropriate. PMID:21233086
Vogel, Markus; Kaisers, Wolfgang; Wassmuth, Ralf; Mayatepek, Ertan
2015-11-03
Clinical documentation has undergone a change due to the usage of electronic health records. The core element is to capture clinical findings and document therapy electronically. Health care personnel spend a significant portion of their time on the computer. Alternatives to self-typing, such as speech recognition, are currently believed to increase documentation efficiency and quality, as well as satisfaction of health professionals while accomplishing clinical documentation, but few studies in this area have been published to date. This study describes the effects of using a Web-based medical speech recognition system for clinical documentation in a university hospital on (1) documentation speed, (2) document length, and (3) physician satisfaction. Reports of 28 physicians were randomized to be created with (intervention) or without (control) the assistance of a Web-based system of medical automatic speech recognition (ASR) in the German language. The documentation was entered into a browser's text area and the time to complete the documentation including all necessary corrections, correction effort, number of characters, and mood of participant were stored in a database. The underlying time comprised text entering, text correction, and finalization of the documentation event. Participants self-assessed their moods on a scale of 1-3 (1=good, 2=moderate, 3=bad). Statistical analysis was done using permutation tests. The number of clinical reports eligible for further analysis stood at 1455. Out of 1455 reports, 718 (49.35%) were assisted by ASR and 737 (50.65%) were not assisted by ASR. Average documentation speed without ASR was 173 (SD 101) characters per minute, while it was 217 (SD 120) characters per minute using ASR. The overall increase in documentation speed through Web-based ASR assistance was 26% (P=.04). Participants documented an average of 356 (SD 388) characters per report when not assisted by ASR and 649 (SD 561) characters per report when assisted by ASR. Participants' average mood rating was 1.3 (SD 0.6) using ASR assistance compared to 1.6 (SD 0.7) without ASR assistance (P<.001). We conclude that medical documentation with the assistance of Web-based speech recognition leads to an increase in documentation speed, document length, and participant mood when compared to self-typing. Speech recognition is a meaningful and effective tool for the clinical documentation process.
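The study reports using permutation tests; the following Python sketch shows a generic two-sample permutation test on synthetic characters-per-minute data (values are simulated from the reported means and standard deviations, not the study's data):

```python
# A minimal sketch of a permutation test comparing documentation speed with and
# without ASR. All data here is synthetic and only mimics the reported summary statistics.
import numpy as np

rng = np.random.default_rng(0)
speed_no_asr = rng.normal(173, 101, size=737).clip(min=1)   # synthetic control group
speed_asr = rng.normal(217, 120, size=718).clip(min=1)      # synthetic intervention group

observed = speed_asr.mean() - speed_no_asr.mean()
pooled = np.concatenate([speed_asr, speed_no_asr])
n_asr = len(speed_asr)

count = 0
n_perm = 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:n_asr].mean() - pooled[n_asr:].mean()
    if abs(diff) >= abs(observed):
        count += 1
p_value = count / n_perm
print(f"observed difference = {observed:.1f} cpm, permutation p = {p_value:.4f}")
```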
Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora.
Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Redd, Andrew; Samore, Matthew H; Gupta, Kalpana; Trautner, Barbara
2015-01-01
Concepts of interest for clinical and research purposes are not uniformly distributed in clinical text available in electronic medical records. The purpose of our study was to identify filtering techniques to select 'high yield' documents for increased efficacy and throughput. Using two large corpora of clinical text, we demonstrate the identification of 'high yield' document sets in two unrelated domains: homelessness and indwelling urinary catheters. For homelessness, the high yield set includes homeless program and social work notes. For urinary catheters, concepts were more prevalent in notes from hospitalized patients; nursing notes accounted for a majority of the high yield set. This filtering will enable customization and refining of information extraction pipelines to facilitate extraction of relevant concepts for clinical decision support and other uses.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali
2018-06-01
Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but not necessarily discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results. The experimental results indicated that the CGDR technique achieved 12% to 15% improvement in accuracy compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results compared with one-level classification. The promising results of the proposed conceptual graph-based document representation technique suggest that pathologists can adopt the proposed system as their basis for second opinion, thereby supporting them in effectively determining CoD. Copyright © 2018 Elsevier Inc. All rights reserved.
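A minimal Python sketch of the two-level classification scheme described above, with a first-level manner-of-death classifier routing to a second-level cause-of-death classifier; it uses plain TF-IDF features and toy data rather than the paper's conceptual-graph representation:

```python
# Two-level classification sketch: level 1 predicts manner of death (MoD),
# level 2 (selected by the predicted MoD) predicts cause of death (CoD).
# Features and data are illustrative only, not the paper's CGDR features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = ["blunt force trauma to head", "myocardial infarction found",
           "ligature mark around neck", "gunshot wound to chest"]
mod_labels = ["accident", "natural", "suicide", "homicide"]
cod_labels = ["head injury", "cardiac", "asphyxia", "gunshot"]

level1 = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
level1.fit(reports, mod_labels)

# one CoD classifier per MoD value (trained on the full toy set here for brevity)
level2 = {m: make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)).fit(reports, cod_labels)
          for m in set(mod_labels)}

def predict(report):
    mod = level1.predict([report])[0]
    cod = level2[mod].predict([report])[0]
    return mod, cod

print(predict("stab wound to chest"))
```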
TEXTINFO: a tool for automatic determination of patient clinical profiles using text analysis.
Borst, F.; Lyman, M.; Nhàn, N. T.; Tick, L. J.; Sager, N.; Scherrer, J. R.
1991-01-01
The clinical data contained in narrative patient documents is made available via grammatical and semantic processing. Retrievals from the resulting relational database tables are matched against a set of clinical descriptors to obtain clinical profiles of the patients in terms of the descriptors present in the documents. Discharge summaries of 57 Dept. of Digestive Surgery patients were processed in this manner. Factor analysis and discriminant analysis procedures were then applied, showing the profiles to be useful for diagnosis definitions (by establishing relations between diagnoses and clinical findings), for diagnosis assessment (by viewing the match between a definition and observed events recorded in a patient text), and potentially for outcome evaluation based on the classification abilities of clinical signs. PMID:1807679
Methods and Techniques for Clinical Text Modeling and Analytics
ERIC Educational Resources Information Center
Ling, Yuan
2017-01-01
This study focuses on developing and applying methods/techniques in different aspects of the system for clinical text understanding, at both corpus and document level. We deal with two major research questions: First, we explore the question of "How to model the underlying relationships from clinical notes at corpus level?" Documents…
Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.
He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan
2017-05-01
To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality, and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents annotated with 39,511 entities, their assertions, and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage, and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus and its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
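The abstract reports inter-annotator agreement but not the statistic used; assuming a token-level Cohen's kappa, a generic sketch looks like this:

```python
# A generic sketch of token-level inter-annotator agreement via Cohen's kappa.
# The choice of kappa is an assumption; the abstract does not name the statistic.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# toy POS annotations from two annotators
a = ["NN", "VV", "NN", "AD", "NN"]
b = ["NN", "VV", "AD", "AD", "NN"]
print(round(cohens_kappa(a, b), 3))
```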
Text-interpreter language for flexible generation of patient notes and instructions.
Forker, T S
1992-01-01
An interpreted computer language has been developed along with a windowed user interface and multi-printer-support formatter to allow preparation of documentation of patient visits, including progress notes, prescriptions, excuses for work/school, outpatient laboratory requisitions, and patient instructions. Input is by trackball or mouse with little or no keyboard skill required. For clinical problems with specific protocols, the clinician can be prompted with problem-specific items of history, exam, and lab data to be gathered and documented. The language implements a number of text-related commands as well as branching logic and arithmetic commands. In addition to generating text, it is simple to implement arithmetic calculations such as weight-specific drug dosages; multiple branching decision-support protocols for paramedical personnel (or physicians); and calculation of clinical scores (e.g., coma or trauma scores) while simultaneously documenting the status of each component of the score. ASCII text files produced by the interpreter are available for computerized quality audit. Interpreter instructions are contained in text files users can customize with any text editor.
Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M
2012-07-27
The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and when new methods are needed to improve performance. We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F2-measure. Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F2-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross validation experiment allowed for an increase of the F2-measure to 79% with partial matches. The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight about the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
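For reference, the metrics named above can be computed as in this small sketch (the F2-measure weights recall more heavily than precision; the counts are illustrative):

```python
# Precision, recall, and F2-measure from raw counts; the numbers below are toy values.
def precision_recall_f2(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta = 2.0  # F2 weights recall twice as much as precision
    f2 = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f2

# toy counts for one PHI category
print(precision_recall_f2(true_positives=780, false_positives=40, false_negatives=220))
```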
Assessing usage patterns of electronic clinical documentation templates.
Vawdrey, David K
2008-11-06
Many vendors of electronic medical records support structured and free-text entry of clinical documents using configurable templates. At a healthcare institution comprising two large academic medical centers, a documentation management data mart and a custom, Web-accessible business intelligence application were developed to track the availability and usage of electronic documentation templates. For each medical center, template availability and usage trends were measured from November 2007 through February 2008. By February 2008, approximately 65,000 electronic notes were authored per week on the two campuses. One site had 934 available templates, with 313 being used to author at least one note. The other site had 765 templates, of which 480 were used. The most commonly used template at both campuses was a free text note called "Miscellaneous Nursing Note," which accounted for 33.3% of total documents generated at one campus and 15.2% at the other.
Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A
2014-11-29
In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document-specific enhancements, such as negation rules for general practitioners' entries and a regular-expression-based temporality module. The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
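A minimal sketch of a ConText-style rule, assuming a negation trigger within a fixed token window before a matched term and a simplified year-based historical check; the Dutch triggers shown are examples, not the actual ContextD trigger list:

```python
# ConText-style annotation sketch: a negation trigger in the window preceding a
# matched term marks it negated; a year mention marks the sentence historical.
# Trigger lists and the temporality rule are illustrative simplifications.
import re

NEGATION_TRIGGERS = ["geen", "geen aanwijzingen voor", "niet"]   # illustrative Dutch triggers
HISTORICAL_RE = re.compile(r"\b(19|20)\d{2}\b")                  # year mention => historical (simplified)

def annotate(sentence, term, window=5):
    tokens = sentence.lower().split()
    try:
        idx = tokens.index(term.lower())
    except ValueError:
        return None
    left = " ".join(tokens[max(0, idx - window):idx])
    negated = any(trig in left for trig in NEGATION_TRIGGERS)
    historical = bool(HISTORICAL_RE.search(sentence))
    return {"term": term, "negated": negated, "historical": historical}

print(annotate("Patient heeft geen koorts", "koorts"))
print(annotate("Appendectomie in 2005", "appendectomie"))
```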
Meystre, Stephane M; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H
2010-08-02
In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
WITH: a system to write clinical trials using XML and RDBMS.
Fazi, Paola; Luzi, Daniela; Manco, Mariarosaria; Ricci, Fabrizio L.; Toffoli, Giovanni; Vignetti, Marco
2002-01-01
The paper illustrates the system WITH (Write on Internet clinical Trials in Haematology), which supports the writing of a clinical trial (CT) document. The requirements of this system have been defined by analysing the writing process of a CT and then modelling the content of its sections together with their logical and temporal relationships. The system WITH allows: a) editing the document text; b) re-using the text; and c) facilitating cooperation and collaborative writing. It is based on the XML mark-up language and on an RDBMS. This choice guarantees: a) process standardisation; b) process management; c) efficient delivery of information-based tasks; and d) explicit focus on process design. PMID:12463823
Document Exploration and Automatic Knowledge Extraction for Unstructured Biomedical Text
NASA Astrophysics Data System (ADS)
Chu, S.; Totaro, G.; Doshi, N.; Thapar, S.; Mattmann, C. A.; Ramirez, P.
2015-12-01
We describe our work on building a web-browser-based document reader with a built-in exploration tool and automatic concept extraction of medical entities for biomedical text. Vast amounts of biomedical information are offered in unstructured text form through scientific publications and R&D reports. Utilizing text mining can help us to mine information and extract relevant knowledge from a plethora of biomedical text. The ability to employ such technologies to aid researchers in coping with information overload is greatly desirable. In recent years, there has been an increased interest in automatic biomedical concept extraction [1, 2] and intelligent PDF reader tools with the ability to search on content and find related articles [3]. Such reader tools are typically desktop applications and are limited to specific platforms. Our goal is to provide researchers with a simple tool to aid them in finding, reading, and exploring documents. Thus, we propose a web-based document explorer, which we called Shangri-Docs, which combines a document reader with automatic concept extraction and highlighting of relevant terms. Shangri-Docs also provides the ability to evaluate a wide variety of document formats (e.g., PDF, Word, PPT, text, etc.) and to exploit the linked nature of the Web and personal content by performing searches on content from public sites (e.g., Wikipedia, PubMed) and private cataloged databases simultaneously. Shangri-Docs utilizes Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) [4] and the Unified Medical Language System (UMLS) to automatically identify and highlight terms and concepts, such as specific symptoms, diseases, drugs, and anatomical sites, mentioned in the text. cTAKES was originally designed specifically to extract information from clinical medical records. Our investigation leads us to extend the automatic knowledge extraction process of cTAKES for the biomedical research domain by improving the ontology-guided information extraction process. We will describe our experience and implementation of our system and share lessons learned from our development. We will also discuss ways in which this could be adapted to other science fields. [1] Funk et al., 2014. [2] Kang et al., 2014. [3] Utopia Documents, http://utopiadocs.com [4] Apache cTAKES, http://ctakes.apache.org
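As a stand-in for the cTAKES/UMLS pipeline described above, a dictionary-based concept highlighter can be sketched as follows (terms and semantic types are toy data):

```python
# Dictionary-based concept highlighting sketch; the lexicon below is invented
# and only stands in for UMLS-driven concept extraction.
import re

concept_dict = {"pneumonia": "Disease", "amoxicillin": "Drug", "left lung": "Anatomy"}

def highlight(text):
    # longest terms first, so multi-word concepts are matched before their parts
    for term, sem_type in sorted(concept_dict.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(rf"\b{re.escape(term)}\b",
                      lambda m: f"<mark class='{sem_type}'>{m.group(0)}</mark>",
                      text, flags=re.IGNORECASE)
    return text

print(highlight("Pneumonia of the left lung treated with amoxicillin."))
```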
Building a common pipeline for rule-based document classification.
Patterson, Olga V; Ginter, Thomas; DuVall, Scott L
2013-01-01
Instance-based classification of clinical text is a widely used natural language processing task employed as a step for patient classification, document retrieval, or information extraction. Rule-based approaches rely on concept identification and context analysis in order to determine the appropriate class. We propose a five-step process that enables even small research teams to develop simple but powerful rule-based NLP systems by taking advantage of a common UIMA AS based pipeline for classification. Our proposed methodology coupled with the general-purpose solution provides researchers with access to the data locked in clinical text in cases of limited human resources and compact timelines.
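In the spirit of the rule-based classification process described above (concept identification followed by context analysis), a minimal, illustrative sketch might look like this; the rules and target concept are invented for the example:

```python
# Rule-based document classification sketch: find a target concept, then apply a
# simple negation check on the preceding context. Not the authors' pipeline.
import re

NEGATION = re.compile(r"\b(no|denies|without)\b", re.IGNORECASE)

def classify(document, target_concept="homeless"):
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        lowered = sentence.lower()
        if target_concept in lowered:
            prefix = lowered.split(target_concept)[0]
            if not NEGATION.search(prefix):
                return "positive"
    return "negative"

print(classify("Patient is currently homeless and staying in a shelter."))
print(classify("Patient denies being homeless."))
```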
Integrating query of relational and textual data in clinical databases: a case study.
Fisk, John M; Mutalik, Pradeep; Levin, Forrest W; Erdos, Joseph; Taylor, Caroline; Nadkarni, Prakash
2003-01-01
The authors designed and implemented a clinical data mart composed of an integrated information retrieval (IR) and relational database management system (RDBMS). Using commodity software, which supports interactive, attribute-centric text and relational searches, the mart houses 2.8 million documents that span a five-year period and supports basic IR features such as Boolean searches, stemming, and proximity and fuzzy searching. Results are relevance-ranked using either "total documents per patient" or "report type weighting." Non-curated medical text has a significant degree of malformation with respect to spelling and punctuation, which creates difficulties for text indexing and searching. Presently, the IR facilities of RDBMS packages lack the features necessary to handle such malformed text adequately. A robust IR+RDBMS system can be developed, but it requires integrating RDBMSs with third-party IR software. RDBMS vendors need to make their IR offerings more accessible to non-programmers.
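The two relevance-ranking options mentioned above can be illustrated with a small sketch; the document structures and weights are assumptions for the example:

```python
# Relevance-ranking sketch: rank patients by the number of matching documents,
# optionally weighting by report type. Data structures and weights are illustrative.
from collections import defaultdict

def rank_patients(matching_docs, report_type_weights=None):
    """matching_docs: list of (patient_id, report_type) for documents hit by the query."""
    scores = defaultdict(float)
    for patient_id, report_type in matching_docs:
        weight = (report_type_weights or {}).get(report_type, 1.0)
        scores[patient_id] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

hits = [("p1", "discharge"), ("p1", "radiology"), ("p2", "nursing")]
print(rank_patients(hits))                                       # "total documents per patient"
print(rank_patients(hits, {"discharge": 3.0, "nursing": 0.5}))   # "report type weighting"
```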
Challenges and Insights in Using HIPAA Privacy Rule for Clinical Text Annotation.
Kayaalp, Mehmet; Browne, Allen C; Sagan, Pamela; McGee, Tyne; McDonald, Clement J
2015-01-01
The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of personally identifying information before they can be released to researchers and others. We have been manually annotating clinical text since 2008 in order to test and evaluate an algorithmic clinical text de-identification tool, NLM Scrubber, which we have been developing in parallel. Although HIPAA provides some guidance about what must be de-identified, translating those guidelines into practice is not straightforward, especially when one deals with free text. As a result, we have changed our manual annotation labels and methods six times. This paper explains why we have made those annotation choices, which have evolved over seven years of practice in this field. The aim of this paper is to start a community discussion towards developing standards for clinical text annotation with the end goal of studying and comparing clinical text de-identification systems more accurately.
Benge, James; Beach, Thomas; Gladding, Connie; Maestas, Gail
2008-01-01
The Military Health System (MHS) deployed its electronic health record (EHR), AHLTA, to Military Treatment Facilities (MTFs) around the world. This paper focuses on the approach and barriers to using structured text in AHLTA to document care encounters and illustrates the direct correlation between the use of structured text and achievement of expected benefits. AHLTA uses commercially available products, a health data dictionary and standardized medical terminology, enabling the capture of structured computable data. With structured text stored in the AHLTA Clinical Data Repository (CDR), the MHS has seen a return on its EHR investment with improvements in the accuracy and completeness of coding and the documentation of care provided. Determining the aspects of documentation where structured text is most beneficial, as well as the degree of structured text needed, has been a significant challenge. This paper describes how the economic value framework aligns the enterprise strategic objectives with the EHR investment features, performance metrics and expected benefits. The framework analyses focus on return on investment calculations, baseline assessment and post-implementation benefits validation. Cost avoidance, revenue enhancements and operational improvements, such as evidence-based medicine and medical surveillance, can be directly attributed to the use of structured text.
ERIC Educational Resources Information Center
Zhang, Rui
2013-01-01
The widespread adoption of Electronic Health Record (EHR) has resulted in rapid text proliferation within clinical care. Clinicians' use of copying and pasting functions in EHR systems further compounds this by creating a large amount of redundant clinical information in clinical documents. A mixture of redundant information (especially outdated…
Towards Phenotyping of Clinical Trial Eligibility Criteria.
Löbe, Matthias; Stäubert, Sebastian; Goldberg, Colleen; Haffner, Ivonne; Winter, Alfred
2018-01-01
Medical plain-text documents contain important facts about patients, but they are rarely available for structured queries. The provision of structured information from natural language texts in addition to the existing structured data can significantly speed up the search for fulfilled inclusion criteria and thus improve the recruitment rate. This work is aimed at supporting clinical trial recruitment with text mining techniques to identify suitable subjects in hospitals. Based on the inclusion/exclusion criteria of 5 sample studies and a text corpus consisting of 212 doctors' letters and medical follow-up documentation from a university cancer center, a prototype was developed and technically evaluated using NLP procedures (UIMA) for the extraction of facts from medical free texts. It was found that although the extracted entities are not always correct (precision between 23% and 96%), they provide a decisive indication as to which patient file should be read preferentially. The prototype presented here demonstrates the technical feasibility. In order to find available, high-yield phenotypes, an in-depth evaluation is required.
Meystre, Stéphane M; Lee, Sanghoon; Jung, Chai Young; Chevrier, Raphaël D
2012-08-01
An increasing need for collaboration and resources sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards: the HL7 Clinical Document Architecture (CDA), and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model entitled "CDA+GrAF". We experimented with several methods to combine these existing standards, and eventually selected a method wrapping separate CDA and GrAF parts in a common standoff annotation (i.e., separate from the annotated text) XML document. Two use cases, clinical document sections, and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, and were successfully validated with the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and automatically generated 50 annotation documents using this tool, all successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may ease directly comparing NLP tools and applications, combining their output, transforming and "translating" annotations between different NLP applications, and eventually "plug-and-play" of different modules in NLP applications. Copyright © 2011 Elsevier Inc. All rights reserved.
Chiang, Michael F.; Read-Brown, Sarah; Tu, Daniel C.; Choi, Dongseok; Sanders, David S.; Hwang, Thomas S.; Bailey, Steven; Karr, Daniel J.; Cottle, Elizabeth; Morrison, John C.; Wilson, David J.; Yackel, Thomas R.
2013-01-01
Purpose: To evaluate three measures related to electronic health record (EHR) implementation: clinical volume, time requirements, and nature of clinical documentation. Comparison is made to baseline paper documentation. Methods: An academic ophthalmology department implemented an EHR in 2006. A study population was defined of faculty providers who worked the 5 months before and after implementation. Clinical volumes, as well as time length for each patient encounter, were collected from the EHR reporting system. To directly compare time requirements, two faculty providers who utilized both paper and EHR systems completed time-motion logs to record the number of patients, clinic time, and nonclinic time to complete documentation. Faculty providers and databases were queried to identify patient records containing both paper and EHR notes, from which three cases were identified to illustrate representative documentation differences. Results: Twenty-three faculty providers completed 120,490 clinical encounters during a 3-year study period. Compared to baseline clinical volume from 3 months pre-implementation, the post-implementation volume was 88% in quarter 1, 93% in year 1, 97% in year 2, and 97% in year 3. Among all encounters, 75% were completed within 1.7 days after beginning documentation. The mean total time per patient was 6.8 minutes longer with EHR than paper (P<.01). EHR documentation involved greater reliance on textual interpretation of clinical findings, whereas paper notes used more graphical representations, and EHR notes were longer and included automatically generated text. Conclusion: This EHR implementation was associated with increased documentation time, little or no increase in clinical volume, and changes in the nature of ophthalmic documentation. PMID:24167326
Natural language processing and the representation of clinical data.
Sager, N; Lyman, M; Bucknall, C; Nhan, N; Tick, L J
1994-01-01
OBJECTIVE: Develop a representation of clinical observations and actions and a method of processing free-text patient documents to facilitate applications such as quality assurance. DESIGN: The Linguistic String Project (LSP) system of New York University utilizes syntactic analysis, augmented by a sublanguage grammar and an information structure that are specific to the clinical narrative, to map free-text documents into a database for querying. MEASUREMENTS: Information precision (I-P) and information recall (I-R) were measured for queries for the presence of 13 asthma-health-care quality assurance criteria in a database generated from 59 discharge letters. RESULTS: I-P, using counts of major errors only, was 95.7% for the 28-letter training set and 98.6% for the 31-letter test set. I-R, using counts of major omissions only, was 93.9% for the training set and 92.5% for the test set. PMID:7719796
Reinert, Christiane; Kremmler, Lukas; Burock, Susen; Bogdahn, Ulrich; Wick, Wolfgang; Gleiter, Christoph H; Koller, Michael; Hau, Peter
2014-01-01
In randomised controlled trials (RCTs), patient informed consent documents are an essential cornerstone of the study flow. However, these documents are often oversized in format and content. Clinical experience suggests that study information sheets are often not used as an aid to decision-making due to their complexity. We analysed nine patient informed consent documents from clinical neuro-oncological phase III-studies running at a German Brain Tumour Centre with the objective of investigating the quality of these documents. Text length, formal layout, readability, application of ethical and legal requirements, scientific evidence and social aspects were used as rating categories. Results were assessed quantitatively by two independent investigators and were depicted using net diagrams. All patient informed consent documents were of insufficient quality in all categories except that ethical and legal requirements were fulfilled. Notably, graduate-level reading ability was required to read and understand five of nine consent documents. Quality deficits were consistent between the individual study information texts. Irrespective of formal aspects, a document that is intended to inform and motivate patients to participate in a study needs to be well-structured and understandable. We therefore strongly recommend re-designing patient informed consent documents in a patient-friendly way. Specifically, standardised components with a scientific foundation should be provided that could be retrieved at various times, adapted to the mode of treatment and the patient's knowledge, and could weigh information depending on the stage of the treatment decision. Copyright © 2013 Elsevier Ltd. All rights reserved.
Hoelzer, S; Schweiger, R K; Boettcher, H A; Tafazzoli, A G; Dudeck, J
2001-01-01
The purpose of guidelines in clinical practice is to improve the effectiveness and efficiency of clinical care. It is known that nationally or internationally produced guidelines that, in particular, do not tie into medical processes at the time of consultation, do not take local factors into account, and have no consistent implementation strategy have limited impact on changing either the behaviour of physicians or patterns of care. The literature provides evidence for the effectiveness of computerization of clinical practice guidelines (CPGs) for increasing compliance and improving patient outcomes. Probably the most effective concepts are knowledge-based functions for decision support or monitoring that are integrated in clinical information systems. This approach is mostly restricted by the effort required for development and maintenance of the information systems and the limited number of implemented medical rules. Most of the guidelines are text-based, and are primarily published in medical journals and posted on the internet. However, internet-published guidelines have little impact on the behaviour of physicians. It can be difficult and time-consuming to browse the internet to find (a) the correct guidelines for an existing diagnosis and (b) an adequate recommendation for a specific clinical problem. Our objective is to provide a web-based guideline service that takes as input clinical data on a particular patient and returns as output a customizable set of recommendations regarding diagnosis and treatment. Information in healthcare is to a very large extent transmitted and stored as unstructured or slightly structured text such as discharge letters, reports, forms, etc. The same applies to facilities containing medical information resources for clinical purposes and research such as text books, articles, guidelines, etc. Physicians are used to obtaining information from text-based sources. Since most guidelines are text-based, it would be practical to use a document-based solution that preserves the original cohesiveness. The lack of structure limits the automatic identification and extraction of the information contained in these resources. For this reason, we have chosen a document-based approach using eXtensible Markup Language (XML) with its schema definition and related technologies. XML empowers the applications for in-context searching. In addition it allows the same content to be represented in different ways. Our XML reference clinical data model for guidelines has been realized with the XML schema definition. The schema is used for structuring new text-based guidelines and updating existing documents. It is also used to establish search strategies on the document base. We hypothesize that enabling physicians to query the available CPGs easily and to access selected, specific information at the point of care will foster increased use. Based on current evidence we are confident that it will have substantial impact on the care provided, and will improve health outcomes.
Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.
Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua
2015-01-01
Rapid growth in electronic health record (EHR) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in the narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition, which identifies the boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using the minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from the large unlabeled corpus remarkably improved the DNN with randomized embeddings, denoting the usefulness of unsupervised feature learning.
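A minimal sketch of the underlying idea (word embeddings feeding a small neural tagger); the embeddings here are random stand-ins for vectors learned from a large unlabeled corpus, and the network is not the paper's architecture:

```python
# Embedding-based NER sketch: token window features built from word embeddings
# feed a small neural classifier that assigns entity labels. Embeddings, vocab,
# tokens, and labels below are toy stand-ins.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
vocab = {"患者": 0, "高血压": 1, "病史": 2, "十": 3, "年": 4}
embeddings = rng.normal(size=(len(vocab), 16))   # stand-in for unsupervised embeddings

tokens = ["患者", "高血压", "病史", "十", "年"]
labels = ["O", "B-Disease", "O", "B-Duration", "I-Duration"]

def window_features(tokens, i, window=1):
    idxs = range(i - window, i + window + 1)
    vecs = [embeddings[vocab[tokens[j]]] if 0 <= j < len(tokens) else np.zeros(16) for j in idxs]
    return np.concatenate(vecs)

X = np.stack([window_features(tokens, i) for i in range(len(tokens))])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, labels)
print(list(clf.predict(X)))
```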
Beyond Information Retrieval—Medical Question Answering
Lee, Minsuk; Cimino, James; Zhu, Hai Ran; Sable, Carl; Shanker, Vijay; Ely, John; Yu, Hong
2006-01-01
Physicians have many questions when caring for patients, and frequently need to seek answers for their questions. Information retrieval systems (e.g., PubMed) typically return a list of documents in response to a user's query. Frequently the number of returned documents is large and makes physicians' information seeking "practical only 'after hours' and not in the clinical settings". Question answering techniques are based on automatically analyzing thousands of electronic documents to generate short-text answers in response to clinical questions that are posed by physicians. The authors address physicians' information needs and describe the design, implementation, and evaluation of the medical question answering system (MedQA). Although our long-term goal is to enable MedQA to answer all types of medical questions, we currently implement MedQA to integrate information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text for definitional questions (i.e., "What is X?"). MedQA can be accessed at http://www.dbmi.columbia.edu/~yuh9001/research/MedQA.html. PMID:17238385
Development and Evaluation of a Clinical Note Section Header Terminology
Denny, Joshua C.; Miller, Randolph A.; Johnson, Kevin B.; Spickard, Anderson
2008-01-01
Clinical documentation is often expressed in natural language text, yet providers often use common organizations that segment these notes in sections, such as “history of present illness” or “physical examination.” We developed a hierarchical section header terminology, supporting mappings to LOINC and other vocabularies; it contained 1109 concepts and 4332 synonyms. Physicians evaluated it compared to LOINC and the Evaluation and Management billing schema using a randomly selected corpus of history and physical notes. Evaluated documents contained a median of 54 sections and 27 “major sections.” There were 16,196 total sections in the evaluation note corpus. The terminology contained 99.9% of the clinical sections; LOINC matched 77% of section header concepts and 20% of section header strings in those documents. The section terminology may enable better clinical note understanding and interoperability. Future development and integration into natural language processing systems is needed. PMID:18999303
The tool extracts deep phenotypic information from the clinical narrative at the document, episode, and patient levels. The final output is a FHIR-compliant patient-level phenotypic summary, which can be consumed by research warehouses or the DeepPhe native visualization tool.
A classification of errors in lay comprehension of medical documents.
Keselman, Alla; Smith, Catherine Arnott
2012-12-01
Emphasis on participatory medicine requires that patients and consumers participate in tasks traditionally reserved for healthcare providers. This includes reading and comprehending medical documents, often but not necessarily in the context of interacting with Personal Health Records (PHRs). Research suggests that while giving patients access to medical documents has many benefits (e.g., improved patient-provider communication), lay people often have difficulty understanding medical information. Informatics can address the problem by developing tools that support comprehension; this requires in-depth understanding of the nature and causes of errors that lay people make when comprehending clinical documents. The objective of this study was to develop a classification scheme of comprehension errors, based on lay individuals' retellings of two documents containing clinical text: a description of a clinical trial and a typical office visit note. While not comprehensive, the scheme can serve as a foundation of further development of a taxonomy of patients' comprehension errors. Eighty participants, all healthy volunteers, read and retold two medical documents. A data-driven content analysis procedure was used to extract and classify retelling errors. The resulting hierarchical classification scheme contains nine categories and 23 subcategories. The most common error made by the participants involved incorrectly recalling brand names of medications. Other common errors included misunderstanding clinical concepts, misreporting the objective of a clinical research study and physician's findings during a patient's visit, and confusing and misspelling clinical terms. A combination of informatics support and health education is likely to improve the accuracy of lay comprehension of medical documents. Published by Elsevier Inc.
Adaptive semantic tag mining from heterogeneous clinical research texts.
Hao, T; Weng, C
2015-01-01
To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts. We developed a "plug-n-play" framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection were two example functional wrappers. We first compared this approach's recall and efficiency for mining FSTs from ClinicalTrials.gov to that of a recently published tag-mining algorithm. Then we assessed this approach's adaptability to two other types of clinical research texts: clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three texts. Our approach increased the average recall and speed by 12.8% and 47.02%, respectively, over the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline ranging between 76.9% and 100% for varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three texts as the data size or frequency threshold changed. This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing its recall. This component-based architectural design is potentially generalizable to improve the adaptability of other clinical text mining methods.
Chen, Elizabeth S.; Maloney, Francine L.; Shilmayster, Eugene; Goldberg, Howard S.
2009-01-01
A systematic and standard process for capturing information within free-text clinical documents could facilitate opportunities for improving quality and safety of patient care, enhancing decision support, and advancing data warehousing across an enterprise setting. At Partners HealthCare System, the Medical Language Processing (MLP) services project was initiated to establish a component-based architectural model and processes to facilitate putting MLP functionality into production for enterprise consumption, promote sharing of components, and encourage reuse. Key objectives included exploring the use of an open-source framework called the Unstructured Information Management Architecture (UIMA) and leveraging existing MLP-related efforts, terminology, and document standards. This paper describes early experiences in defining the infrastructure and standards for extracting, encoding, and structuring clinical observations from a variety of clinical documents to serve enterprise-wide needs. PMID:20351830
Chen, Elizabeth S; Maloney, Francine L; Shilmayster, Eugene; Goldberg, Howard S
2009-11-14
A systematic and standard process for capturing information within free-text clinical documents could facilitate opportunities for improving quality and safety of patient care, enhancing decision support, and advancing data warehousing across an enterprise setting. At Partners HealthCare System, the Medical Language Processing (MLP) services project was initiated to establish a component-based architectural model and processes to facilitate putting MLP functionality into production for enterprise consumption, promote sharing of components, and encourage reuse. Key objectives included exploring the use of an open-source framework called the Unstructured Information Management Architecture (UIMA) and leveraging existing MLP-related efforts, terminology, and document standards. This paper describes early experiences in defining the infrastructure and standards for extracting, encoding, and structuring clinical observations from a variety of clinical documents to serve enterprise-wide needs.
PlateRunner: A Search Engine to Identify EMR Boilerplates.
Divita, Guy; Workman, T Elizabeth; Carter, Marjorie E; Redd, Andrew; Samore, Matthew H; Gundlapalli, Adi V
2016-01-01
Medical text contains boilerplated content, an artifact of pull-down forms in EMRs. Boilerplated content is a source of challenges for concept extraction from clinical text. This paper introduces PlateRunner, a search engine over boilerplates from the US Department of Veterans Affairs (VA) EMR. Boilerplates containing concepts should be identified and reviewed to recognize challenging formats, identify high-yield document titles, and fine-tune section zoning. The search engine can filter negated and asserted concepts and can save queries, search results, and the documents found for later analysis.
Narayana, D B Anantha; Durg, Sharanbasappa; Manohar, P Ram; Mahapatra, Anita; Aramya, A R
2017-02-02
Chyawanprash (CP), a traditional immune booster recipe, has a long history of ethnic origin, development, household preparation and usage. There are even mythological stories about the origin of this recipe, including its nomenclature. In the last six decades, CP, because of entrepreneurial actions of some research Vaidyas (traditional doctors), has grown to industrial production and marketing in packed forms to a large number of consumers/patients like any food or health care product. Currently, CP has acquired a large accepted user base in India and in a few countries outside India. Authoritative texts, recognized by the Drugs and Cosmetics Act of India, describe CP as an immunity enhancer and strength giver meant for improving lung functions in diseases with compromised immunity. This review focuses on published clinical efficacy and safety studies of CP for correlation with health benefits as documented in the authoritative texts, and also briefs on its recipes and processes. Authoritative texts were searched for recipes, processes, and other technical details of CP. Labels of CP products marketed in India were studied for health claims. Electronic searches for studies of CP efficacy and safety were performed in PubMed/MEDLINE and DHARA (Digital Helpline for Ayurveda Research Articles), and Ayurvedic books were also searched for clinical studies. The documented clinical studies from electronic databases and Ayurvedic books evidenced that individuals who consumed CP regularly for a definite period of time showed improvement in overall health status and immunity. However, most of the clinical studies in this review are of small sample size and short duration. Further, limited access to significant data on traditional products like CP in electronic databases was noted. Randomized controlled trials of high quality with larger sample size and longer follow-up are needed to provide significant evidence on the clinical use of CP as an immunity booster. Additional studies involving measurement of current biomarkers of immunity pre- and post-consumption of the product, as well as benefits accruing with the use of CP as an adjuvant, are suggested. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Planning: supporting and optimizing clinical guidelines execution.
Anselma, Luca; Montani, Stefania
2008-01-01
A crucial feature of computerized clinical guidelines (CGs) lies in the fact that they may be used not only as conventional documents (as if they were just free text) describing general procedures that users have to follow. In fact, thanks to a description of their actions and control flow in some semiformal representation language, CGs can also take advantage of Computer Science methods and Information Technology infrastructures and techniques to become executable documents, in the sense that they may support clinical decision making and the execution of clinical procedures. In order to reach this goal, some advanced planning techniques, originally developed within the Artificial Intelligence (AI) community, may be (at least partially) resorted to, after a proper adaptation to the specific needs of CGs has been carried out.
HGML: a hypertext guideline markup language.
Hagerty, C. G.; Pickens, D.; Kulikowski, C.; Sonnenberg, F.
2000-01-01
Existing text-based clinical practice guidelines can be difficult to put into practice. While a growing number of such documents have gained acceptance in the medical community and contain a wealth of valuable information, the time required to digest them is substantial. Yet the expressive power, subtlety and flexibility of natural language pose challenges when designing computer tools that will help in their application. At the same time, formal computer languages typically lack such expressiveness and the effort required to translate existing documents into these languages may be costly. We propose a method based on the mark-up concept for converting text-based clinical guidelines into a machine-operable form. This allows existing guidelines to be manipulated by machine, and viewed in different formats at various levels of detail according to the needs of the practitioner, while preserving their originally published form. PMID:11079898
Automatic Word Sense Disambiguation of Acronyms and Abbreviations in Clinical Texts
ERIC Educational Resources Information Center
Moon, Sungrim
2012-01-01
The use of acronyms and abbreviations is increasing profoundly in the clinical domain in large part due to the greater adoption of electronic health record (EHR) systems and increased electronic documentation within healthcare. A single acronym or abbreviation may have multiple different meanings or senses. Comprehending the proper meaning of an…
Lindemann, Elizabeth A.; Chen, Elizabeth S.; Rajamani, Sripriya; Manohar, Nivedha; Wang, Yan; Melton, Genevieve B.
2017-01-01
There has been increasing recognition of the key role of social determinants like occupation on health. Given the relatively poor understanding of occupation information in electronic health records (EHRs), we sought to characterize occupation information within free-text clinical document sources. From six distinct clinical sources, 868 total occupation-related sentences were identified for the study corpus. Building off approaches from previous studies, refined annotation guidelines were created using the National Institute for Occupational Safety and Health Occupational Data for Health data model with elements added to increase granularity. Our corpus generated 2,005 total annotations representing 39 of 41 entity types from the enhanced data model. Highest frequency entities were: Occupation Description (17.7%); Employment Status – Not Specified (12.5%); Employer Name (11.0%); Subject (9.8%); Industry Description (6.2%). Our findings support the value for standardizing entry of EHR occupation information to improve data quality for improved patient care and secondary uses of this information. PMID:29295142
It’s about This and That: A Description of Anaphoric Expressions in Clinical Text
Wang, Yan; Melton, Genevieve B.; Pakhomov, Serguei
2011-01-01
Although anaphoric expressions are very common in biomedical and clinical documents, little work has been done to systematically characterize their use in clinical text. Samples of ‘it’, ‘this’, and ‘that’ expressions occurring in inpatient clinical notes from four metropolitan hospitals were analyzed using a combination of semi-automated and manual annotation techniques. We developed a rule-based approach to filter potential non-referential expressions. A physician then manually annotated 1000 potential referential instances to determine referent status and the antecedent of each referent expression. A distributional analysis of the three referring expressions in the entire corpus of notes demonstrates a high prevalence of anaphora and large variance in distributions of referential expressions with different notes. Our results confirm that anaphoric expressions are common in clinical texts. Effective co-reference resolution with anaphoric expressions remains an important challenge in medical natural language processing research. PMID:22195211
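As an illustration of the kind of rule-based filter for non-referential expressions described above, here is a minimal Python sketch; the patterns are assumptions for illustration and are not the authors' actual rule set.

```python
import re

# Illustrative patterns for pleonastic (non-referential) "it"; a real filter
# would use a much richer, validated rule set.
PLEONASTIC_PATTERNS = [
    re.compile(r"\bit\s+(is|was|seems|appears)\s+(important|likely|unlikely|possible|clear)\b", re.I),
    re.compile(r"\bit\s+(is|was)\s+\w+\s+(to|that)\b", re.I),  # e.g., "it is hard to", "it was noted that"
]

def is_potentially_referential(sentence: str) -> bool:
    """Return True if an 'it' in the sentence survives the pleonastic filters."""
    if not re.search(r"\bit\b", sentence, re.I):
        return False
    return not any(p.search(sentence) for p in PLEONASTIC_PATTERNS)

print(is_potentially_referential("It is important to monitor blood pressure."))  # False
print(is_potentially_referential("The lesion was biopsied and it was benign."))  # True
```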
Carrell, David S.; Halgrim, Scott; Tran, Diem-Thy; Buist, Diana S. M.; Chubak, Jessica; Chapman, Wendy W.; Savova, Guergana
2014-01-01
The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction. PMID:24488511
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task, however majority of IE systems were not designed to work on Portable Document Format (PDF) document, an important and common extraction source for systematic review. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which add challenges to the underlining natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, and compared it with machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best performing machine learning classifier that used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and MEDADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced of number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
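The multi-pass sieve idea described above can be sketched as an ordered sequence of rules, each of which either labels a PDF text snippet or passes it to the next sieve; the heuristics and cues below are illustrative assumptions, not the published algorithm.

```python
import re

# Each "sieve" returns a label or None; snippets unlabeled by all sieves fall
# through to the default BODYTEXT category.
def sieve_metadata(snippet):
    if re.search(r"(doi:|©|copyright|www\.|received \d{4})", snippet, re.I):
        return "METADATA"

def sieve_abstract(snippet):
    if re.match(r"\s*abstract\b", snippet, re.I):
        return "ABSTRACT"

def sieve_title(snippet):
    if len(snippet.split()) < 25 and snippet.istitle():
        return "TITLE"

def sieve_semistructure(snippet):
    # Tables and lists tend to be short lines dominated by numeric tokens.
    tokens = snippet.split()
    if tokens and sum(any(c.isdigit() for c in t) for t in tokens) / len(tokens) > 0.5:
        return "SEMISTRUCTURE"

SIEVES = [sieve_metadata, sieve_abstract, sieve_title, sieve_semistructure]

def classify(snippet: str) -> str:
    for sieve in SIEVES:
        label = sieve(snippet)
        if label:
            return label
    return "BODYTEXT"

print(classify("Abstract: We evaluated ..."))   # ABSTRACT
print(classify("12.3  45.6  78.9  0.05"))       # SEMISTRUCTURE
```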
Integrated clinical workstations for image and text data capture, display, and teleconsultation.
Dayhoff, R; Kuzmak, P M; Kirin, G
1994-01-01
The Department of Veterans Affairs (VA) DHCP Imaging System digitally records clinically significant diagnostic images selected by medical specialists in a variety of hospital departments, including radiology, cardiology, gastroenterology, pathology, dermatology, hematology, surgery, podiatry, dental clinic, and emergency room. These images, which include true color and gray scale images, scanned documents, and electrocardiogram waveforms, are stored on network file servers and displayed on workstations located throughout a medical center. All images are managed by the VA's hospital information system (HIS), allowing integrated displays of text and image data from all medical specialties. Two VA medical centers currently have DHCP Imaging Systems installed, and other installations are underway.
Classifying clinical notes with pain assessment using machine learning.
Fodeh, Samah Jamal; Finch, Dezon; Bouayad, Lina; Luther, Stephen L; Ling, Han; Kerns, Robert D; Brandt, Cynthia
2017-12-26
Pain is a significant public health problem, affecting millions of people in the USA. Evidence has highlighted that patients with chronic pain often suffer from deficits in pain care quality (PCQ) including pain assessment, treatment, and reassessment. Currently, there is no intelligent and reliable approach to identify PCQ indicators in electronic health records (EHR). Here, we used unstructured text narratives in the EHR to derive pain assessment in clinical notes for patients with chronic pain. Our dataset includes patients with documented pain intensity ratings ≥ 4 and initial musculoskeletal diagnoses (MSD) captured by ICD-9-CM codes in fiscal year 2011, a minimum of 1 year of follow-up (3-year maximum), and complete data on key demographic variables. A total of 92 patients with 1058 notes were used. First, we manually annotated qualifiers and descriptors of pain assessment using the annotation schema that we previously developed. Second, we developed a reliable classifier for indicators of pain assessment in clinical notes. Based on our annotation schema, we found variations in documenting the subclasses of pain assessment. In positive notes, providers mostly documented assessment of pain site (67%) and intensity of pain (57%), followed by persistence (32%). In only 27% of positive notes did providers document a presumed etiology for the pain complaint or diagnosis. Documentation of patients' reports of factors that aggravate pain was present in only 11% of positive notes. The Random forest classifier achieved the best performance labeling clinical notes with pain assessment information compared to other classifiers; 94, 95, 94, and 94% were observed in terms of accuracy, PPV, F1-score, and AUC, respectively. Despite the wide spectrum of research that utilizes machine learning in many clinical applications, none has explored using these methods for pain assessment research. In addition, previous studies using large datasets to detect and analyze characteristics of patients with various types of pain have relied exclusively on billing and coded data as the main source of information. This study, in contrast, harnessed unstructured narrative text data from the EHR to detect pain assessment clinical notes. We developed a Random forest classifier to identify clinical notes with pain assessment information. Compared to other classifiers, ours achieved the best results in most of the reported metrics. Graphical abstract: Framework for detecting pain assessment in clinical notes.
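A minimal sketch of the general approach of training a Random forest on bag-of-words features to flag notes containing pain-assessment documentation; the toy notes, labels, and TF-IDF features are assumptions for illustration, not the study's actual feature set.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training data: 1 = note documents a pain assessment, 0 = it does not.
notes = [
    "Pain located in lower back, intensity 6/10, worse with lifting.",
    "Patient denies chest pain. Follow-up in 3 months.",
    "Chronic knee pain, persistent for 2 years, aggravated by stairs.",
    "Medication list reviewed. No acute complaints.",
]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(notes)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

new_note = ["Reports 7/10 pain in the right shoulder, onset last week, worse at night."]
print(clf.predict(vectorizer.transform(new_note)))  # predicted label for the unseen note
```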
Topaz, Maxim; Radhakrishnan, Kavita; Lei, Victor; Zhou, Li
2016-01-01
Effective self-management can decrease up to 50% of heart failure hospitalizations. Unfortunately, self-management by patients with heart failure remains poor. This pilot study aimed to explore the use of text mining to identify heart failure patients with ineffective self-management. We first built a comprehensive self-management vocabulary based on the literature and a review of clinical notes. We then randomly selected 545 heart failure patients treated within Partners Healthcare hospitals (Boston, MA, USA) and conducted a regular expression search with the compiled vocabulary within 43,107 interdisciplinary clinical notes of these patients. We found that 38.2% (n = 208) of patients had documentation of ineffective heart failure self-management in the domains of poor diet adherence (28.4%), missed medical encounters (26.4%), poor medication adherence (20.2%), and non-specified self-management issues (e.g., "compliance issues", 34.6%). We showed the feasibility of using text mining to identify patients with ineffective self-management. More natural language processing algorithms are needed to help busy clinicians identify these patients.
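A hedged sketch of the vocabulary-driven regular-expression search described above; the vocabulary entries and patterns are illustrative stand-ins, not the study's compiled self-management vocabulary.

```python
import re

# Illustrative self-management vocabulary: domain -> regular-expression patterns.
vocabulary = {
    "poor diet adherence": [r"non[- ]?complian\w* with (diet|sodium)", r"dietary indiscretion"],
    "poor medication adherence": [r"miss(ed|es) (his|her|their)? ?med(ication)?s?", r"ran out of med(ication)?s?"],
    "missed encounters": [r"no[- ]show", r"missed (appointment|clinic visit)"],
}
compiled = {domain: [re.compile(p, re.I) for p in pats] for domain, pats in vocabulary.items()}

def find_self_management_issues(note_text: str):
    """Return a dict of domains with the matched phrases found in the note."""
    hits = {}
    for domain, patterns in compiled.items():
        matches = [m.group(0) for p in patterns for m in p.finditer(note_text)]
        if matches:
            hits[domain] = matches
    return hits

note = "Patient admits to dietary indiscretion over the holidays and missed clinic visit last month."
print(find_self_management_issues(note))
```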
AAA (2010) CAPD clinical practice guidelines: need for an update.
DeBonis, David A
2017-09-01
Review and critique of the clinical value of the AAA CAPD guidance document in light of criteria for credible and useful guidance documents, as discussed by Field and Lohr. A qualitative review of the AAA CAPD guidelines using a framework by Field and Lohr to assess their relative value in supporting the assessment and management of CAPD referrals. Relevant literature available through electronic search tools and published texts were used along with the AAA CAPD guidance document and the chapter by Field and Lohr. The AAA document does not meet many of the key requirements discussed by Field and Lohr. It does not reflect the current literature, fails to help clinicians understand for whom auditory processing testing and intervention would be most useful, includes contradictory suggestions which reduce clarity, and appears to avoid conclusions that might cast the CAPD construct in a negative light. It also does not include input from diverse affected groups. All of these reduce the document's credibility. The AAA CAPD guidance document will need to be updated and re-conceptualised in order to provide meaningful guidance for clinicians.
Semantic Web Technology for Mapping and Applying Clinical Functional Assessment Information
2015-05-01
Clinical functional assessment information appears as summaries in free text (Figure 1) or in form-based documents accessible as PDFs (Figure 2), such as the DBQs (e.g., VBA-21-0960M-14-ARE-Back.pdf, VBA-21-0960M-9-ARE-KneeLowerLeg.pdf, VBA-21-0960A-1-ARE-ischemic, NEURO-TBI); there is no coding scheme for clinical functional assessment information.
Huang, Yang; Lowe, Henry J.; Hersh, William R.
2003-01-01
Objective: Despite the advantages of structured data entry, much of the patient record is still stored as unstructured or semistructured narrative text. The issue of representing clinical document content remains problematic. The authors' prior work using an automated UMLS document indexing system has been encouraging but has been affected by the generally low indexing precision of such systems. In an effort to improve precision, the authors have developed a context-sensitive document indexing model to calculate the optimal subset of UMLS source vocabularies used to index each document section. This pilot study was performed to evaluate the utility of this indexing approach on a set of clinical radiology reports. Design: A set of clinical radiology reports that had been indexed manually using UMLS concept descriptors was indexed automatically by the SAPHIRE indexing engine. Using the data generated by this process, the authors developed a system that simulated indexing, at the document section level, of the same document set using many permutations of a subset of the UMLS constituent vocabularies. Measurements: The precision and recall scores generated by simulated indexing for each permutation of two or three UMLS constituent vocabularies were determined. Results: While there was considerable variation in precision and recall values across the different subtypes of radiology reports, the overall effect of this indexing strategy using the best combination of two or three UMLS constituent vocabularies was an improvement in precision without significant impact on recall. Conclusion: In this pilot study, a contextual indexing strategy improved overall precision in a set of clinical radiology reports. PMID:12925544
Huang, Yang; Lowe, Henry J; Hersh, William R
2003-01-01
Despite the advantages of structured data entry, much of the patient record is still stored as unstructured or semistructured narrative text. The issue of representing clinical document content remains problematic. The authors' prior work using an automated UMLS document indexing system has been encouraging but has been affected by the generally low indexing precision of such systems. In an effort to improve precision, the authors have developed a context-sensitive document indexing model to calculate the optimal subset of UMLS source vocabularies used to index each document section. This pilot study was performed to evaluate the utility of this indexing approach on a set of clinical radiology reports. A set of clinical radiology reports that had been indexed manually using UMLS concept descriptors was indexed automatically by the SAPHIRE indexing engine. Using the data generated by this process, the authors developed a system that simulated indexing, at the document section level, of the same document set using many permutations of a subset of the UMLS constituent vocabularies. The precision and recall scores generated by simulated indexing for each permutation of two or three UMLS constituent vocabularies were determined. While there was considerable variation in precision and recall values across the different subtypes of radiology reports, the overall effect of this indexing strategy using the best combination of two or three UMLS constituent vocabularies was an improvement in precision without significant impact on recall. In this pilot study, a contextual indexing strategy improved overall precision in a set of clinical radiology reports.
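The simulation described in this abstract can be sketched as scoring every pair of source vocabularies against a manual gold standard; the concept identifiers and vocabulary provenance below are illustrative placeholders, not study data.

```python
from itertools import combinations

# Gold standard: concepts manually assigned to one document section.
gold = {"C0032285", "C0024117"}
# Automated output: concept -> source vocabularies that contain it.
automated = {
    "C0032285": {"SNOMEDCT_US", "MSH"},
    "C0024117": {"SNOMEDCT_US"},
    "C0151526": {"ICD10CM"},   # a spurious (imprecise) match
}
vocabularies = ["SNOMEDCT_US", "MSH", "ICD10CM", "LNC"]

def score(subset):
    """Precision/recall when indexing is restricted to a vocabulary subset."""
    indexed = {c for c, sources in automated.items() if sources & set(subset)}
    tp = len(indexed & gold)
    precision = tp / len(indexed) if indexed else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

for subset in combinations(vocabularies, 2):
    p, r = score(subset)
    print(f"{subset}: precision={p:.2f} recall={r:.2f}")
```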
Morrison, Frances P; Li, Li; Lai, Albert M; Hripcsak, George
2009-01-01
Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text, so, in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in the output as a result of processing and identification errors. However, PHI in the output was highly transformed, much of it appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
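The evaluation idea, checking whether known PHI strings survive in the processed output, can be sketched as follows; the PHI list and output snippet are illustrative, and MedLEE itself was used unmodified in the study.

```python
import re

def phi_leaks(phi_instances, structured_output_text):
    """Return the PHI strings that still appear verbatim in the output."""
    leaks = []
    for phi in phi_instances:
        if re.search(re.escape(phi), structured_output_text, re.I):
            leaks.append(phi)
    return leaks

# Illustrative PHI annotations from a source note and a fragment of tagged output.
phi = ["John Doe", "03/14/1962", "555-123-4567"]
output_xml = '<problem v="hypertension"/><med v="lisinopril"/> seen on 03/14/1962'
print(phi_leaks(phi, output_xml))   # ['03/14/1962'] -> one instance retained in output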
Experimenting with semantic web services to understand the role of NLP technologies in healthcare.
Jagannathan, V
2006-01-01
NLP technologies can play a significant role in healthcare where a predominant segment of the clinical documentation is in text form. In a graduate course focused on understanding semantic web services at West Virginia University, a class project was designed with the purpose of exploring potential use for NLP-based abstraction of clinical documentation. The role of NLP-technology was simulated using human abstractors and various workflows were investigated using public domain workflow and semantic web service technologies. This poster explores the potential use of NLP and the role of workflow and semantic web technologies in developing healthcare IT environments.
Capturing structured, pulmonary disease-specific data elements in electronic health records.
Gronkiewicz, Cynthia; Diamond, Edward J; French, Kim D; Christodouleas, John; Gabriel, Peter E
2015-04-01
Electronic health records (EHRs) have the potential to improve health-care quality by allowing providers to make better decisions at the point of care based on electronically aggregated data and by facilitating clinical research. These goals are easier to achieve when key, disease-specific clinical information is documented as structured data elements (SDEs) that computers can understand and process, rather than as free-text/natural-language narrative. This article reviews the benefits of capturing disease-specific SDEs. It highlights several design and implementation considerations, including the impact on efficiency and expressivity of clinical documentation and the importance of adhering to data standards when available. Pulmonary disease-specific examples of collection instruments are provided from two commonly used commercial EHRs. Future developments that can leverage SDEs to improve clinical quality and research are discussed.
Eminaga, Okyaz; Hinkelammert, Reemt; Semjonow, Axel; Neumann, Joerg; Abbas, Mahmoud; Koepke, Thomas; Bettendorf, Olaf; Eltze, Elke; Dugas, Martin
2010-11-15
The pathology report of radical prostatectomy specimens plays an important role in clinical decisions and the prognostic evaluation in Prostate Cancer (PCa). The anatomical schema is a helpful tool to document PCa extension for clinical and research purposes. To achieve electronic documentation and analysis, an appropriate documentation model for anatomical schemas is needed. For this purpose, we developed cMDX. The document architecture of cMDX was designed according to Open Packaging Conventions by separating the whole data into template data and patient data. Analogous custom XML elements were considered to harmonize the graphical representation (e.g. tumour extension) with the textual data (e.g. histological patterns). The graphical documentation was based on the four-layer visualization model that forms the interaction between different custom XML elements. Sensitive personal data were encrypted with a 256-bit cryptographic algorithm to avoid misuse. In order to assess the clinical value, we retrospectively analysed the tumour extension in 255 patients after radical prostatectomy. The pathology report with cMDX can represent pathological findings of the prostate in schematic styles. Such reports can be integrated into the hospital information system. "cMDX" documents can be converted into different data formats like text, graphics and PDF. Supplementary tools like cMDX Editor and an analyser tool were implemented. The graphical analysis of 255 prostatectomy specimens showed that PCa were mostly localized in the peripheral zone (mean: 73% ± 25). 54% of PCa showed a multifocal growth pattern. cMDX can be used for routine histopathological reporting of radical prostatectomy specimens and provide data for scientific analysis.
2010-01-01
Background The pathology report of radical prostatectomy specimens plays an important role in clinical decisions and the prognostic evaluation in Prostate Cancer (PCa). The anatomical schema is a helpful tool to document PCa extension for clinical and research purposes. To achieve electronic documentation and analysis, an appropriate documentation model for anatomical schemas is needed. For this purpose, we developed cMDX. Methods The document architecture of cMDX was designed according to Open Packaging Conventions by separating the whole data into template data and patient data. Analogous custom XML elements were considered to harmonize the graphical representation (e.g. tumour extension) with the textual data (e.g. histological patterns). The graphical documentation was based on the four-layer visualization model that forms the interaction between different custom XML elements. Sensitive personal data were encrypted with a 256-bit cryptographic algorithm to avoid misuse. In order to assess the clinical value, we retrospectively analysed the tumour extension in 255 patients after radical prostatectomy. Results The pathology report with cMDX can represent pathological findings of the prostate in schematic styles. Such reports can be integrated into the hospital information system. "cMDX" documents can be converted into different data formats like text, graphics and PDF. Supplementary tools like cMDX Editor and an analyser tool were implemented. The graphical analysis of 255 prostatectomy specimens showed that PCa were mostly localized in the peripheral zone (mean: 73% ± 25). 54% of PCa showed a multifocal growth pattern. Conclusions cMDX can be used for routine histopathological reporting of radical prostatectomy specimens and provide data for scientific analysis. PMID:21078179
Using Concept Relations to Improve Ranking in Information Retrieval
Price, Susan L.; Delcambre, Lois M.
2005-01-01
Despite improved search engine technology, most searches return numerous documents not directly related to the query. This problem is mitigated if relevant documents appear high on a ranked list of search results. We propose that some queries and the underlying information needs can be modeled as relationships between concepts (relations), and we match relations in queries to relations in documents to try to improve ranking of search results. We investigate four techniques to identify two relationships important in medicine, causes and treats, to improve the ranking of medical text documents relevant to clinical questions about causation and treatment. Preliminary results suggest that identifying relation instances can improve the ranking of search results. PMID:16779114
Hanauer, David A; Miela, Gretchen; Chinnaiyan, Arul M; Chang, Alfred E; Blayney, Douglas W
2007-11-01
The American College of Surgeons mandates the maintenance of a cancer registry for hospitals seeking accreditation. At the University of Michigan Health System, more than 90% of all registry patients are identified by manual review, a method common to many institutions. We hypothesized that an automated computer system could accurately perform this time- and labor-intensive task. We created a tool to automatically scan free-text medical documents for terms relevant to cancer. We developed custom-made lists containing approximately 2,500 terms and phrases and 800 SNOMED codes. Text is processed by the Case Finding Engine (CaFE), and relevant terms are highlighted for review by a registrar and used to populate the registry database. We tested our system by comparing results from the CaFE to those of trained registrars who read through 2,200 pathology reports and marked relevant cases for the registry. The clinical documentation (e.g., electronic chart notes) of an additional 476 patients was also reviewed by registrars and compared with the automated process by the CaFE. For pathology reports, the sensitivity for automated case identification was 100%, but specificity was 85.0%. For clinical documentation, sensitivity was 100% and specificity was 73.7%. Types of errors made by the CaFE were categorized to direct additional improvements. Use of the CaFE has resulted in a considerable increase in the number of cases added to the registry each month. The system has been well accepted by our registrars. CaFE can improve the accuracy and efficiency of tumor registry personnel and helps ensure that cancer cases are not overlooked.
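A minimal sketch of scanning free-text reports against curated term lists and highlighting hits for registrar review, in the spirit of the approach above; the short term list here is an illustrative assumption (the actual lists contain roughly 2,500 terms and 800 SNOMED codes).

```python
import re

# Illustrative cancer term list; a production list would be far larger.
CANCER_TERMS = ["adenocarcinoma", "carcinoma in situ", "malignant neoplasm", "lymphoma", "metastatic"]
pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in CANCER_TERMS) + r")\b", re.I)

def highlight_candidates(report_text: str):
    """Return matched terms and the report with matches wrapped in ** ** for review."""
    matches = pattern.findall(report_text)
    highlighted = pattern.sub(lambda m: f"**{m.group(0)}**", report_text)
    return matches, highlighted

matches, marked = highlight_candidates("Biopsy shows invasive adenocarcinoma; no metastatic disease identified.")
print(matches)   # ['adenocarcinoma', 'metastatic']
print(marked)
```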
Validating a strategy for psychosocial phenotyping using a large corpus of clinical text.
Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H
2013-12-01
To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles was chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 0-1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype.
Validating a strategy for psychosocial phenotyping using a large corpus of clinical text
Gundlapalli, Adi V; Redd, Andrew; Carter, Marjorie; Divita, Guy; Shen, Shuying; Palmer, Miland; Samore, Matthew H
2013-01-01
Objective To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. Materials and methods From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles was chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. Results A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 0–1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). Conclusions Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype. PMID:24169276
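The notion of identifying high-yield note titles by hit rate and precision can be sketched with a small calculation; the titles and counts below are illustrative, not study data.

```python
# Per-title hit rate (concepts per document) and precision from reviewer adjudication.
records = [  # (note_title, n_concepts_found, n_true_positives_reviewed, n_reviewed)
    ("SOCIAL WORK NOTE", 412, 95, 100),
    ("PRIMARY CARE NOTE", 120, 70, 100),
    ("TELEPHONE ENCOUNTER", 15, 40, 100),
]
docs_per_title = 1500  # the per-title sampling unit used in the study

for title, hits, tp, reviewed in records:
    hit_rate = hits / docs_per_title
    precision = tp / reviewed
    print(f"{title:22s} hit_rate={hit_rate:.3f} concepts/doc  precision={precision:.2f}")
```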
Simpao, Allan F; Tan, Jonathan M; Lingappan, Arul M; Gálvez, Jorge A; Morgan, Sherry E; Krall, Michael A
2017-10-01
Anesthesia information management systems (AIMS) are sophisticated hardware and software technology solutions that can provide electronic feedback to anesthesia providers. This feedback can be tailored to provide clinical decision support (CDS) to aid clinicians with patient care processes, documentation compliance, and resource utilization. We conducted a systematic review of peer-reviewed articles on near real-time and point-of-care CDS within AIMS using the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols. Studies were identified by searches of the electronic databases Medline and EMBASE. Two reviewers screened studies based on title, abstract, and full text. Studies that were similar in intervention and desired outcome were grouped into CDS categories. Three reviewers graded the evidence within each category. The final analysis included 25 articles on CDS as implemented within AIMS. CDS categories included perioperative antibiotic prophylaxis, post-operative nausea and vomiting prophylaxis, vital sign monitors and alarms, glucose management, blood pressure management, ventilator management, clinical documentation, and resource utilization. Of these categories, the reviewers graded perioperative antibiotic prophylaxis and clinical documentation as having strong evidence per the peer reviewed literature. There is strong evidence for the inclusion of near real-time and point-of-care CDS in AIMS to enhance compliance with perioperative antibiotic prophylaxis and clinical documentation. Additional research is needed in many other areas of AIMS-based CDS.
Clinical extracts of biomedical literature for patient-centered problem solving.
Florance, V
1996-01-01
This paper reports on a four-part qualitative research project aimed at designing an online document surrogate tailored to the needs of physicians seeking biomedical literature for use in clinical problem solving. The clinical extract, designed in collaboration with three practicing physicians, combines traditional elements of the MEDLINE record (e.g., title, author, source, abstract) with new elements (e.g., table captions, text headings, case profiles) suggested by the physicians. Specifications for the prototype clinical extract were developed through a series of relevance-scoring exercises and semi-structured interviews. For six clinical questions, three physicians assessed the applicability of selected articles and their document surrogates, articulating relevance criteria and reasons for their judgments. A prototype clinical extract based on their suggestions was developed, tested, evaluated, and revised. The final version includes content and format aids to make the extract easy to use. The goals, methods, and outcomes of the research study are summarized, and a template of the final design is provided. PMID:8883986
State-of-the-art Anonymization of Medical Records Using an Iterative Machine Learning Framework
Szarvas, György; Farkas, Richárd; Busa-Fekete, Róbert
2007-01-01
Objective The anonymization of medical records is of great importance in the human life sciences because a de-identified text can be made publicly available for non-hospital researchers as well, to facilitate research on human diseases. Here the authors have developed a de-identification model that can successfully remove personal health information (PHI) from discharge records to make them conform to the guidelines of the Health Information Portability and Accountability Act. Design We introduce here a novel, machine learning-based iterative Named Entity Recognition approach intended for use on semi-structured documents like discharge records. Our method identifies PHI in several steps. First, it labels all entities whose tags can be inferred from the structure of the text and it then utilizes this information to find further PHI phrases in the flow text parts of the document. Measurements Following the standard evaluation method of the first Workshop on Challenges in Natural Language Processing for Clinical Data, we used token-level Precision, Recall and Fβ=1 measure metrics for evaluation. Results Our system achieved outstanding accuracy on the standard evaluation dataset of the de-identification challenge, with an F measure of 99.7534% for the best submitted model. Conclusion We can say that our system is competitive with the current state-of-the-art solutions, while we describe here several techniques that can be beneficial in other tasks that need to handle structured documents such as clinical records. PMID:17823086
NASA Astrophysics Data System (ADS)
Kessel, Kerstin A.; Bougatf, Nina; Bohn, Christian; Engelmann, Uwe; Oetzel, Dieter; Bendl, Rolf; Debus, Jürgen; Combs, Stephanie E.
2012-02-01
Conducting clinical studies is rather difficult because of the large variety of voluminous datasets, different documentation styles, and various information systems, especially in radiation oncology. In this paper, we describe our development of a web-based documentation system, with initial approaches to automated statistical analysis, for transnational and multicenter clinical studies in particle therapy. It is possible to have immediate access to all patient information and to exchange, store, process, and visualize text data, all types of DICOM images, especially DICOM RT, and any other multimedia data. Accessing the documentation system and submitting clinical data is possible for internal and external users (e.g. referring physicians from abroad, who are seeking the new technique of particle therapy for their patients). Security and privacy protection are ensured with the encrypted https protocol, client certificates, and an application gateway. Furthermore, all data can be pseudonymized. Integrated into the existing hospital environment, patient data is imported via various interfaces over HL7 messages and DICOM. Several further features replace manual input wherever possible and ensure data quality and completeness. With a form generator, studies can be individually designed to fit specific needs. By including all treated patients (also non-study patients), we gain the ability to perform large-scale retrospective analyses. Having recently begun documentation of our first six clinical studies, it has become apparent that the benefits lie in the simplification of research work, better quality of study analyses, and, ultimately, the improvement of treatment concepts by evaluating the effectiveness of particle therapy.
Terminology extraction from medical texts in Polish
2014-01-01
Background Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task, we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Results Using a combination of linguistic and statistical methods for processing over 1200 children's hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect, while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Conclusions Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts. PMID:24976943
Terminology extraction from medical texts in Polish.
Marciniak, Małgorzata; Mykowiecka, Agnieszka
2014-01-01
Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task, we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children's hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect, while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were able to filter out all improperly constructed noun phrases from the top of the list. Careful choice of noun phrases is crucial to the usefulness of the created terminological resource in applications such as lexicon construction or acquisition of semantic relations from texts.
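The ordering criterion, frequency of use combined with the variety of contexts, can be sketched roughly as follows; the scoring formula and the toy Polish phrases are simplifications and assumptions, not the authors' exact ranking procedure.

```python
from collections import defaultdict

# Toy candidate-phrase occurrences: (phrase, left context word, right context word).
occurrences = [
    ("zapalenie płuc", "ostre", "leczone"),
    ("zapalenie płuc", "przebyte", "w"),
    ("zapalenie płuc", "ostre", "u"),
    ("badanie kontrolne", "wykonano", "po"),
]

freq = defaultdict(int)
contexts = defaultdict(set)
for phrase, left, right in occurrences:
    freq[phrase] += 1
    contexts[phrase].add((left, right))

def rank_score(phrase):
    # Combine raw frequency with the number of distinct contexts the phrase occurs in.
    return freq[phrase] * len(contexts[phrase])

for phrase in sorted(freq, key=rank_score, reverse=True):
    print(phrase, freq[phrase], len(contexts[phrase]), rank_score(phrase))
```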
Wilson, Richard A.; Chapman, Wendy W.; DeFries, Shawn J.; Becich, Michael J.; Chapman, Brian E.
2010-01-01
Background: Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents. Methods: Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed, and each report was annotated by experts with regard to the patient's personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods for extracting information, Dynamic-Window and ConText, were evaluated. Hx's classification responses using each of the two methods were measured against the reference standard. The average Cohen's weighted kappa served as the human benchmark in evaluating the system. Results: Hx had a high overall accuracy with each method, scoring 96.2%. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both methods were comparable to the human benchmarks of 88.3% and 97.2%, respectively. Conclusion: We evaluated an automated application's performance in classifying a mesothelioma patient's personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text clinical records for a variety of research or healthcare delivery organizations. PMID:21031012
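A greatly simplified sketch of ConText-style trigger handling for negation and experiencer, the kind of information extraction compared above; the trigger lists and scope handling are reduced to a few illustrative patterns and do not reproduce the published algorithm.

```python
import re

# Illustrative trigger patterns; the real algorithm uses curated trigger lists
# and scope-termination terms applied within a defined window.
NEGATION_TRIGGERS = [r"\bno (personal )?history of\b", r"\bdenies\b", r"\bnegative for\b"]
FAMILY_TRIGGERS = [r"\bfamily history of\b", r"\b(mother|father|sibling)\b"]

def classify_mention(sentence: str, concept: str):
    window = sentence.lower().split(concept.lower())[0]  # text preceding the concept mention
    negated = any(re.search(p, window) for p in NEGATION_TRIGGERS)
    experiencer = "family" if any(re.search(p, window) for p in FAMILY_TRIGGERS) else "patient"
    return {"concept": concept, "negated": negated, "experiencer": experiencer}

print(classify_mention("Mother with history of breast cancer.", "breast cancer"))
print(classify_mention("No personal history of other malignancy.", "malignancy"))
```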
Rapid Implementation of Inpatient Electronic Physician Documentation at an Academic Hospital
Hahn, J.S.; Bernstein, J.A.; McKenzie, R.B.; King, B.J.; Longhurst, C.A.
2012-01-01
Electronic physician documentation is an essential element of a complete electronic medical record (EMR). At Lucile Packard Children’s Hospital, a teaching hospital affiliated with Stanford University, we implemented an inpatient electronic documentation system for physicians over a 12-month period. Using an EMR-based free-text editor coupled with automated import of system data elements, we were able to achieve voluntary, widespread adoption of the electronic documentation process. When given the choice between electronic versus dictated report creation, the vast majority of users preferred the electronic method. In addition to increasing the legibility and accessibility of clinical notes, we also decreased the volume of dictated notes and scanning of handwritten notes, which provides the opportunity for cost savings to the institution. PMID:23620718
Shiffman, Richard N; Michel, George; Essaihi, Abdelwaheb; Thornquist, Elizabeth
2004-01-01
A gap exists between the information contained in published clinical practice guidelines and the knowledge and information that are necessary to implement them. This work describes a process to systematize and make explicit the translation of document-based knowledge into workflow-integrated clinical decision support systems. This approach uses the Guideline Elements Model (GEM) to represent the guideline knowledge. Implementation requires a number of steps to translate the knowledge contained in guideline text into a computable format and to integrate the information into clinical workflow. The steps include: (1) selection of a guideline and specific recommendations for implementation, (2) markup of the guideline text, (3) atomization, (4) deabstraction and (5) disambiguation of recommendation concepts, (6) verification of rule set completeness, (7) addition of explanations, (8) building executable statements, (9) specification of origins of decision variables and insertions of recommended actions, (10) definition of action types and selection of associated beneficial services, (11) choice of interface components, and (12) creation of requirement specification. The authors illustrate these component processes using examples drawn from recent experience translating recommendations from the National Heart, Lung, and Blood Institute's guideline on management of chronic asthma into a workflow-integrated decision support system that operates within the Logician electronic health record system. Using the guideline document as a knowledge source promotes authentic translation of domain knowledge and reduces the overall complexity of the implementation task. From this framework, we believe that a better understanding of activities involved in guideline implementation will emerge.
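A hedged sketch of what an "executable statement" derived from a marked-up recommendation might look like at the end of this process; the field names, asthma condition, and rule evaluation below are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    decision_variable: str      # where the value originates in the EHR
    condition: str              # deabstracted, disambiguated condition
    action: str                 # recommended action inserted into workflow
    explanation: str            # rationale displayed to the clinician

rec = Recommendation(
    decision_variable="asthma_severity",
    condition="asthma_severity in ('moderate persistent', 'severe persistent')",
    action="Recommend daily inhaled corticosteroid; display dosing table.",
    explanation="Derived from the NHLBI chronic asthma guideline recommendation on controller therapy.",
)

def execute(rec: Recommendation, patient: dict):
    # In production a safe rule engine would replace eval; this keeps the sketch short.
    if eval(rec.condition, {}, patient):
        return rec.action, rec.explanation
    return None

print(execute(rec, {"asthma_severity": "moderate persistent"}))
```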
Terminology Services: Standard Terminologies to Control Health Vocabulary.
González Bernaldo de Quirós, Fernán; Otero, Carlos; Luna, Daniel
2018-04-22
Healthcare Information Systems should capture clinical data in a structured and preferably coded format. This is crucial for data exchange between health information systems, epidemiological analysis, quality and research, clinical decision support systems, administrative functions, among others. Structured data entry is an obstacle for the usability of electronic health record (EHR) applications and their acceptance by physicians who prefer to document patient EHRs using "free text". Natural language allows for rich expressiveness but at the same time is ambiguous; it has great dependence on context and uses jargon and acronyms. Although much progress has been made in knowledge and natural language processing techniques, the result is not yet satisfactory enough for the use of free text in all dimensions of clinical documentation. In order to address the trade-off between capturing data with free text and at the same time coding data for computer processing, numerous terminological systems for the systematic recording of clinical data have been developed. The purpose of terminology services consists of representing facts that happen in the real world through database management in order to allow for semantic interoperability and computerized applications. These systems interrelate concepts of a particular domain and provide references to related terms with standard codes. In this way, standard terminologies allow the creation of a controlled medical vocabulary, making terminology services a fundamental component for health data management in the healthcare environment. The Hospital Italiano de Buenos Aires has been working on the development of its own terminology server. This work describes its experience in the field. Georg Thieme Verlag KG Stuttgart.
Rosenthal, David I
2013-06-01
With widespread adoption of electronic health records (EHRs) and electronic clinical documentation, health care organizations now have greater facility to review clinical data and evaluate the efficacy of quality improvement efforts. Unfortunately, I believe there is a fundamental gap between actual health care delivery and what we document in the current EHR systems. This process of capturing the patient encounter, which I'll refer to as transcription, is prone to significant data loss due to inadequate methods of data capture, multiple points of view, and bias and subjectivity in the transcriptional process. Our current EHR text-based clinical documentation systems are lossy abstractions: one-sided accounts of what takes place between patients and providers. Our clinical notes contain the breadcrumbs of relationships, conversations, physical exams, and procedures but often lack the ability to capture the form, the emotions, the images, the nonverbal communication, and the actual narrative of interactions between human beings. I believe that a video record, in conjunction with objective transcriptional services and other forms of data capture, may provide a closer approximation to the truth of health care delivery and may be a valuable tool for healthcare improvement. Copyright © 2013 Elsevier Inc. All rights reserved.
Interactive publications: creation and usage
NASA Astrophysics Data System (ADS)
Thoma, George R.; Ford, Glenn; Chung, Michael; Vasudevan, Kirankumar; Antani, Sameer
2006-02-01
As envisioned here, an "interactive publication" has similarities to multimedia documents that have been in existence for a decade or more, but possesses specific differentiating characteristics. In common usage, the latter refers to online entities that, in addition to text, consist of files of images and video clips residing separately in databases, rarely providing immediate context to the document text. While an interactive publication has many media objects as does the "traditional" multimedia document, it is a self-contained document, either as a single file with media files embedded within it, or as a "folder" containing tightly linked media files. The main characteristic that differentiates an interactive publication from a traditional multimedia document is that the reader would be able to reuse the media content for analysis and presentation, and to check the underlying data and possibly derive alternative conclusions leading, for example, to more in-depth peer reviews. We have created prototype publications containing paginated text and several media types encountered in the biomedical literature: 3D animations of anatomic structures; graphs, charts and tabular data; cell development images (video sequences); and clinical images such as CT, MRI and ultrasound in the DICOM format. This paper presents developments to date including: a tool to convert static tables or graphs into interactive entities, authoring procedures followed to create prototypes, and advantages and drawbacks of each of these platforms. It also outlines future work including meeting the challenge of network distribution for these large files.
Han, Yi; Faulkner, Melissa Spezia; Fritz, Heather; Fadoju, Doris; Muir, Andrew; Abowd, Gregory D.; Head, Lauren; Arriaga, Rosa I.
2015-01-01
Adolescents with type 1 diabetes typically receive clinical care every 3 months. Between visits, diabetes-related issues may not be frequently reflected, learned, and documented by the patients, limiting their self-awareness and knowledge about their condition. We designed a text-messaging system to help resolve this problem. In a pilot, randomized controlled trial with 30 adolescents, we examined the effect of text messages about symptom awareness and diabetes knowledge on glucose control and quality of life. The intervention group that received more text messages between visits had significant improvements in quality of life. PMID:25720675
Löpprich, Martin; Krauss, Felix; Ganzinger, Matthias; Senghas, Karsten; Riezler, Stefan; Knaup, Petra
2016-08-05
In the Multiple Myeloma clinical registry at Heidelberg University Hospital, most data are extracted from discharge letters. Our aim was to analyze if it is possible to make the manual documentation process more efficient by using methods of natural language processing for multiclass classification of free-text diagnostic reports to automatically document the diagnosis and state of disease of myeloma patients. The first objective was to create a corpus consisting of free-text diagnosis paragraphs of patients with multiple myeloma from German diagnostic reports, and its manual annotation of relevant data elements by documentation specialists. The second objective was to construct and evaluate a framework using different NLP methods to enable automatic multiclass classification of relevant data elements from free-text diagnostic reports. The main diagnosis paragraph was extracted from the clinical reports of a randomly selected third of the patients in the multiple myeloma research database of Heidelberg University Hospital (in total 737 selected patients). An EDC system was set up and two data entry specialists independently performed manual documentation of at least nine specific data elements for multiple myeloma characterization. Both data entries were compared and assessed by a third specialist and an annotated text corpus was created. A framework was constructed, consisting of a self-developed package to split multiple diagnosis sequences into several subsequences, four different preprocessing steps to normalize the input data and two classifiers: a maximum entropy classifier (MEC) and a support vector machine (SVM). In total 15 different pipelines were examined and assessed by ten-fold cross-validation, repeated 100 times. As quality indicators, the average error rate and the average F1-score were computed. For significance testing the approximate randomization test was used. The created annotated corpus consists of 737 different diagnosis paragraphs with a total of 865 coded diagnoses. The dataset is publicly available in the supplementary online files for training and testing of further NLP methods. Both classifiers showed low average error rates (MEC: 1.05; SVM: 0.84) and high F1-scores (MEC: 0.89; SVM: 0.92). However, the results varied widely depending on the classified data element. Preprocessing methods increased this effect and had significant impact on the classification, both positive and negative. The automatic diagnosis splitter increased the average error rate significantly, even if the F1-score decreased only slightly. The low average error rates and high average F1-scores of each pipeline demonstrate the suitability of the investigated NLP methods. However, it was also shown that there is no best practice for an automatic classification of data elements from free-text diagnostic reports.
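A minimal sketch of one such pipeline, assuming scikit-learn as the toolkit: free-text diagnosis paragraphs are vectorized and classified with a linear SVM and, as a stand-in for the maximum entropy classifier, logistic regression, both evaluated by ten-fold cross-validation. The example paragraphs and labels are invented and do not come from the published corpus.

```python
# Sketch of a classification pipeline in the spirit of the study: normalize
# free-text diagnosis paragraphs, vectorize them, and evaluate an SVM and a
# maximum-entropy-style logistic regression with cross-validation.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = [
    "Multiples Myelom Stadium III A nach Durie und Salmon",
    "Multiples Myelom Stadium I A, Erstdiagnose 2010",
    "Plasmozytom, komplette Remission nach Chemotherapie",
    "Multiples Myelom Stadium II B, Rezidiv",
] * 10  # repeated so that ten-fold cross-validation folds are populated
labels = ["III A", "I A", "remission", "II B"] * 10

for name, clf in [("SVM", LinearSVC()), ("MaxEnt", LogisticRegression(max_iter=1000))]:
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
        ("clf", clf),
    ])
    f1 = cross_val_score(pipeline, texts, labels, cv=10, scoring="f1_macro")
    acc = cross_val_score(pipeline, texts, labels, cv=10, scoring="accuracy")
    print(f"{name}: F1={f1.mean():.2f}  error rate={1 - acc.mean():.2f}")
```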
Meystre, Stéphane M; Ferrández, Óscar; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H
2014-08-01
As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has only barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems while formatting was only modified by one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that automated text de-identification's impact on clinical information is small, but not negligible, and that improved clinical acronyms and eponyms disambiguation could significantly reduce this impact. Copyright © 2014 Elsevier Inc. All rights reserved.
Jackson MSc, Richard G.; Ball, Michael; Patel, Rashmi; Hayes, Richard D.; Dobson, Richard J.B.; Stewart, Robert
2014-01-01
Observational research using data from electronic health records (EHR) is a rapidly growing area, which promises both increased sample size and data richness - therefore unprecedented study power. However, in many medical domains, large amounts of potentially valuable data are contained within the free text clinical narrative. Manually reviewing free text to obtain desired information is an inefficient use of researcher time and skill. Previous work has demonstrated the feasibility of applying Natural Language Processing (NLP) to extract information. However, in real world research environments, the demand for NLP skills outweighs supply, creating a bottleneck in the secondary exploitation of the EHR. To address this, we present TextHunter, a tool for the creation of training data, construction of concept extraction machine learning models and their application to documents. Using confidence thresholds to ensure high precision (>90%), we achieved recall measurements as high as 99% in real world use cases. PMID:25954379
Hamilton, Samina; Bernstein, Aaron B; Blakey, Graham; Fagan, Vivien; Farrow, Tracy; Jordan, Debbie; Seiler, Walther; Shannon, Anna; Gertel, Art
2016-01-01
Interventional clinical studies conducted in the regulated drug research environment are reported using International Council for Harmonisation (ICH) regulatory guidance documents: ICH E3 on the structure and content of clinical study reports (CSRs) published in 1995 and ICH E3 supplementary Questions & Answers (Q & A) published in 2012. Since the ICH guidance documents were published, there has been heightened awareness of the importance of disclosure of clinical study results. The use of the CSR as a key source document to fulfil emerging obligations has resulted in a re-examination of how ICH guidelines are applied in CSR preparation. The dynamic regulatory and modern drug development environments create emerging reporting challenges. Regulatory medical writing and statistical professionals developed Clarity and Openness in Reporting: E3-based (CORE) Reference over a 2-year period. Stakeholders contributing expertise included a global industry association, regulatory agency, patient advocate, academic and Principal Investigator representatives. CORE Reference should help authors navigate relevant guidelines as they create CSR content relevant for today's studies. It offers practical suggestions for developing CSRs that will require minimum redaction and modification prior to public disclosure. CORE Reference comprises a Preface, followed by the actual resource. The Preface clarifies intended use and underlying principles that inform resource utility. The Preface lists references contributing to development of the resource, which broadly fall into 'regulatory' and 'public disclosure' categories. The resource includes ICH E3 guidance text, ICH E3 Q & A 2012-derived guidance text and CORE Reference text, distinguished from one another through the use of shading. Rationale comments are used throughout for clarification purposes. A separate mapping tool comparing ICH E3 sectional structure and CORE Reference sectional structure is also provided. Together, CORE Reference and the mapping tool constitute the user manual. This publication is intended to enhance the use, understanding and dissemination of CORE Reference. The CORE Reference user manual and the associated website (http://www.core-reference.org) should improve the reporting of interventional clinical studies. Periodic updates of CORE Reference are planned to maintain its relevance. CORE Reference was registered with http://www.equator-network.org on 23 March 2015.
Medical Language Processing for Knowledge Representation and Retrievals
Lyman, Margaret; Sager, Naomi; Chi, Emile C.; Tick, Leo J.; Nhan, Ngo Thanh; Su, Yun; Borst, Francois; Scherrer, Jean-Raoul
1989-01-01
The Linguistic String Project-Medical Language Processor, a system for computer analysis of narrative patient documents in English, is being adapted for French Lettres de Sortie. The system converts the free-text input to a semantic representation which is then mapped into a relational database. Retrievals of clinical data from the database are described.
NASA Technical Reports Server (NTRS)
Grams, R. R.
1982-01-01
A system designed to access a large range of available medical textbook information in an online interactive fashion is described. A high level query type database manager, INQUIRE, is used. Operating instructions, system flow diagrams, database descriptions, text generation, and error messages are discussed. User information is provided.
Calvert, Melanie; Kyte, Derek; Duffy, Helen; Gheorghe, Adrian; Mercieca-Bebber, Rebecca; Ives, Jonathan; Draper, Heather; Brundage, Michael; Blazeby, Jane; King, Madeleine
2014-01-01
Background Evidence suggests there are inconsistencies in patient-reported outcome (PRO) assessment and reporting in clinical trials, which may limit the use of these data to inform patient care. For trials with a PRO endpoint, routine inclusion of key PRO information in the protocol may help improve trial conduct and the reporting and appraisal of PRO results; however, it is currently unclear exactly what PRO-specific information should be included. The aim of this review was to summarize the current PRO-specific guidance for clinical trial protocol developers. Methods and Findings We searched the MEDLINE, EMBASE, CINAHL and Cochrane Library databases (inception to February 2013) for PRO-specific guidance regarding trial protocol development. Further guidance documents were identified via Google, Google Scholar, requests to members of the UK Clinical Research Collaboration registered clinical trials units and international experts. Two independent investigators undertook title/abstract screening, full text review and data extraction, with a third involved in the event of disagreement. 21,175 citations were screened and 54 met the inclusion criteria. Guidance documents were difficult to access: electronic database searches identified just 8 documents, with the remaining 46 sourced elsewhere (5 from citation tracking, 27 from hand searching, 7 from the grey literature review and 7 from experts). 162 unique PRO-specific protocol recommendations were extracted from included documents. A further 10 PRO recommendations were identified relating to supporting trial documentation. Only 5/162 (3%) recommendations appeared in ≥50% of guidance documents reviewed, indicating a lack of consistency. Conclusions PRO-specific protocol guidelines were difficult to access, lacked consistency and may be challenging to implement in practice. There is a need to develop easily accessible consensus-driven PRO protocol guidance. Guidance should be aimed at ensuring key PRO information is routinely included in appropriate trial protocols, in order to facilitate rigorous collection/reporting of PRO data, to effectively inform patient care. PMID:25333995
Shiffman, Richard N.; Michel, George; Essaihi, Abdelwaheb; Thornquist, Elizabeth
2004-01-01
Objective: A gap exists between the information contained in published clinical practice guidelines and the knowledge and information that are necessary to implement them. This work describes a process to systematize and make explicit the translation of document-based knowledge into workflow-integrated clinical decision support systems. Design: This approach uses the Guideline Elements Model (GEM) to represent the guideline knowledge. Implementation requires a number of steps to translate the knowledge contained in guideline text into a computable format and to integrate the information into clinical workflow. The steps include: (1) selection of a guideline and specific recommendations for implementation, (2) markup of the guideline text, (3) atomization, (4) deabstraction and (5) disambiguation of recommendation concepts, (6) verification of rule set completeness, (7) addition of explanations, (8) building executable statements, (9) specification of origins of decision variables and insertions of recommended actions, (10) definition of action types and selection of associated beneficial services, (11) choice of interface components, and (12) creation of requirement specification. Results: The authors illustrate these component processes using examples drawn from recent experience translating recommendations from the National Heart, Lung, and Blood Institute's guideline on management of chronic asthma into a workflow-integrated decision support system that operates within the Logician electronic health record system. Conclusion: Using the guideline document as a knowledge source promotes authentic translation of domain knowledge and reduces the overall complexity of the implementation task. From this framework, we believe that a better understanding of activities involved in guideline implementation will emerge. PMID:15187061
Ambert, Kyle H; Cohen, Aaron M
2009-01-01
OBJECTIVE Free-text clinical reports serve as an important part of patient care management and clinical documentation of patient disease and treatment status. Free-text notes are commonplace in medical practice, but remain an under-used source of information for clinical and epidemiological research, as well as personalized medicine. The authors explore the challenges associated with automatically extracting information from clinical reports using their submission to the Integrating Informatics with Biology and the Bedside (i2b2) 2008 Natural Language Processing Obesity Challenge Task. DESIGN A text mining system for classifying patient comorbidity status, based on the information contained in clinical reports. The approach of the authors incorporates a variety of automated techniques, including hot-spot filtering, negated concept identification, zero-vector filtering, weighting by inverse class-frequency, and error-correcting of output codes with linear support vector machines. MEASUREMENTS Performance was evaluated in terms of the macroaveraged F1 measure. RESULTS The automated system performed well against manual expert rule-based systems, finishing fifth in the Challenge's intuitive task, and 13th in the textual task. CONCLUSIONS The system demonstrates that effective comorbidity status classification by an automated system is possible.
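The sketch below illustrates two of the techniques named above, hot-spot filtering and error-correcting output codes over linear support vector machines, under the assumption of a scikit-learn implementation; the trigger terms, notes and labels are invented for illustration.

```python
# Sketch of "hot-spot" filtering (keep only text around disease trigger terms)
# followed by error-correcting output codes over linear SVMs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

def hot_spots(text, triggers, window=60):
    """Keep only character windows around trigger terms (hot-spot filtering)."""
    spans = []
    lower = text.lower()
    for term in triggers:
        start = lower.find(term)
        while start != -1:
            spans.append(text[max(0, start - window): start + len(term) + window])
            start = lower.find(term, start + 1)
    return " ".join(spans)

triggers = ["obesity", "obese", "bmi"]
notes = [
    "Patient is morbidly obese with BMI of 44, counseled on diet.",
    "No evidence of obesity; BMI 23. Denies weight change.",
    "Weight stable. Obesity documented in problem list.",
    "BMI not recorded; assessment otherwise unremarkable.",
] * 5
labels = ["present", "absent", "present", "unmentioned"] * 5

filtered = [hot_spots(n, triggers) for n in notes]
X = TfidfVectorizer().fit_transform(filtered).toarray()
# Error-correcting output codes turn the multiclass problem into several
# binary problems solved by linear SVMs.
clf = OutputCodeClassifier(LinearSVC(), code_size=2, random_state=0).fit(X, labels)
print(clf.predict(X[:4]))
```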
Beyond Readability: Investigating Coherence of Clinical Text for Consumers
Hetzel, Scott; Dalrymple, Prudence; Keselman, Alla
2011-01-01
Background A basic tenet of consumer health informatics is that understandable health resources empower the public. Text comprehension holds great promise for helping to characterize consumer problems in understanding health texts. The need for efficient ways to assess consumer-oriented health texts and the availability of computationally supported tools led us to explore the effect of various text characteristics on readers’ understanding of health texts, as well as to develop novel approaches to assessing these characteristics. Objective The goal of this study was to compare the impact of two different approaches to enhancing readability, and three interventions, on individuals’ comprehension of short, complex passages of health text. Methods Participants were 80 university staff, faculty, or students. Each participant was asked to “retell” the content of two health texts: one a clinical trial in the domain of diabetes mellitus, and the other typical Visit Notes. These texts were transformed for the intervention arms of the study. Two interventions provided terminology support via (1) standard dictionary or (2) contextualized vocabulary definitions. The third intervention provided coherence improvement. We assessed participants’ comprehension of the clinical texts through propositional analysis, an open-ended questionnaire, and analysis of the number of errors made. Results For the clinical trial text, the effect of text condition was not significant in any of the comparisons, suggesting no differences in recall, despite the varying levels of support (P = .84). For the Visit Note, however, the difference in the median total propositions recalled between the Coherent and the (Original + Dictionary) conditions was significant (P = .04). This suggests that participants in the Coherent condition recalled more of the original Visit Notes content than did participants in the Original and the Dictionary conditions combined. However, no difference was seen between (Original + Dictionary) and Vocabulary (P = .36) nor Coherent and Vocabulary (P = .62). No statistically significant effect of any document transformation was found either in the open-ended questionnaire (clinical trial: P = .86, Visit Note: P = .20) or in the error rate (clinical trial: P = .47, Visit Note: P = .25). However, post hoc power analysis suggested that increasing the sample size by approximately 6 participants per condition would result in a significant difference for the Visit Note, but not for the clinical trial text. Conclusions Statistically, the results of this study attest that improving coherence has a small effect on consumer comprehension of clinical text, but the task is extremely labor intensive and not scalable. Further research is needed using texts from more diverse clinical domains and more heterogeneous participants, including actual patients. Since comprehensibility of clinical text appears difficult to automate, informatics support tools may most productively support the health care professionals tasked with making clinical information understandable to patients. PMID:22138127
Wong, Newton A C S; Amary, Fernanda; Butler, Rachel; Byers, Richard; Gonzalez, David; Haynes, Harry R; Ilyas, Mohammad; Salto-Tellez, Manuel; Taniere, Philippe
2018-05-01
The use of biologics targeted to the human epidermal growth factor receptor 2 (HER2) protein is the latest addition to the armamentarium used to fight advanced gastric or gastro-oesophageal junction adenocarcinoma. The decision to treat with the biologic trastuzumab is completely dependent on HER2 testing of tumour tissue. In 2017, the College of American Pathologists, American Society for Clinical Pathology and the American Society of Clinical Oncology jointly published guidelines for HER2 testing and clinical decision making in gastro-oesophageal adenocarcinoma. The Association of Clinical Pathologists Molecular Pathology and Diagnostics Committee has issued the following document as a commentary of these guidelines and, in parallel, to provide guidance on HER2 testing in National Health Service pathology departments within the UK. This guidance covers issues related to case selection, preanalytical aspects, analysis and interpretation of such HER2 testing. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Jones, B E; South, B R; Shao, Y; Lu, C C; Leng, J; Sauer, B C; Gundlapalli, A V; Samore, M H; Zeng, Q
2018-01-01
Identifying pneumonia using diagnosis codes alone may be insufficient for research on clinical decision making. Natural language processing (NLP) may enable the inclusion of cases missed by diagnosis codes. This article (1) develops an NLP tool that identifies the clinical assertion of pneumonia from physician emergency department (ED) notes, and (2) compares classification methods using diagnosis codes versus NLP against a gold standard of manual chart review to identify patients initially treated for pneumonia. Among a national population of ED visits occurring between 2006 and 2012 across the Veterans Affairs health system, we extracted 811 physician documents containing search terms for pneumonia for training, and 100 random documents for validation. Two reviewers annotated span- and document-level classifications of the clinical assertion of pneumonia. An NLP tool using a support vector machine was trained on the enriched documents. We extracted diagnosis codes assigned in the ED and upon hospital discharge and calculated performance characteristics for diagnosis codes, NLP, and NLP plus diagnosis codes against manual review in training and validation sets. Among the training documents, 51% contained clinical assertions of pneumonia; in the validation set, 9% were classified with pneumonia, of which 100% contained pneumonia search terms. After enriching with search terms, the NLP system alone demonstrated a recall/sensitivity of 0.72 (training) and 0.55 (validation), and a precision/positive predictive value (PPV) of 0.89 (training) and 0.71 (validation). ED-assigned diagnostic codes demonstrated lower recall/sensitivity (0.48 and 0.44) but higher precision/PPV (0.95 in training, 1.0 in validation); the NLP system identified more "possible-treated" cases than diagnostic coding. An approach combining NLP and ED-assigned diagnostic coding classification achieved the best performance (sensitivity 0.89 and PPV 0.80). System-wide application of NLP to clinical text can increase capture of initial diagnostic hypotheses, an important inclusion when studying diagnosis and clinical decision-making under uncertainty. Schattauer GmbH Stuttgart.
2012-01-01
Background To establish a common database on particle therapy for the evaluation of clinical studies integrating a large variety of voluminous datasets, different documentation styles, and various information systems, especially in the field of radiation oncology. Methods We developed a web-based documentation system for transnational and multicenter clinical studies in particle therapy. 560 patients have been treated from November 2009 to September 2011. Protons, carbon ions or a combination of both, as well as a combination with photons were applied. To date, 12 studies have been initiated and more are in preparation. Results It is possible to immediately access all patient information and exchange, store, process, and visualize text data, any DICOM images and multimedia data. Accessing the system and submitting clinical data is possible for internal and external users. Integrated into the hospital environment, data is imported both manually and automatically. Security and privacy protection as well as data validation and verification are ensured. Studies can be designed to fit individual needs. Conclusions The described database provides a basis for documentation of large patient groups with specific and specialized questions to be answered. Having recently begun electronic documentation, we find that the benefits lie in the user-friendly and timely workflow for documentation. The ultimate goal is a simplification of research work, better quality of study analyses and, eventually, the improvement of treatment concepts by evaluating the effectiveness of particle therapy. PMID:22828013
Louhi 2010: Special issue on Text and Data Mining of Health Documents
2011-01-01
The papers presented in this supplement focus and reflect on computer use in everyday clinical work in hospitals and clinics, such as electronic health record systems, pre-processing for computer aided summaries, clinical coding, computer decision systems, as well as related ethical concerns and security. Much of this work concerns itself by necessity with the incorporation and development of language processing tools and methods, and as such this supplement aims to provide an arena for reporting on developments in a diversity of languages. In the supplement we can read about some of the challenges identified above. PMID:21992545
The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents
NASA Astrophysics Data System (ADS)
Gunawan, D.; Sembiring, C. A.; Budiman, M. A.
2018-03-01
The rapidly increasing number of web pages and documents calls for topic-specific filtering in order to find relevant pages or documents efficiently. This is preliminary research that uses cosine similarity to compute text relevance in order to find topic-specific documents. The research is divided into three parts. The first part is text preprocessing, in which punctuation is removed, the document is converted to lower case, stop words are removed, and root words are extracted using the Porter stemming algorithm. The second part is keyword weighting, which is used by the third part, the text relevance calculation. The text relevance calculation yields a value between 0 and 1; the closer the value is to 1, the more related the two documents are, and vice versa.
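A minimal sketch of the three parts described above, with an illustrative stop-word list and the NLTK Porter stemmer assumed available:

```python
# Minimal sketch: text preprocessing (lowercasing, punctuation and stop-word
# removal, Porter stemming), term-frequency keyword weighting, and cosine
# similarity in the range 0..1. The stop-word list is a tiny illustrative subset.
import math
import re
from collections import Counter
from nltk.stem import PorterStemmer  # algorithmic stemmer, no corpus download needed

STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}
stemmer = PorterStemmer()

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # drop punctuation, lower-case
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]

def cosine_similarity(doc_a, doc_b):
    wa, wb = Counter(preprocess(doc_a)), Counter(preprocess(doc_b))  # keyword weighting (TF)
    shared = set(wa) & set(wb)
    dot = sum(wa[t] * wb[t] for t in shared)
    norm = math.sqrt(sum(v * v for v in wa.values())) * math.sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

d1 = "Web pages are filtered by topic to find relevant documents."
d2 = "Topic specific filtering finds relevant web documents efficiently."
print(f"relevance = {cosine_similarity(d1, d2):.2f}")   # closer to 1 means more related
```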
Chapman, Brian E.; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W.
2011-01-01
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes’ classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes’ classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes’ classifier using bigrams. PMID:21459155
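For flavor, the following is a much-simplified sketch of a ConText-style pass of the kind peFinder generalizes: trigger phrases in the same sentence as a target concept set negation, certainty and temporality attributes. The trigger lists are illustrative and not the curated lexicons used by ConText or peFinder.

```python
# Highly simplified ConText-style pass: trigger phrases assign negation,
# uncertainty, or chronicity attributes to a target concept found in the
# same sentence. Trigger lists here are illustrative only.
import re

TRIGGERS = {
    "definite_negated_existence": ["no evidence of", "negative for", "without"],
    "probable_existence": ["suggestive of", "cannot exclude", "possible"],
    "historical": ["chronic", "old", "prior"],
}

def classify_report(text, target="pulmonary embolism"):
    state = {"disease": "absent", "certainty": "certain", "temporality": "acute"}
    for sentence in re.split(r"[.!?]", text.lower()):
        if target not in sentence:
            continue
        state["disease"] = "present"
        if any(t in sentence for t in TRIGGERS["definite_negated_existence"]):
            state["disease"] = "absent"
        if any(t in sentence for t in TRIGGERS["probable_existence"]):
            state["certainty"] = "uncertain"
        if any(t in sentence for t in TRIGGERS["historical"]):
            state["temporality"] = "chronic"
    return state

print(classify_report("CT angiogram: no evidence of pulmonary embolism. Lungs clear."))
print(classify_report("Findings suggestive of chronic pulmonary embolism in the right lower lobe."))
```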
2012-01-01
Objectives This study demonstrates the feasibility of using expert system shells for rapid clinical decision support module development. Methods A readily available expert system shell was used to build a simple rule-based system for the crude diagnosis of vaginal discharge. Pictures and 'canned text explanations' are extensively used throughout the program to enhance its intuitiveness and educational dimension. All the steps involved in developing the system are documented. Results The system runs under Microsoft Windows and is available as a free download at http://healthcybermap.org/vagdisch.zip (the distribution archive includes both the program's executable and the commented knowledge base source as a text document). The limitations of the demonstration system, such as the lack of provisions for assessing uncertainty or various degrees of severity of a sign or symptom, are discussed in detail. Ways of improving the system, such as porting it to the Web and packaging it as an app for smartphones and tablets, are also presented. Conclusions An easy-to-use expert system shell enables clinicians to rapidly become their own 'knowledge engineers' and develop concise evidence-based decision support modules of simple to moderate complexity, targeting clinical practitioners, medical and nursing students, as well as patients, their lay carers and the general public (where appropriate). In the spirit of the social Web, it is hoped that an online repository can be created to peer review, share and re-use knowledge base modules covering various clinical problems and algorithms, as a service to the clinical community. PMID:23346475
Ramanan, S V; Radhakrishna, Kedar; Waghmare, Abijeet; Raj, Tony; Nathan, Senthil P; Sreerama, Sai Madhukar; Sampath, Sriram
2016-08-01
Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt aimed at evaluating unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India, with markups for diseases, procedures, and lab parameters, their attributes, as well as key demographic information and administrative variables such as patient outcomes. In this process, we have constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. We have produced an annotated corpus of roughly 90 thousand words, which to our knowledge is the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores across the major categories, namely disease/symptoms, procedures, laboratory parameters and outcomes, are 0.856, 0.834, 0.961 and 0.872 respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.
Jürgens, Clemens; Grossjohann, Rico; Czepita, Damian; Tost, Frank
2009-01-01
Graphic documentation of retinal examination results in clinical ophthalmological practice is often depicted using pictures or in handwritten form. Popular software products used to describe changes in the fundus do not vary much from simple graphic programs that enable to insert, scale and edit basic graphic elements such as: a circle, rectangle, arrow or text. Displaying the results of retinal examinations in a unified way is difficult to achieve. Therefore, we devised and implemented modern software tools for this purpose. A computer program enabling to quickly and intuitively form graphs of the fundus, that can be digitally archived or printed was created. Especially for the needs of ophthalmological clinics, a set of standard digital symbols used to document the results of retinal examinations was developed and installed in a library of graphic symbols. These symbols are divided into the following categories: preoperative, postoperative, neovascularization, retinopathy of prematurity. The appropriate symbol can be selected with a click of the mouse and dragged-and-dropped on the canvas of the fundus. Current forms of documenting results of retinal examinations are unsatisfactory, due to the fact that they are time consuming and imprecise. Unequivocal interpretation is difficult or in some cases impossible. Using the developed computer program a sketch of the fundus can be created much more quickly than by hand drawing. Additionally the quality of the medica documentation using a system of well described and standardized symbols will be enhanced. (1) Graphic symbols used to document the results of retinal examinations are a part of everyday clinical practice. (2) The designed computer program will allow quick and intuitive graphical creation of fundus sketches that can be either digitally archived or printed.
Smart Extraction and Analysis System for Clinical Research.
Afzal, Muhammad; Hussain, Maqbool; Khan, Wajahat Ali; Ali, Taqdir; Jamshed, Arif; Lee, Sungyoung
2017-05-01
With the increasing use of electronic health records (EHRs), there is a growing need to expand the utilization of EHR data to support clinical research. The key challenge in achieving this goal is the unavailability of smart systems and methods to overcome the issue of data preparation, structuring, and sharing for smooth clinical research. We developed a robust analysis system called the smart extraction and analysis system (SEAS) that consists of two subsystems: (1) the information extraction system (IES), for extracting information from clinical documents, and (2) the survival analysis system (SAS), for a descriptive and predictive analysis to compile the survival statistics and predict the future chance of survivability. The IES subsystem is based on a novel permutation-based pattern recognition method that extracts information from unstructured clinical documents. Similarly, the SAS subsystem is based on a classification and regression tree (CART)-based prediction model for survival analysis. SEAS is evaluated and validated on a real-world case study of head and neck cancer. The overall information extraction accuracy of the system for semistructured text is recorded at 99%, while that for unstructured text is 97%. Furthermore, the automated, unstructured information extraction has reduced the average time spent on manual data entry by 75%, without compromising the accuracy of the system. Moreover, around 88% of patients are found in a terminal or dead state for the highest clinical stage of disease (level IV). Similarly, there is an ∼36% probability of a patient being alive if at least one of the lifestyle risk factors was positive. We presented our work on the development of SEAS to replace costly and time-consuming manual methods with smart automatic extraction of information and survival prediction methods. SEAS has reduced the time and energy of human resources spent unnecessarily on manual tasks.
SFINX-a drug-drug interaction database designed for clinical decision support systems.
Böttiger, Ylva; Laine, Kari; Andersson, Marine L; Korhonen, Tuomas; Molin, Björn; Ovesjö, Marie-Louise; Tirkkonen, Tuire; Rane, Anders; Gustafsson, Lars L; Eiermann, Birgit
2009-06-01
The aim was to develop a drug-drug interaction database (SFINX) to be integrated into decision support systems or to be used in website solutions for clinical evaluation of interactions. Key elements such as substance properties and names, drug formulations, text structures and references were defined before development of the database. Standard operating procedures for literature searches, text writing rules and a classification system for clinical relevance and documentation level were determined. ATC codes, CAS numbers and country-specific codes for substances were identified and quality assured to ensure safe integration of SFINX into other data systems. Much effort was put into giving short and practical advice regarding clinically relevant drug-drug interactions. SFINX includes over 8,000 interaction pairs and is integrated into Swedish and Finnish computerised decision support systems. Over 31,000 physicians and pharmacists are receiving interaction alerts through SFINX. User feedback is collected for continuous improvement of the content. SFINX is a potentially valuable tool delivering instant information on drug interactions during prescribing and dispensing.
Text extraction method for historical Tibetan document images based on block projections
NASA Astrophysics Data System (ADS)
Duan, Li-juan; Zhang, Xi-qun; Ma, Long-long; Wu, Jian
2017-11-01
Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. The task of text extraction is treated as a text area detection and location problem. The images are divided into equal blocks, and the blocks are filtered using information about the categories of connected components and the corner point density. By analyzing the projections of the filtered blocks, the approximate text areas can be located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
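A minimal NumPy sketch of the block projection idea, assuming a binarized page image; the block size and ink-density threshold are illustrative rather than the paper's settings.

```python
# Locate text regions with block projection profiles: split the binarized
# page into equal blocks, filter out near-empty blocks, and use row/column
# projections of the remaining blocks to bound the text area.
import numpy as np

def text_bounds(binary_img, block=32, min_ink_ratio=0.02):
    h, w = binary_img.shape
    keep = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            patch = binary_img[i * block:(i + 1) * block, j * block:(j + 1) * block]
            keep[i, j] = patch.mean() > min_ink_ratio   # filter near-empty blocks
    rows = np.where(keep.any(axis=1))[0]                # horizontal projection of kept blocks
    cols = np.where(keep.any(axis=0))[0]                # vertical projection of kept blocks
    if rows.size == 0 or cols.size == 0:
        return None
    return (rows[0] * block, (rows[-1] + 1) * block,
            cols[0] * block, (cols[-1] + 1) * block)

page = np.zeros((256, 256))
page[64:192, 32:224] = (np.random.rand(128, 192) > 0.7)  # synthetic "ink" region
print(text_bounds(page))   # approximate (top, bottom, left, right) of the text area
```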
Visualizing unstructured patient data for assessing diagnostic and therapeutic history.
Deng, Yihan; Denecke, Kerstin
2014-01-01
Having access to relevant patient data is crucial for clinical decision making. The data is often documented in unstructured texts and collected in the electronic health record. In this paper, we evaluate an approach to visualize information extracted from clinical documents by means of tag clouds. Tag clouds are generated using a bag-of-words approach and by exploiting part-of-speech tags. For a real-world data set comprising radiological reports, pathological reports and surgical operation reports, tag clouds are generated and a questionnaire-based study is conducted as an evaluation. Feedback from the physicians shows that the tag cloud visualization is an effective and rapid approach to represent relevant parts of unstructured patient data. To handle the different medical narratives, we have summarized several possible improvements according to the user feedback and evaluation results.
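A minimal sketch of the bag-of-words variant, where term frequencies in a report become tag weights (font sizes); the stop-word list and report text are illustrative, and the part-of-speech variant would additionally keep only selected word classes before counting.

```python
# Term frequencies from a report become tag weights (font sizes) for a tag
# cloud. Stop words and the example report are illustrative.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "with", "in", "is", "no", "to"}

def tag_cloud_weights(report, max_tags=10, min_size=10, max_size=40):
    tokens = [t for t in re.findall(r"[a-z]+", report.lower()) if t not in STOP_WORDS]
    counts = Counter(tokens).most_common(max_tags)
    top = counts[0][1]
    # scale frequencies to font sizes for rendering
    return {term: min_size + (max_size - min_size) * freq / top for term, freq in counts}

report = ("Liver shows a hypodense lesion in segment VI. Lesion is consistent with "
          "metastasis. No further lesion seen. Liver parenchyma otherwise normal.")
for term, size in tag_cloud_weights(report).items():
    print(f"{term:<12} font-size {size:.0f}px")
```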
Graph-based biomedical text summarization: An itemset mining and sentence clustering approach.
Nasr Azadani, Mozhgan; Ghadiri, Nasser; Davoodijam, Ensieh
2018-06-12
Automatic text summarization offers an efficient solution to access the ever-growing amounts of both scientific and clinical literature in the biomedical domain by summarizing the source documents while maintaining their most informative contents. In this paper, we propose a novel graph-based summarization method that takes advantage of domain-specific knowledge and a well-established data mining technique called frequent itemset mining. Our summarizer exploits the Unified Medical Language System (UMLS) to construct a concept-based model of the source document and maps the document to these concepts. Then, it discovers frequent itemsets to take the correlations among multiple concepts into account. The method uses these correlations to propose a similarity function, based on which a graph representation is constructed. The summarizer then employs a minimum spanning tree based clustering algorithm to discover various subthemes of the document. Eventually, it generates the final summary by selecting the most informative and relevant sentences from all subthemes within the text. We perform an automatic evaluation over a large number of summaries using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results demonstrate that the proposed summarization system outperforms various baselines and benchmark approaches. The research carried out suggests that the incorporation of domain-specific knowledge and frequent itemset mining better equips the summarization system to address the informativeness measurement of the sentences. Moreover, clustering the graph nodes (sentences) can enable the summarizer to target different main subthemes of a source document efficiently. The evaluation results show that the proposed approach can significantly improve the performance of summarization systems in the biomedical domain. Copyright © 2018. Published by Elsevier Inc.
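A sketch of the core idea under simplifying assumptions: sentences are reduced to concept sets (standing in for UMLS mappings), frequent concept pairs are mined, and sentence similarity is defined by shared frequent itemsets; the minimum spanning tree clustering and sentence selection steps are omitted.

```python
# Sentences are mapped to concept sets, frequent concept itemsets (pairs,
# here) are mined, and sentence similarity is the number of shared frequent
# itemsets. The concept sets below stand in for the output of a UMLS mapper.
from itertools import combinations
from collections import Counter

sentence_concepts = [
    {"C_diabetes", "C_insulin", "C_glucose"},
    {"C_diabetes", "C_glucose", "C_hba1c"},
    {"C_insulin", "C_dose", "C_glucose"},
    {"C_retinopathy", "C_diabetes"},
]

MIN_SUPPORT = 2
pair_counts = Counter(
    pair for concepts in sentence_concepts
    for pair in combinations(sorted(concepts), 2)
)
frequent_pairs = {p for p, c in pair_counts.items() if c >= MIN_SUPPORT}

def similarity(a, b):
    """Similarity = number of frequent concept pairs both sentences contain."""
    pairs_a = set(combinations(sorted(a), 2)) & frequent_pairs
    pairs_b = set(combinations(sorted(b), 2)) & frequent_pairs
    return len(pairs_a & pairs_b)

# Weighted similarity graph over sentences (only edges with weight > 0)
edges = [(i, j, similarity(sentence_concepts[i], sentence_concepts[j]))
         for i, j in combinations(range(len(sentence_concepts)), 2)]
print([e for e in edges if e[2] > 0])
```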
Turchin, Alexander; Shubina, Maria; Breydo, Eugene; Pendergrass, Merri L; Einbinder, Jonathan S
2009-01-01
OBJECTIVE To compare information obtained from narrative and structured electronic sources using anti-hypertensive medication intensification as an example clinical issue of interest. DESIGN A retrospective cohort study of 5,634 hypertensive patients with diabetes from 2000 to 2005. MEASUREMENTS The authors determined the fraction of medication intensification events documented in both narrative and structured data in the electronic medical record. The authors analyzed the relationship between provider characteristics and concordance between intensifications in narrative and structured data. As there is no gold standard data source for medication information, the authors clinically validated medication intensification information by assessing the relationship between documented medication intensification and the patients' blood pressure in univariate and multivariate models. RESULTS Overall, 5,627 (30.9%) of 18,185 medication intensification events were documented in both sources. For a medication intensification event documented in narrative notes the probability of a concordant entry in structured records increased by 11% for each study year (p < 0.0001) and decreased by 19% for each decade of provider age (p = 0.035). In a multivariate model that adjusted for patient demographics and intraphysician correlations, an increase of one medication intensification per month documented in either narrative or structured data was associated with a 5-8 mm Hg monthly decrease in systolic and 1.5-4 mm Hg decrease in diastolic blood pressure (p < 0.0001 for all). CONCLUSION Narrative and structured electronic data sources provide complementary information on anti-hypertensive medication intensification. Clinical validity of information in both sources was demonstrated by correlation with changes in blood pressure.
A Study of Asynchronous Mobile-Enabled SMS Text Psychotherapy.
Hull, Thomas D; Mahan, Kush
2017-03-01
Many obstacles to obtaining psychotherapy continue to diminish its reach despite its documented positive effects. Using short message service (SMS) texting and Web platforms to enable licensed psychotherapists to deliver therapy directly to the lived context of the client is one possible solution. Employing a feasibility study design, this pilot trial further evaluated the external validity for treatment outcomes of text therapy and extended findings to include mobile-enabled text platforms. Adults seeking text therapy treatment for a variety of disorders were recruited from a text therapy service (N = 57). Clinical outcomes were measured using the General Health Questionnaire-12 (GHQ-12) through 15 weeks of treatment. A process variable, the therapeutic alliance, was measured with the Working Alliance Inventory. Treatment acceptability was assessed with ratings of satisfaction for several aspects of the treatment, including affordability, effectiveness, convenience, wait times to receiving treatment, and cost-effectiveness. Results indicate evidence for the effectiveness of the intervention (GHQ-12, Cohen's d = 1.3). Twenty-five (46%) participants experienced clinically significant symptom remission. Therapeutic alliance scores were lower than those found in traditional treatment settings, but still predicted symptom improvement (R² = 0.299). High levels of satisfaction with text therapy were reported on dimensions of affordability, convenience, and effectiveness. Cost-effectiveness analyses suggest that text therapy is 42.2% the cost of traditional services and offers much reduced wait times. Mobile-enabled asynchronous text therapy with a licensed therapist is an acceptable and clinically beneficial medium for individuals with various diagnoses and histories of psychological distress.
Using text-mining techniques in electronic patient records to identify ADRs from medicine use.
Warrer, Pernille; Hansen, Ebba Holme; Juhl-Jensen, Lars; Aagaard, Lise
2012-05-01
This literature review included studies that use text-mining techniques in narrative documents stored in electronic patient records (EPRs) to investigate ADRs. We searched PubMed, Embase, Web of Science and International Pharmaceutical Abstracts without restrictions from origin until July 2011. We included empirically based studies on text mining of electronic patient records (EPRs) that focused on detecting ADRs, excluding those that investigated adverse events not related to medicine use. We extracted information on study populations, EPR data sources, frequencies and types of the identified ADRs, medicines associated with ADRs, text-mining algorithms used and their performance. Seven studies, all from the United States, were eligible for inclusion in the review. Studies were published from 2001, the majority between 2009 and 2010. Text-mining techniques varied over time from simple free text searching of outpatient visit notes and inpatient discharge summaries to more advanced techniques involving natural language processing (NLP) of inpatient discharge summaries. Performance appeared to increase with the use of NLP, although many ADRs were still missed. Due to differences in study design and populations, various types of ADRs were identified and thus we could not make comparisons across studies. The review underscores the feasibility and potential of text mining to investigate narrative documents in EPRs for ADRs. However, more empirical studies are needed to evaluate whether text mining of EPRs can be used systematically to collect new information about ADRs. © 2011 The Authors. British Journal of Clinical Pharmacology © 2011 The British Pharmacological Society.
DECISION-COMPONENTS OF NICE'S TECHNOLOGY APPRAISALS ASSESSMENT FRAMEWORK.
de Folter, Joost; Trusheim, Mark; Jonsson, Pall; Garner, Sarah
2018-01-01
Value assessment frameworks have gained prominence recently in the context of U.S. healthcare. Such frameworks set out a series of factors that are considered in funding decisions. The UK's National Institute for Health and Care Excellence (NICE) is an established health technology assessment (HTA) agency. We present a novel application of text analysis that characterizes NICE's Technology Appraisals in the context of the newer assessment frameworks and present the results in a visual way. A total of 243 documents of NICE's medicines guidance from 2007 to 2016 were analyzed. Text analysis was used to identify a hierarchical set of decision factors considered in the assessments. The frequency of decision factors stated in the documents was determined, along with their association with terms related to uncertainty. The results were incorporated into visual representations of hierarchical factors. We identified 125 decision factors, and hierarchically grouped these into eight domains: Clinical Effectiveness, Cost Effectiveness, Condition, Current Practice, Clinical Need, New Treatment, Studies, and Other Factors. Textual analysis showed all domains appeared consistently in the guidance documents. Many factors were commonly associated with terms relating to uncertainty. A series of visual representations was created. This study reveals the complexity and consistency of NICE's decision-making processes and demonstrates that cost-effectiveness is not the only decision criterion. The study highlights the importance of processes and methodology that can take both quantitative and qualitative information into account. Visualizations can help effectively communicate this complex information during the decision-making process and subsequently to stakeholders.
NASA Astrophysics Data System (ADS)
Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.
2017-01-01
Automatic summarization can help someone grasp the core information of a long text instantly by summarizing the text automatically. Many summarization systems have already been developed, but there are still many problems in those systems. This final project proposes a summarization method using a document index graph. This method adapts the PageRank and HITS formulas, originally used to assess web pages, to assess the words and sentences in a text document. The expected outcome of this final project is a system that can summarize a single document by utilizing a document index graph with TextRank and HITS to automatically improve the quality of the summary results.
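A minimal sketch of the TextRank side of such a system: a sentence similarity graph is ranked with PageRank-style iterations and the top sentences form the summary. The overlap similarity used here is a common simplification, not necessarily the weighting of the proposed method, and the HITS and document index graph components are omitted.

```python
# Build a sentence similarity graph and run PageRank-style iterations to rank
# sentences; the highest-ranked sentences form the summary.
import re
import math

def overlap(s1, s2):
    w1, w2 = set(re.findall(r"\w+", s1.lower())), set(re.findall(r"\w+", s2.lower()))
    if len(w1) < 2 or len(w2) < 2:
        return 0.0
    return len(w1 & w2) / (math.log(len(w1)) + math.log(len(w2)))

def textrank(sentences, d=0.85, iterations=50):
    n = len(sentences)
    weights = [[overlap(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - d) + d * sum(
                weights[j][i] / sum(weights[j]) * scores[j]
                for j in range(n) if sum(weights[j]) > 0
            )
            for i in range(n)
        ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

sentences = [
    "Automatic summarization extracts the core information of a document.",
    "Graph based ranking methods such as PageRank score sentences by similarity.",
    "The highest scoring sentences are selected as the document summary.",
    "Unrelated filler sentence about the weather today.",
]
ranking = textrank(sentences)
print("summary:", [sentences[i] for i in ranking[:2]])
```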
Pierides, Kristen; Duggan, Paul; Chur-Hansen, Anna; Gilson, Amaya
2013-05-01
Clinical competencies in obstetrics and gynaecology have not been clearly defined for Australian medical students, the growing numbers of which may impact clinical teaching. Our aim was to administer and validate a competencies list, for self-evaluation by medical students of their confidence to manage common clinical tasks in obstetrics and gynaecology; to evaluate students' views on course changes that may result from increasing class sizes. A draft list of competencies was peer-reviewed, and discussed at two student focus groups. The resultant list was administered as part of an 81 item online survey. Sixty-eight percent (N = 172) of those eligible completed the survey. Most respondents (75.8%) agreed or strongly agreed that they felt confident and well equipped to recognise and manage most common and important obstetric and gynaecological conditions. Confidence was greater for women, and for those who received a higher assessment grade. Free-text data highlight reasons for lack of clinical experience that may impact perceived confidence. The document listing competencies for medical students and educators is useful for discussions around a national curriculum in obstetrics and gynaecology in medical schools, including the best methods of delivery, particularly in the context of increasing student numbers.
Semantator: semantic annotator for converting biomedical text to linked data.
Tao, Cui; Song, Dezhao; Sharma, Deepak; Chute, Christopher G
2013-10-01
More than 80% of biomedical data is embedded in plain text. The unstructured nature of these text-based documents makes it challenging to easily browse and query the data of interest in them. One approach to facilitate browsing and querying biomedical text is to convert the plain text to a linked web of data, i.e., converting data originally in free text to structured formats with defined meta-level semantics. In this paper, we introduce Semantator (Semantic Annotator), a semantic-web-based environment for annotating data of interest in biomedical documents, browsing and querying the annotated data, and interactively refining annotation results if needed. Through Semantator, information of interest can be either annotated manually or semi-automatically using plug-in information extraction tools. The annotated results will be stored in RDF and can be queried using the SPARQL query language. In addition, semantic reasoners can be directly applied to the annotated data for consistency checking and knowledge inference. Semantator has been released online and was used by the biomedical ontology community, who provided positive feedback. Our evaluation results indicated that (1) Semantator can perform the annotation functionalities as designed; (2) Semantator can be adopted in real applications in clinical and translational research; and (3) the annotated results using Semantator can be easily used in Semantic-web-based reasoning tools for further inference. Copyright © 2013 Elsevier Inc. All rights reserved.
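A sketch of the querying step, assuming rdflib: annotations stored as RDF triples can be retrieved with SPARQL. The namespace, classes and predicates below are hypothetical stand-ins rather than Semantator's actual schema.

```python
# Annotations stored as RDF can be queried with SPARQL. The example
# namespace, classes, and predicates are hypothetical stand-ins.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/annotation#")
g = Graph()

# Two hypothetical annotations produced from a clinical note
g.add((EX.ann1, RDF.type, EX.DiseaseAnnotation))
g.add((EX.ann1, EX.hasText, Literal("type 2 diabetes mellitus")))
g.add((EX.ann1, EX.inDocument, Literal("note_001.txt")))
g.add((EX.ann2, RDF.type, EX.MedicationAnnotation))
g.add((EX.ann2, EX.hasText, Literal("metformin")))
g.add((EX.ann2, EX.inDocument, Literal("note_001.txt")))

query = """
PREFIX ex: <http://example.org/annotation#>
SELECT ?text ?doc WHERE {
    ?ann a ex:DiseaseAnnotation ;
         ex:hasText ?text ;
         ex:inDocument ?doc .
}
"""
for row in g.query(query):
    print(f"{row.doc}: {row.text}")
```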
Spatial Paradigm for Information Retrieval and Exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
The SPIRE system consists of software for visual analysis of primarily text-based information sources. This technology enables content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis and identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial-proximity display (Galaxies or Themescape), where items (documents and/or themes) that appear visually close to each other have closely related content. Innovative interaction techniques then allow for dynamic visual analysis of large text-based information spaces.
SPIRE1.03. Spatial Paradigm for Information Retrieval and Exploration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, K.J.; Bohn, S.; Crow, V.
The SPIRE system consists of software for visual analysis of primarily text-based information sources. This technology enables content analysis of text documents without reading all the documents. It employs several algorithms for text and word proximity analysis and identifies the key themes within the text documents. From this analysis, it projects the results onto a visual spatial-proximity display (Galaxies or Themescape), where items (documents and/or themes) that appear visually close to each other have closely related content. Innovative interaction techniques then allow for dynamic visual analysis of large text-based information spaces.
Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Renganathan, Vinaitheerthan
2017-07-01
With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Feature extraction for document text using Latent Dirichlet Allocation
NASA Astrophysics Data System (ADS)
Prihatini, P. M.; Suryawan, I. K.; Mandia, IN
2018-01-01
Feature extraction is one of the stages in an information retrieval system, used to extract the distinctive feature values of a text document. Feature extraction can be performed by several methods, one of which is Latent Dirichlet Allocation. However, research on text feature extraction using the Latent Dirichlet Allocation method is rarely found for Indonesian text. Therefore, this research implements text feature extraction for Indonesian text. The research method consists of data acquisition, text pre-processing, initialization, topic sampling and evaluation. The evaluation compares the Precision, Recall and F-Measure values of Latent Dirichlet Allocation against Term Frequency Inverse Document Frequency with KMeans, which is commonly used for feature extraction. The evaluation results show that the Precision, Recall and F-Measure values of the Latent Dirichlet Allocation method are higher than those of the Term Frequency Inverse Document Frequency KMeans method. This shows that the Latent Dirichlet Allocation method is able to extract features and cluster Indonesian text better than the Term Frequency Inverse Document Frequency KMeans method.
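The comparison described above can be approximated with off-the-shelf tooling; the sketch below uses scikit-learn to derive LDA topic-proportion features and a TF-IDF/KMeans baseline. The corpus, topic count and cluster count are placeholders, and none of the study's Indonesian pre-processing is reproduced.

```python
# Sketch of LDA feature extraction vs. a TF-IDF + KMeans baseline (scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

docs = [
    "harga beras naik di pasar tradisional",
    "pasar tradisional menjual beras dan sayur",
    "tim sepak bola menang dalam pertandingan",
    "pertandingan sepak bola berakhir imbang",
]  # placeholder corpus

# LDA: each document becomes a topic-probability vector usable as features.
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda_features = lda.fit_transform(counts)          # shape: (n_docs, n_topics)

# Baseline: TF-IDF vectors clustered with KMeans.
tfidf = TfidfVectorizer().fit_transform(docs)
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print(lda_features.round(2), kmeans_labels)
```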
EMERSE: The Electronic Medical Record Search Engine
Hanauer, David A.
2006-01-01
EMERSE (The Electronic Medical Record Search Engine) is an intuitive, powerful search engine for free-text documents in the electronic medical record. It offers multiple options for creating complex search queries yet has an interface that is easy enough to be used by those with minimal computer experience. EMERSE is ideal for retrospective chart reviews and data abstraction and may have potential for clinical care as well.
Script-independent text line segmentation in freestyle handwritten documents.
Li, Yi; Zheng, Yefeng; Doermann, David; Jaeger, Stefan; Li, Yi
2008-08-01
Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component based methods ( [1], [2] for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show the proposed algorithm is robust to scale change, rotation, and noise.
Current perspectives on the role of the pharmacist in heart failure management
Cheng, Judy WM
2017-01-01
Pharmacists play an important role within a multidisciplinary health care team in the care of patients with heart failure (HF). It has been evaluated and documented that pharmacists providing medication reconciliation, especially during transitions of care, educating patients on their medications, and providing collaborative medication management lead to positive changes in patient outcomes, including but not limited to decreases in hospitalizations and readmissions. It is foreseeable that pharmacist roles will continue to expand as new treatments and innovative care are developed for HF patients. I reviewed the published roles of pharmacists in the care of HF patients. The MEDLINE and Current Contents databases (both from 1966 to December 31, 2017) were utilized to identify peer-reviewed clinical trials, descriptive studies, and review articles published in English using the following search terms: pharmacists, clinical pharmacy, HF, and cardiomyopathy. Citations from available articles were also reviewed for additional references. A preliminary search revealed 31 studies and 55 reviews. They were further reviewed by title and abstract as well as full text to remove irrelevant articles. In the end, 24 of these clinical trials and systematic reviews are described in the following text, and Table 1 summarizes 16 pertinent clinical trials. Some roles that are currently being explored include medication management in patients with mechanical circulatory support for end-stage HF, where the pharmacokinetics and pharmacodynamics of medications can change, medication management in ambulatory intravenous diuretic clinics, and comprehensive medication management in patients' home settings. Pharmacists should continue to explore and prospectively evaluate their role in the care of this patient population, including documenting their interventions and their impact on economic and patient outcomes. PMID:29594034
Ilic, Nina; Savic, Snezana; Siegel, Evan; Atkinson, Kerry; Tasic, Ljiljana
2012-12-01
A wide range of regulatory standards applicable to production and use of tissues, cells, and other biologics (or biologicals), as advanced therapies, indicates considerable interest in the regulation of these products. The objective of this study was to analyze and compare high-tier documents within the Australian, European, and U.S. biologic drug regulatory environments using qualitative methodology. Eighteen high-tier documents from the European Medicines Agency (EMA), U.S. Food and Drug Administration (FDA), and Therapeutic Goods Administration (TGA) regulatory frameworks were subject to automated text analysis. Selected documents were consistent with the legal requirements for manufacturing and use of biologic drugs in humans and fall into six different categories. Concepts, themes, and their co-occurrence were identified and compared. The most frequent concepts in TGA, FDA, and EMA frameworks were "biological," "product," and "medicinal," respectively. This was consistent with the previous manual terminology search. Good Manufacturing Practice documents, across frameworks, identified "quality" and "appropriate" as main concepts, whereas in Good Clinical Practice (GCP) documents it was "clinical," followed by "trial," "subjects," "sponsor," and "data." GCP documents displayed considerably higher concordance between different regulatory frameworks, as demonstrated by a smaller number of concepts, similar size, and similar distance between them. Although high-tier documents often use different terminology, they share concepts and themes. This paper may be a modest contribution to the recognition of similarities and differences between analyzed regulatory documents. It may also fill the literature gap and provide some foundation for future comparative research of biologic drug regulations on a global level.
Document segmentation for high-quality printing
NASA Astrophysics Data System (ADS)
Ancin, Hakan
1997-04-01
A technique to segment dark text on the light background of mixed-mode color documents is presented. This process does not perceptually change graphics and photo regions. Color documents are scanned and printed from various media, which usually do not have a clean background. This is especially the case for printouts generated from thin magazine samples; these printouts usually include text and figures from the back of the page, which is called bleeding. Removal of bleeding artifacts improves the perceptual quality of the printed document and reduces color ink usage. By detecting the light background of the document, these artifacts are removed from background regions. Detection of dark text regions also enables the halftoning algorithms to use true black ink for the black text pixels instead of composite black. The processed document contains sharp black text on a white background, resulting in improved perceptual quality and better ink utilization. The described method is memory efficient and requires a small number of scan lines of high-resolution color documents during processing.
Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W
2011-10-01
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes' classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98(0.83), 0.86(0.96), 0.94(0.93), and 0.60(0.90) for disease state, quality state, certainty state, and temporal state respectively, compared to 0.68(0.77), 0.67(0.87), 0.62(0.82), and 0.04(0.25) for the naive Bayes' classifier using unigrams, and 0.75(0.79), 0.52(0.69), 0.59(0.84), and 0.04(0.25) for the naive Bayes' classifier using bigrams. Copyright © 2011 Elsevier Inc. All rights reserved.
Computation of term dominance in text documents
Bauer, Travis L [Albuquerque, NM; Benz, Zachary O [Albuquerque, NM; Verzi, Stephen J [Albuquerque, NM
2012-04-24
An improved entropy-based term dominance metric is presented that is useful for characterizing a corpus of text documents and for comparing the term dominance metrics of a first corpus of documents to those of a second corpus having a different number of documents.
Kotecha, Aachal; Longstaff, Simon; Azuara-Blanco, Augusto; Kirwan, James F; Morgan, James Edwards; Spencer, Anne Fiona; Foster, Paul J
2018-04-01
To obtain consensus opinion on a standards framework for the development and implementation of virtual clinics for glaucoma monitoring in the UK, using a modified Delphi methodology. A modified Delphi technique was used that involved sampling members of the UK and Eire Glaucoma Society (UKEGS). The first round scored the strength of agreement to a series of standards statements using a 9-point Likert scale. The revised standards were subjected to a second round of scoring and free-text comment. The final standards were discussed and agreed by an expert panel consisting of seven glaucoma subspecialists from across the UK. A version of the standards was submitted to external stakeholders for a 3-month consultation. There was a 44% response rate of UKEGS members to rounds 1 and 2, consisting largely of consultant ophthalmologists with a specialist interest in glaucoma. The final version of the standards document was validated by stakeholder consultation and contains four sections pertaining to the patient groups, testing methods, staffing requirements and governance structure of NHS secondary care glaucoma virtual clinic models. Use of a modified Delphi approach has provided consensus agreement for the standards required for the development of virtual clinics to monitor glaucoma in the UK. It is anticipated that this document will be useful as a guide for those implementing this model of service delivery. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Unsupervised Biomedical Named Entity Recognition: Experiments with Clinical and Biological Texts
Zhang, Shaodian; Elhadad, Noémie
2013-01-01
Named entity recognition is a crucial component of biomedical natural language processing, enabling information extraction and ultimately reasoning over and knowledge discovery from text. Much progress has been made in the design of rule-based and supervised tools, but they are often genre and task dependent. As such, adapting them to different genres of text or identifying new types of entities requires major effort in re-annotation or rule development. In this paper, we propose an unsupervised approach to extracting named entities from biomedical text. We describe a stepwise solution to tackle the challenges of entity boundary detection and entity type classification without relying on any handcrafted rules, heuristics, or annotated data. A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text. Classification of candidate entities into categories of interest is carried out by leveraging principles from distributional semantics. Experiments show that our system, especially the entity classification step, yields competitive results on two popular biomedical datasets of clinical notes and biological literature, and outperforms a baseline dictionary match approach. Detailed error analysis provides a road map for future work. PMID:23954592
Duplicate document detection in DocBrowse
NASA Astrophysics Data System (ADS)
Chalana, Vikram; Bruce, Andrew G.; Nguyen, Thien
1998-04-01
Duplicate documents are frequently found in large databases of digital documents, such as those found in digital libraries or in the government declassification effort. Efficient duplicate document detection is important not only to allow querying for similar documents, but also to filter out redundant information in large document databases. We have designed three different algorithms to identify duplicate documents. The first algorithm is based on features extracted from the textual content of a document, the second algorithm is based on wavelet features extracted from the document image itself, and the third algorithm is a combination of the first two. These algorithms are integrated within the DocBrowse system for information retrieval from document images, which is currently under development at MathSoft. DocBrowse supports duplicate document detection by allowing (1) automatic filtering to hide duplicate documents, and (2) ad hoc querying for similar or duplicate documents. We have tested the duplicate document detection algorithms on 171 documents and found that the text-based method has an average 11-point precision of 97.7 percent, while the image-based method has an average 11-point precision of 98.9 percent. However, in general, the text-based method performs better when the document contains enough high-quality machine-printed text, while the image-based method performs better when the document contains little or no quality machine-readable text.
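For a sense of what a text-based duplicate check can look like, the sketch below compares documents using character-shingle sets and Jaccard similarity. This is a common generic approach chosen for illustration; the abstract does not specify which textual features DocBrowse actually uses.

```python
# Text-based near-duplicate check: character n-gram shingles compared with
# Jaccard similarity. Illustrative only; DocBrowse's actual features differ.
def shingles(text, n=5):
    text = " ".join(text.lower().split())          # normalize whitespace and case
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def is_duplicate(doc_a, doc_b, threshold=0.8):
    return jaccard(doc_a, doc_b) >= threshold

print(is_duplicate("The quick brown fox.", "The quick brown fox!"))   # True
print(is_duplicate("The quick brown fox.", "An unrelated report."))   # False
```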
Identifying and Overcoming Obstacles to Point-of-Care Data Collection for Eye Care Professionals
Lobach, David F.; Silvey, Garry M.; Macri, Jennifer M.; Hunt, Megan; Kacmaz, Roje O.; Lee, Paul P.
2005-01-01
Supporting data entry by clinicians is considered one of the greatest challenges in implementing electronic health records. In this paper we describe a formative evaluation study using three different methodologies through which we identified obstacles to point-of-care data entry for eye care and then used the formative process to develop and test solutions to overcome these obstacles. The greatest obstacles were supporting free text annotation of clinical observations and accommodating the creation of detailed diagrams in multiple colors. To support free text entry, we arrived at an approach that captures an image of a free text note and associates this image with related data elements in an encounter note. The detailed diagrams included a color pallet that allowed changing pen color with a single stroke and also captured the diagrams as an image associated with related data elements. During observed sessions with simulated patients, these approaches satisfied the clinicians’ documentation needs by capturing the full range of clinical complexity that arises in practice. PMID:16779083
Investigation into Text Classification With Kernel Based Schemes
2010-03-01
This work investigates text classification with kernel-based schemes. Documents are represented as term-document matrices (TDMs) in the vector space model (VSM), classifiers are evaluated with common metrics such as true positives (TP) and true negatives (TN), and indexing is performed with the Text to Matrix Generator (TMG) Toolbox, whose capabilities receive specific attention.
Albuquerque, Kevin; Rodgers, Kellie; Spangler, Ann; Rahimi, Asal; Willett, DuWayne
2018-03-01
The on-treatment visit (OTV) for radiation oncology is essential for patient management. Radiation toxicities recorded during the OTV may be inconsistent because of the use of free text and the lack of treatment site-specific templates. We developed a radiation oncology toxicity recording instrument (ROTOX) in a health system electronic medical record (EMR). Our aims were to assess improvement in documentation of toxicities and to develop clinic toxicity benchmarks. A ROTOX that was based on National Cancer Institute Common Terminology Criteria for Adverse Events (version 4.0) with flow-sheet functionality was developed in the EMR. Improvement in documentation was assessed at various time intervals. High-grade toxicities (ie, grade ≥ 3 by CTCAE) by site were audited to develop benchmarks and to track nursing and physician actions taken in response to these. A random sample of OTV notes from each clinic physician before ROTOX implementation was reviewed and assigned a numerical document quality score (DQS) that was based on completeness and comprehensiveness of toxicity grading. The mean DQS improved from an initial level of 41% to 99% (of the maximum possible DQS) when resampled at 6 months post-ROTOX. This high-level DQS was maintained 3 years after ROTOX implementation at 96% of the maximum. For months 7 to 9 after implementation (during a 3-month period), toxicity grading was recorded in 4,443 OTVs for 698 unique patients; 107 episodes of high-grade toxicity were identified during this period, and toxicity-specific intervention was documented in 95%. An EMR-based ROTOX enables consistent recording of treatment toxicity. In a uniform sample of patients, local population toxicity benchmarks can be developed, and clinic response can be tracked.
Goldacre, Ben; Gray, Jonathan
2016-04-08
OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine. The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents. It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.
Automated Non-Alphanumeric Symbol Resolution in Clinical Texts
Moon, SungRim; Pakhomov, Serguei; Ryan, James; Melton, Genevieve B.
2011-01-01
Although clinical texts contain many symbols, relatively little attention has been given to symbol resolution by medical natural language processing (NLP) researchers. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols (‘+’, ‘–’, ‘/’, and ‘#’) were randomly extracted from a clinical document repository and annotated by experts. The symbols and their surrounding context, in addition to bag-of-Words (BoW), and heuristic rules were evaluated as features for the following classifiers: Naïve Bayes, Support Vector Machine, and Decision Tree, using 10-fold cross-validation. Accuracies for ‘+’, ‘–’, ‘/’, and ‘#’ were 80.11%, 80.22%, 90.44%, and 95.00% respectively, with Naïve Bayes. While symbol context contributed the most, BoW was also helpful for disambiguation of some symbols. Symbol disambiguation with supervised techniques can be implemented with reasonable accuracy as a module for medical NLP systems. PMID:22195157
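A minimal version of the supervised setup described here can be assembled with scikit-learn: the words surrounding each symbol occurrence become bag-of-words features for a Naive Bayes classifier. The context snippets, labels and sense inventory below are invented for illustration and are far smaller than the annotated set used in the study.

```python
# Sketch of symbol-sense disambiguation with context bag-of-words + Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training contexts for the '/' symbol and their sense labels.
contexts = [
    "pain rated 7 / 10 on arrival",            # '/' as a ratio
    "patient seen 03 / 21 in clinic",          # '/' in a date
    "blood pressure 120 / 80 today",           # '/' as a ratio
    "follow up 04 / 02 with cardiology",       # '/' in a date
]
labels = ["ratio", "date", "ratio", "date"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(contexts, labels)
print(clf.predict(["oxygen saturation 95 / 100 measured"]))
```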
LAMDA at TREC CDS track 2015: Clinical Decision Support Track
2015-11-20
The proposed approach outperforms all the other vector space models supported by Elasticsearch. MetaMap is an online tool that maps biomedical text to the Metathesaurus. The medical knowledge consists of 700,000 biomedical documents from PubMed Central [3], a freely available online digital database. This work was supported by the Science Research Program through the National Research Foundation (NRF) of Korea, funded by the Ministry of Science, ICT, and Future Planning (MSIP).
Document of standardization of enteral nutrition access in adults.
Arribas, Lorena; Frías, Laura; Creus, Gloria; Parejo, Juana; Urzola, Carmen; Ashbaugh, Rosana; Pérez-Portabella, Cleofé; Cuerda, Cristina
2014-07-01
The standardization and protocols group of the Spanish Society of Parenteral and Enteral Nutrition (SENPE) published in 2011 a consensus document SENPE/SEGHNP/ANECIPN/SECP on enteral access for paediatric nutritional support. Along the lines of that document, we have developed another document on adult patients to homogenize clinical practice and improve the quality of care for enteral access in this age group. The working group included health professionals (nurses, dietitians and a doctor) with extensive experience in enteral nutrition and access. We tried to find scientific evidence through a literature review and used the criteria of the Agency for Healthcare Research and Quality (AHRQ) to classify the evidence (Grade of Recommendation A, B or C). The document was later reviewed by experts external to the group, and the endorsement of the Scientific and Educational Committee (CCE) and the home artificial nutrition group (NADYA) of the SENPE was requested. The full text will be published as a monograph number in this journal. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.
Techniques of Document Management: A Review of Text Retrieval and Related Technologies.
ERIC Educational Resources Information Center
Veal, D. C.
2001-01-01
Reviews present and possible future developments in the techniques of electronic document management, the major ones being text retrieval and scanning and OCR (optical character recognition). Also addresses document acquisition, indexing and thesauri, publishing and dissemination standards, impact of the Internet, and the document management…
Querying archetype-based EHRs by search ontology-based XPath engineering.
Kropf, Stefan; Uciteli, Alexandr; Schierle, Katrin; Krücken, Peter; Denecke, Kerstin; Herre, Heinrich
2018-05-11
Legacy data and new structured data can be stored in a standardized format as XML-based EHRs on XML databases. Querying documents in these databases is crucial for answering research questions. Instead of using free-text searches, which lead to false positive results, precision can be increased by constraining the search to certain parts of documents. A search-ontology-based specification of queries on XML documents defines search concepts and relates them to parts of the XML document structure. This query specification method is introduced in practice and evaluated by applying concrete research questions, formulated in natural language, to a data collection for information retrieval purposes. The search is performed by search-ontology-based XPath engineering that reuses ontologies and XML-related W3C standards. The key result is that the specification of research questions can be supported by the use of search-ontology-based XPath engineering. A deeper recognition of entities and a semantic understanding of the content are necessary for further improvement of precision and recall. A key limitation is that applying the introduced process requires skills in ontology and software development. In future, the time-consuming ontology development could be overcome by implementing a new clinical role: the clinical ontologist. The introduced Search Ontology XML extension connects search terms to certain parts of XML documents and enables an ontology-based definition of queries. Search-ontology-based XPath engineering can support research question answering through the specification of complex XPath expressions without deep syntax knowledge of XPath.
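To make the idea of section-constrained querying concrete, the sketch below runs an XPath query over a toy XML report using the lxml library. The element names, attribute names and query are invented for illustration; real archetype-based EHR structures and the generated XPath expressions are considerably richer.

```python
# Section-sensitive XPath query over a toy XML document (lxml).
from lxml import etree

xml = b"""
<report>
  <section name="clinical_history">No history of malignancy.</section>
  <section name="diagnosis">Adenocarcinoma of the colon.</section>
</report>
"""
doc = etree.fromstring(xml)
# Constrain the search to the diagnosis section instead of free text over the whole report.
hits = doc.xpath("//section[@name='diagnosis'][contains(text(), 'carcinoma')]/text()")
print(hits)   # ['Adenocarcinoma of the colon.']
```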
Detection of text strings from mixed text/graphics images
NASA Astrophysics Data System (ADS)
Tsai, Chien-Hua; Papachristou, Christos A.
2000-12-01
A robust system for separating text strings from mixed text/graphics images is presented. Based on a union-find (region-growing) strategy, the algorithm is able to separate text from graphics and adapts to changes in document type, language category (e.g., English, Chinese and Japanese), text font style and size, and text string orientation within digital images. In addition, it tolerates the document skew that commonly occurs in documents, without requiring skew correction prior to discrimination, a condition for which methods such as projection profiles or run-length coding are not always suitable. The method has been tested with a variety of printed documents from different origins using one common set of parameters, and the performance of the algorithm in terms of computational efficiency is demonstrated on several test images from the evaluation.
Context-sensitive medical information retrieval.
Auerbuch, Mordechai; Karson, Tom H; Ben-Ami, Benjamin; Maimon, Oded; Rokach, Lior
2004-01-01
Substantial medical data such as pathology reports, operative reports, discharge summaries, and radiology reports are stored in textual form. Databases containing free-text medical narratives often need to be searched to find relevant information for clinical and research purposes. Terms that appear in these documents tend to appear in different contexts. The context of negation, a negative finding, is of special importance, since many of the most frequently described findings are those denied by the patient or subsequently "ruled out." Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the retrieved documents will be irrelevant. The purpose of this work is to develop a methodology for automated learning of negative context patterns in medical narratives and to test the effect of context identification on the performance of medical information retrieval. The algorithm presented significantly improves the performance of information retrieval done on medical narratives. The precision improves from about 60%, when using context-insensitive retrieval, to nearly 100%. The impact on recall is only minor. In addition, context-sensitive queries enable the user to search for terms in ways not otherwise available.
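The flavor of negation-context handling can be shown with a simplified, rule-based sketch that flags a term as negated when it follows a trigger phrase within a small token window. The trigger list and window size are assumptions for illustration; the paper's contribution is to learn such context patterns automatically rather than hand-code them.

```python
# Simplified, rule-based negation-context check (NegEx-style trigger matching).
import re

NEGATION_TRIGGERS = ["no evidence of", "denies", "ruled out", "negative for", "without"]

def is_negated(sentence, term, window=6):
    """Return True if `term` appears within `window` tokens after a negation trigger."""
    tokens = re.findall(r"\w+", sentence.lower())
    term_positions = [i for i, t in enumerate(tokens) if t == term.lower()]
    for trigger in NEGATION_TRIGGERS:
        trig_tokens = trigger.split()
        for i in range(len(tokens) - len(trig_tokens) + 1):
            if tokens[i:i + len(trig_tokens)] == trig_tokens:
                end = i + len(trig_tokens)
                if any(end <= p < end + window for p in term_positions):
                    return True
    return False

print(is_negated("Patient denies chest pain or dyspnea.", "pain"))   # True
print(is_negated("Chest pain radiating to the left arm.", "pain"))   # False
```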
Graph-based layout analysis for PDF documents
NASA Astrophysics Data System (ADS)
Xu, Canhui; Tang, Zhi; Tao, Xin; Li, Yun; Shi, Cao
2013-03-01
To increase the flexibility and enrich the reading experience of e-books on small portable screens, a graph-based method is proposed to perform layout analysis on Portable Document Format (PDF) documents. A digital-born document has inherent advantages, such as representing text and fractional images in explicit form, which can be straightforwardly exploited. To integrate traditional image-based document analysis with the inherent metadata provided by a PDF parser, the page primitives, including text, image and path elements, are processed to produce text and non-text layers for separate analysis. The graph-based method is developed at the superpixel representation level, and the page text elements corresponding to vertices are used to construct an undirected graph. The Euclidean distance between adjacent vertices is applied in a top-down manner to cut the graph tree formed by Kruskal's algorithm, and edge orientation is then used in a bottom-up manner to extract text lines from each sub-tree. Non-textual objects, on the other hand, are segmented by connected component analysis. For each segmented text and non-text composite, a 13-dimensional feature vector is extracted for labelling purposes. Experimental results on selected pages from PDF books are presented.
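The top-down cutting step can be illustrated with a small sketch: a minimum spanning tree (the structure Kruskal's algorithm produces) is built over text-element centers, edges longer than a distance threshold are cut, and the remaining connected components act as candidate text blocks. The coordinates and the threshold are invented, and the paper's orientation-based bottom-up step is omitted.

```python
# MST over text-element centers, threshold cut, connected components as blocks.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

centers = np.array([[10, 10], [30, 12], [50, 11],      # elements of one text line
                    [12, 80], [33, 82]])                # elements of another line
dist = squareform(pdist(centers))                       # full Euclidean distance matrix
mst = minimum_spanning_tree(dist).toarray()             # MST edge weights (Kruskal-equivalent)
mst[mst > 40] = 0                                       # cut edges longer than the threshold
n_blocks, labels = connected_components(mst, directed=False)
print(n_blocks, labels)                                 # e.g. 2 blocks: [0 0 0 1 1]
```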
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-07
...] Draft Guidance for Industry on Electronic Source Documentation in Clinical Investigations; Availability... Documentation in Clinical Investigations.'' This document provides guidance to sponsors, contract research organizations (CROs), data management centers, and clinical investigators on capturing, using, and archiving...
Telemetry Standards, RCC Standard 106-17, Annex A.1, Pulse Amplitude Modulation Standards
2017-07-01
Pulse amplitude modulation (PAM) waveforms shall conform to one of the two figures referenced in this annex. The first referenced figure shows 50 percent duty cycle PAM with amplitude synchronization; a 20-25 percent deviation reserved for pulse synchronization is recommended. Telemetry Standards, RCC Standard 106-17, Annex A.1, July 2017, Section A.1.2.
Reading and Writing in the 21st Century.
ERIC Educational Resources Information Center
Soloway, Elliot; And Others
1993-01-01
Describes MediaText, a multimedia document processor developed at the University of Michigan that allows the incorporation of video, music, sound, animations, still images, and text into one document. Interactive documents are discussed, and the need for users to be able to write documents as well as read them is emphasized. (four references) (LRW)
de Tayrac, R; Haylen, B T; Deffieux, X; Hermieu, J F; Wagner, L; Amarenco, G; Labat, J J; Leroi, A M; Billecocq, S; Letouzey, V; Fatton, B
2016-03-01
Given its increasing complexity, the terminology for female pelvic floor disorders needs to be updated in addition to existing terminology of the lower urinary tract. To do this, it seems preferable to adopt a female-specific approach and build on a consensus based on clinical practice. This paper summarizes the work of the standardization and terminology committees of two international scientific societies, namely the International Urogynecological Association (IUGA) and the International Continence Society (ICS). These committees were assisted by many external expert referees. A ranking into relevant major clinical categories and sub-categories was developed in order to allocate an alphanumeric code to each definition. An extensive process of 15 internal and external reviews was set up to study each definition in detail, with decisions taken collectively (consensus). Terminology was developed for female pelvic floor disorders, bringing together more than 250 definitions. It is clinically based and the six most common diagnoses are defined. The emphasis was placed on clarity and user-friendliness to make this terminology accessible to practitioners and trainees in all the specialties involved in female pelvic floor disorders. Imaging investigations (ultrasound, radiology, MRI) exclusively for women have been added to the text, relevant figures have also been included to complete the text and help clarify the meaning. Regular reviews are planned and are also required to keep the document up-to-date and as widely acceptable as possible. The work conducted led to the development of a consensual terminology of female pelvic floor disorders. This document has been designed to provide substantial assistance in clinical practice and research. 4. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Semi-Automated Methods for Refining a Domain-Specific Terminology Base
2011-02-01
The terminology base serves not only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this work is to develop semi-automated methods for refining such a domain-specific terminology base.
Thematic clustering of text documents using an EM-based approach
2012-01-01
Clustering textual contents is an important step in mining useful information on the web or other text-based resources. The common task in text clustering is to handle text in a multi-dimensional space, and to partition documents into groups, where each group contains documents that are similar to each other. However, this strategy lacks a comprehensive view for humans in general since it cannot explain the main subject of each cluster. Utilizing semantic information can solve this problem, but it needs a well-defined ontology or pre-labeled gold standard set. In this paper, we present a thematic clustering algorithm for text documents. Given text, subject terms are extracted and used for clustering documents in a probabilistic framework. An EM approach is used to ensure documents are assigned to correct subjects, hence it converges to a locally optimal solution. The proposed method is distinctive because its results are sufficiently explanatory for human understanding as well as efficient for clustering performance. The experimental results show that the proposed method provides a competitive performance compared to other state-of-the-art approaches. We also show that the extracted themes from the MEDLINE® dataset represent the subjects of clusters reasonably well. PMID:23046528
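As a generic illustration of EM-based soft clustering of documents (not the paper's theme/subject-term model), the sketch below fits a Gaussian mixture to TF-IDF vectors with scikit-learn; the responsibilities computed in the E-step provide soft cluster assignments. The corpus and cluster count are placeholders.

```python
# Generic EM soft clustering of documents: Gaussian mixture over TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

docs = [
    "insulin regulates blood glucose levels",
    "glucose metabolism and insulin resistance",
    "convolutional networks for image recognition",
    "deep learning improves image classification",
]
X = TfidfVectorizer().fit_transform(docs).toarray()
gm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gm.fit(X)                                  # E- and M-steps iterate until convergence
print(gm.predict(X))                       # hard cluster labels
print(gm.predict_proba(X).round(2))        # soft (responsibility) assignments
```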
Fast words boundaries localization in text fields for low quality document images
NASA Astrophysics Data System (ADS)
Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry
2018-04-01
The paper examines the problem of precise localization of word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. When capturing an image with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortions or glare may occur. Further document processing is complicated by the specifics of documents: layout elements, complex backgrounds, static text, document security elements, and a variety of text fonts. The problem of word boundary localization has to be solved at runtime on a mobile CPU with limited computing capabilities under the specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for scanned printed text are quick but limited to images of high quality. Methods for text in the wild have excessively high computational complexity and are thus hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than finding text in natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal speed-precision ratio. The running time of the algorithm is 12 ms per field on an ARM processor of a mobile device. The error rate for boundary localization on a test sample of 8000 fields is 0.3
Documents Similarity Measurement Using Field Association Terms.
ERIC Educational Resources Information Center
Atlam, El-Sayed; Fuketa, M.; Morita, K.; Aoe, Jun-ichi
2003-01-01
Discussion of text analysis and information retrieval and measurement of document similarity focuses on a new text manipulation system called FA (field association)-Sim that is useful for retrieving information in large heterogeneous texts and for recognizing content similarity in text excerpts. Discusses recall and precision, automatic indexing…
Means of storage and automated monitoring of versions of text technical documentation
NASA Astrophysics Data System (ADS)
Leonovets, S. A.; Shukalov, A. V.; Zharinov, I. O.
2018-03-01
The paper considers automation of the preparation, storage and version monitoring of textual design and program documentation by means of specialized software. Automated preparation of documentation is based on processing of the engineering data contained in the specifications and technical documentation, or in the specification. Data handling assumes the existence of strictly structured electronic documents, prepared in widespread formats from templates based on industry standards, and the automated generation of the program or design text document. The subsequent life cycle of the document and of the engineering data it contains is controlled, with archival data storage carried out at each stage of the life cycle. Performance studies of different widespread document formats under automated monitoring and storage are given. The newly developed software and the workbenches available to the developer of instrumentation equipment are described.
Development of a full-text information retrieval system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Keizo Oyama; AKira Miyazawa, Atsuhiro Takasu; Kouji Shibano
The authors have executed a project to realize a full-text information retrieval system. The system is designed to deal with a document database comprising full text of a large number of documents such as academic papers. The document structures are utilized in searching and extracting appropriate information. The concept of structure handling and the configuration of the system are described in this paper.
Skeppstedt, Maria; Kvist, Maria; Nilsson, Gunnar H; Dalianis, Hercules
2014-06-01
Automatic recognition of clinical entities in the narrative text of health records is useful for constructing applications for documentation of patient care, as well as for secondary usage in the form of medical knowledge extraction. There are a number of named entity recognition studies on English clinical text, but less work has been carried out on clinical text in other languages. This study was performed on Swedish health records, and focused on four entities that are highly relevant for constructing a patient overview and for medical hypothesis generation, namely the entities: Disorder, Finding, Pharmaceutical Drug and Body Structure. The study had two aims: to explore how well named entity recognition methods previously applied to English clinical text perform on similar texts written in Swedish; and to evaluate whether it is meaningful to divide the more general category Medical Problem, which has been used in a number of previous studies, into the two more granular entities, Disorder and Finding. Clinical notes from a Swedish internal medicine emergency unit were annotated for the four selected entity categories, and the inter-annotator agreement between two pairs of annotators was measured, resulting in an average F-score of 0.79 for Disorder, 0.66 for Finding, 0.90 for Pharmaceutical Drug and 0.80 for Body Structure. A subset of the developed corpus was thereafter used for finding suitable features for training a conditional random fields model. Finally, a new model was trained on this subset, using the best features and settings, and its ability to generalise to held-out data was evaluated. This final model obtained an F-score of 0.81 for Disorder, 0.69 for Finding, 0.88 for Pharmaceutical Drug, 0.85 for Body Structure and 0.78 for the combined category Disorder+Finding. The obtained results, which are in line with or slightly lower than those for similar studies on English clinical text, many of them conducted using a larger training data set, show that the approaches used for English are also suitable for Swedish clinical text. However, a small proportion of the errors made by the model are less likely to occur in English text, showing that results might be improved by further tailoring the system to clinical Swedish. The entity recognition results for the individual entities Disorder and Finding show that it is meaningful to separate the general category Medical Problem into these two more granular entity types, e.g. for knowledge mining of co-morbidity relations and disorder-finding relations. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
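A minimal conditional random fields setup of the kind used in such studies can be assembled with the sklearn-crfsuite package, as sketched below. The feature template, the tiny invented Swedish sentence and its labels are illustrative assumptions only, not the features or data selected in the study.

```python
# Minimal CRF sequence labeler for clinical entities (sklearn-crfsuite).
import sklearn_crfsuite

def token_features(tokens, i):
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Invented example sentence ("The patient has pneumonia and is prescribed Kåvepenin").
sentence = ["Patienten", "har", "pneumoni", "och", "ordineras", "Kåvepenin"]
labels   = ["O", "O", "B-Disorder", "O", "O", "B-Drug"]

X = [[token_features(sentence, i) for i in range(len(sentence))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```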
ERIC Educational Resources Information Center
Giordano, Richard
1994-01-01
Describes the Text Encoding Initiative (TEI) project and the TEI header, which documents electronic text in a standard interchange format understandable to both librarian catalogers and nonlibrarian text encoders. The form and function of the TEI header is introduced, and its relationship to the MARC record is explained. (10 references) (KRN)
Rowlands, Stella; Coverdale, Steven; Callen, Joanne
2016-12-01
Clinical documentation is essential for communication between health professionals and the provision of quality care to patients. To examine medical students' perspectives of their education in documentation of clinical care in hospital patients' medical records. A qualitative design using semi-structured interviews with fourth-year medical students was undertaken at a hospital-based clinical school in an Australian university. Several themes reflecting medical students' clinical documentation education emerged from the data: formal clinical documentation education using lectures and tutorials was minimal; most education occurred on the job, delivered by junior doctors; and students expressed concerns regarding variation in education between teams and receiving limited feedback on performance. Respondents reported on the importance of feedback for their learning of disease processes and treatments. They suggested that improvements could be made in the timing of clinical documentation education, and they stressed the importance of training on the job. On-the-job education with feedback in clinical documentation provides a learning opportunity for medical students and is essential in order to ensure accurate, safe, succinct and timely clinical notes. © The Author(s) 2016.
Handwritten text line segmentation by spectral clustering
NASA Astrophysics Data System (ADS)
Han, Xuecheng; Yao, Hui; Zhong, Guoqiang
2017-02-01
Since handwritten text lines are generally skewed and not obviously separated, text line segmentation of handwritten document images is still a challenging problem. In this paper, we propose a novel text line segmentation algorithm based on spectral clustering. Given a handwritten document image, we first convert it to a binary image and then compute the adjacency matrix of the pixel points. We apply spectral clustering to this similarity metric and use the orthogonal k-means clustering algorithm to group the text lines. Experiments on a Chinese handwritten document database (HIT-MW) demonstrate the effectiveness of the proposed method.
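The overall idea can be mimicked on toy data with scikit-learn's SpectralClustering, treating foreground pixel coordinates as graph nodes, as sketched below. The synthetic pixels, the nearest-neighbor affinity and the fixed number of lines are assumptions for illustration; the paper builds its own adjacency matrix and uses an orthogonal k-means step.

```python
# Spectral clustering of foreground-pixel coordinates into text lines (toy data).
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy foreground pixels from two slightly skewed "text lines" (row, column coordinates).
rng = np.random.default_rng(0)
line1 = np.column_stack([10 + 0.05 * np.arange(100) + rng.normal(0, 0.5, 100), np.arange(100)])
line2 = np.column_stack([40 + 0.05 * np.arange(100) + rng.normal(0, 0.5, 100), np.arange(100)])
pixels = np.vstack([line1, line2])

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", n_neighbors=10,
                        assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(pixels)      # one label per pixel; each label is a text line
print(np.bincount(labels))           # roughly 100 pixels per line
```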
Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches.
Shariff, Salimah Z; Bejaimal, Shayna Ad; Sontrop, Jessica M; Iansavichus, Arthur V; Haynes, R Brian; Weir, Matthew A; Garg, Amit X
2013-08-15
Physicians frequently search PubMed for information to guide patient care. More recently, Google Scholar has gained popularity as another freely accessible bibliographic database. To compare the performance of searches in PubMed and Google Scholar. We surveyed nephrologists (kidney specialists) and provided each with a unique clinical question derived from 100 renal therapy systematic reviews. Each physician provided the search terms they would type into a bibliographic database to locate evidence to answer the clinical question. We executed each of these searches in PubMed and Google Scholar and compared results for the first 40 records retrieved (equivalent to 2 default search pages in PubMed). We evaluated the recall (proportion of relevant articles found) and precision (ratio of relevant to nonrelevant articles) of the searches performed in PubMed and Google Scholar. Primary studies included in the systematic reviews served as the reference standard for relevant articles. We further documented whether relevant articles were available as free full-texts. Compared with PubMed, the average search in Google Scholar retrieved twice as many relevant articles (PubMed: 11%; Google Scholar: 22%; P<.001). Precision was similar in both databases (PubMed: 6%; Google Scholar: 8%; P=.07). Google Scholar provided significantly greater access to free full-text publications (PubMed: 5%; Google Scholar: 14%; P<.001). For quick clinical searches, Google Scholar returns twice as many relevant articles as PubMed and provides greater access to free full-text articles.
Machine printed text and handwriting identification in noisy document images.
Zheng, Yefeng; Li, Huiping; Doermann, David
2004-03-01
In this paper, we address the problem of identifying text in noisy document images. We focus especially on segmenting and distinguishing between handwriting and machine-printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content; and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine-printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov random field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.
The Text Encoding Initiative: Flexible and Extensible Document Encoding.
ERIC Educational Resources Information Center
Barnard, David T.; Ide, Nancy M.
1997-01-01
The Text Encoding Initiative (TEI), an international collaboration aimed at producing a common encoding scheme for complex texts, examines the requirement for generality versus the requirement to handle specialized text types. Discusses how documents and users tax the limits of fixed schemes requiring flexible extensible encoding to support…
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation via Named Entity Recognition services, offering interactive, user-friendly visualization. Tag-cloud-based illustrations of the terms labeling each document cluster are semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare the terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms and exclusion of documents/significant terms, to better match the biological question addressed. http://biotextquest.biol.ucy.ac.cy vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary data are available at Bioinformatics online.
ERIC Educational Resources Information Center
Farri, Oladimeji Feyisetan
2012-01-01
Large quantities of redundant clinical data are usually transferred from one clinical document to another, making the review of such documents cognitively burdensome and potentially error-prone. Inadequate designs of electronic health record (EHR) clinical document user interfaces probably contribute to the difficulties clinicians experience while…
Yuan, Soe-Tsyr; Sun, Jerry
2005-10-01
Development of algorithms for automated text categorization in massive text document sets is an important research area of data mining and knowledge discovery. Most text-clustering methods are grounded in term-based measurement of distance or similarity, ignoring the structure of the documents. In this paper, we present a novel method named structured cosine similarity (SCS) that furnishes document clustering with a new way of modeling based on document summarization, taking the structure of the documents into account so as to improve the performance of document clustering in terms of quality, stability, and efficiency. This study was motivated by the problem of clustering speech documents (which lack rich document features) obtained from wireless oral sharing of experience by the mobile workforce of enterprises, fulfilling audio-based knowledge management. In other words, this problem aims to facilitate knowledge acquisition and sharing by speech. The evaluations also show fairly promising results for our method of structured cosine similarity.
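For orientation, the sketch below contrasts plain cosine similarity over term counts with a naive structure-aware variant that averages per-section similarities. The section-averaging scheme is our own illustration of the general idea of exploiting document structure, not the SCS formula defined in the paper.

```python
# Plain cosine similarity over term counts, plus a naive structure-aware variant.
import math
from collections import Counter

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def structured_cosine(doc_a, doc_b):
    """doc_a, doc_b: dicts mapping section name -> section text."""
    shared = set(doc_a) & set(doc_b)
    if not shared:
        return 0.0
    return sum(cosine(doc_a[s], doc_b[s]) for s in shared) / len(shared)

a = {"problem": "engine overheating on long trips", "fix": "replace the coolant pump"}
b = {"problem": "engine overheating when idling", "fix": "flush coolant and replace pump"}
print(structured_cosine(a, b))
```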
Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying.
Kropf, Stefan; Krücken, Peter; Mueller, Wolf; Denecke, Kerstin
2017-05-18
Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables an improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Mapping unstructured reports into a standardized information model is a practical solution for a better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.
Ohmann, Christian; Banzi, Rita; Canham, Steve; Battaglia, Serena; Matei, Mihaela; Ariyo, Christopher; Becnel, Lauren; Bierer, Barbara; Bowers, Sarion; Clivio, Luca; Dias, Monica; Druml, Christiane; Faure, Hélène; Fenner, Martin; Galvez, Jose; Ghersi, Davina; Gluud, Christian; Groves, Trish; Houston, Paul; Karam, Ghassan; Kalra, Dipak; Knowles, Rachel L; Krleža-Jerić, Karmela; Kubiak, Christine; Kuchinke, Wolfgang; Kush, Rebecca; Lukkarinen, Ari; Marques, Pedro Silverio; Newbigging, Andrew; O'Callaghan, Jennifer; Ravaud, Philippe; Schlünder, Irene; Shanahan, Daniel; Sitter, Helmut; Spalding, Dylan; Tudur-Smith, Catrin; van Reusel, Peter; van Veen, Evert-Ben; Visser, Gerben Rienk; Wilson, Julia; Demotes-Mainard, Jacques
2017-12-14
We examined major issues associated with sharing of individual clinical trial data and developed a consensus document on providing access to individual participant data from clinical trials, using a broad interdisciplinary approach. This was a consensus-building process among the members of a multistakeholder task force, involving a wide range of experts (researchers, patient representatives, methodologists, information technology experts, and representatives from funders, infrastructures and standards development organisations). An independent facilitator supported the process using the nominal group technique. The consensus was reached in a series of three workshops held over 1 year, supported by exchange of documents and teleconferences within focused subgroups when needed. This work was set within the Horizon 2020-funded project CORBEL (Coordinated Research Infrastructures Building Enduring Life-science Services) and coordinated by the European Clinical Research Infrastructure Network. Thus, the focus was on non-commercial trials and the perspective mainly European. We developed principles and practical recommendations on how to share data from clinical trials. The task force reached consensus on 10 principles and 50 recommendations, representing the fundamental requirements of any framework used for the sharing of clinical trials data. The document covers the following main areas: making data sharing a reality (eg, cultural change, academic incentives, funding), consent for data sharing, protection of trial participants (eg, de-identification), data standards, rights, types and management of access (eg, data request and access models), data management and repositories, discoverability, and metadata. The adoption of the recommendations in this document would help to promote and support data sharing and reuse among researchers, adequately inform trial participants and protect their rights, and provide effective and efficient systems for preparing, storing and accessing data. The recommendations now need to be implemented and tested in practice. Further work needs to be done to integrate these proposals with those from other geographical areas and other academic domains. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Page layout analysis and classification for complex scanned documents
NASA Astrophysics Data System (ADS)
Erkilinc, M. Sezer; Jaber, Mustafa; Saber, Eli; Bauer, Peter; Depalov, Dejan
2011-09-01
A framework for region/zone classification in color and gray-scale scanned documents is proposed in this paper. The algorithm includes modules for extracting text, photo, and strong edge/line regions. First, a text detection module based on wavelet analysis and run-length encoding (RLE) is employed. Local and global energy maps in high-frequency bands of the wavelet domain are generated and used as initial text maps; further analysis using RLE yields a final text map. The second module detects image/photo and pictorial regions in the input document: a block-based classifier using basis vector projections identifies photo candidate regions, and a final photo map is obtained by Markov random field (MRF)-based maximum a posteriori (MAP) optimization with iterated conditional modes (ICM). The final module detects lines and strong edges using the Hough transform and edge-linkage analysis, respectively. The text, photo, and strong edge/line maps are combined to generate a page layout classification of the scanned target document. Experimental results and objective evaluation show that the proposed technique performs very effectively on a variety of simple and complex scanned document types obtained from the MediaTeam Oulu document database. The proposed page layout classifier can be used in systems for efficient document storage, content-based document retrieval, optical character recognition, mobile phone imagery, and augmented reality.
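As an illustration of the third module only, the sketch below runs Canny edge detection followed by a probabilistic Hough transform with OpenCV; the thresholds and file name are placeholders rather than the paper's settings, and the wavelet text module and MRF photo module are not reproduced.

```python
# Strong edge/line map via Canny + probabilistic Hough transform (OpenCV).
import cv2
import numpy as np

page = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(page, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=120, minLineLength=200, maxLineGap=10)

line_map = np.zeros_like(page)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(line_map, (x1, y1), (x2, y2), 255, thickness=2)
cv2.imwrite("line_map.png", line_map)  # to be combined with the text and photo maps
```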
Christoph, J; Griebel, L; Leb, I; Engel, I; Köpcke, F; Toddenroth, D; Prokosch, H-U; Laufer, J; Marquardt, K; Sedlmayr, M
2015-01-01
The secondary use of clinical data provides large opportunities for clinical and translational research as well as quality assurance projects. For such purposes, it is necessary to provide a flexible and scalable infrastructure that is compliant with privacy requirements. The major goals of the cloud4health project are to define such an architecture, to implement a technical prototype that fulfills these requirements, and to evaluate it with three use cases. The architecture provides components for multiple data provider sites such as hospitals to extract free text as well as structured data from local sources and de-identify such data for further anonymous or pseudonymous processing. Free-text documentation is analyzed and transformed into structured information by text-mining services, which are provided within a cloud-computing environment. Thus, newly gained annotations can be integrated along with the already available structured data items, and the resulting data sets can be uploaded to a central study portal for further analysis. Based on the architecture design, a prototype has been implemented and is under evaluation in three clinical use cases. Data from several hundred patients provided by a University Hospital and a private hospital chain have already been processed. Cloud4health has shown how existing components for secondary use of structured data can be complemented with text mining in a privacy-compliant manner. The cloud-computing paradigm allows a flexible and dynamically adaptable service provision that facilitates the adoption of services by data providers without their own investment in hardware resources and software tools.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. House Select Committee on Children, Youth, and Families.
The text of a Congressional hearing to examine the impact of eating disorders on children and families is presented in this document. Testimony by the following witnesses is included: (1) Krista Brown, eating disorder victim, and her mother, Susan Brown; (2) Robert B. Duncan, a hospital president; (3) Patricia Fallon, a clinical psychologist; (4)…
Versloot, Judith; Grudniewicz, Agnes; Chatterjee, Ananda; Hayden, Leigh; Kastner, Monika; Bhattacharyya, Onil
2015-06-01
We present simple formatting rules derived from an extensive literature review that can improve the format of clinical practice guidelines (CPGs) and potentially increase the likelihood that they will be used. We recently conducted a review of the literature from medicine, psychology, design, and human factors engineering on characteristics of guidelines that are associated with their use in practice, covering both the creation and communication of content. The formatting rules described in this article are derived from that review. The formatting rules are grouped into three categories that can be easily applied to CPGs: first, Vivid: make it stand out; second, Intuitive: match it to the audience's expectations; and third, Visual: use alternatives to text. We highlight rules supported by our broad literature review and provide specific 'how to' recommendations for individuals and groups developing evidence-based materials for clinicians. The way text documents are formatted influences their accessibility and usability. Optimizing the formatting of CPGs is a relatively inexpensive intervention and can be used to facilitate the dissemination of evidence in healthcare. Applying simple formatting principles to make documents more vivid, intuitive, and visual is a practical approach that has the potential to improve the usability of guidelines and the extent to which they are read, remembered, and used in practice.
A systematic literature review of automated clinical coding and classification systems.
Stanfill, Mary H; Williams, Margaret; Fenton, Susan H; Jenders, Robert A; Hersh, William R
2010-01-01
Clinical coding and classification processes transform natural language descriptions in clinical text into data that can subsequently be used for clinical care, research, and other purposes. This systematic literature review examined studies that evaluated all types of automated coding and classification systems to determine the performance of such systems. Studies indexed in Medline or other relevant databases prior to March 2009 were considered. The 113 studies included in this review show that automated tools exist for a variety of coding and classification purposes, focus on various healthcare specialties, and handle a wide variety of clinical document types. Automated coding and classification systems themselves are not generalizable, nor are the results of the studies evaluating them. Published research shows these systems hold promise, but these data must be considered in context, with performance relative to the complexity of the task and the desired outcome.
[Problem list in computer-based patient records].
Ludwig, C A
1997-01-14
Computer-based clinical information systems are capable of effectively processing even large amounts of patient-related data. However, physicians depend on rapid access to summarized, clearly laid out data on the computer screen to inform themselves about a patient's current clinical situation. In introducing a clinical workplace system, we therefore transformed the problem list, which for decades has been successfully used in clinical information management, into an electronic equivalent and integrated it into the medical record. The table contains a concise overview of diagnoses and problems as well as related findings. Graphical information can also be integrated into the table, and an additional space is provided for a summary of planned examinations or interventions. The digital form of the problem list makes it possible to use the entire list or selected text elements for generating medical documents. Diagnostic terms for medical reports are transferred automatically to corresponding documents. Computer technology has an immense potential for the further development of problem list concepts. With multimedia applications, sound and images can be included in the problem list. Through hyperlinks, the problem list could become a central information board and table of contents of the medical record, thus serving as the starting point for database searches and supporting the user in navigating through the medical record.
Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David
2018-05-09
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free-text notes as well as structured fields concerning suicidality, allowing access to much larger cohorts than previously possible. This paper presents two novel NLP approaches: a rule-based approach to classify the presence of suicide ideation, and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. The good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempts within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need arises to refine performance or to meet alternate classification requirements. The algorithms can also be adapted to fit the infrastructures of other clinical datasets given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional extraction of known risk factors to predict suicidal behaviour.
Redundancy-aware topic modeling for patient record notes.
Cohen, Raphael; Aviram, Iddo; Elhadad, Michael; Elhadad, Noémie
2014-01-01
The clinical notes in a given patient record contain much redundancy, in large part due to clinicians' documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining and topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling content of clinical notes. To assess the value of Red-LDA, we experiment with three baselines and our novel redundancy-aware topic modeling method: given a large collection of patient records, (i) apply vanilla LDA to all documents in all input records; (ii) identify and remove all redundancy by choosing a single representative document for each record as input to LDA; (iii) identify and remove all redundant paragraphs in each record, leaving partial, non-redundant documents as input to LDA; and (iv) apply Red-LDA to all documents in all input records. Both quantitative evaluation carried out through log-likelihood on held-out data and topic coherence of produced topics and qualitative assessment of topics carried out by physicians show that Red-LDA produces superior models to all three baseline strategies. This research contributes to the emerging field of understanding the characteristics of the electronic health record and how to account for them in the framework of data mining. The code for the two redundancy-elimination baselines and Red-LDA is made publicly available to the community.
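Baseline (i), vanilla LDA over every note, can be sketched with gensim as below; the toy notes (one deliberately copy-pasted) are invented, and Red-LDA's modified generative process is not reproduced here.

```python
# Baseline (i): vanilla LDA fit to every note in every record (gensim).
from gensim import corpora, models

notes = [
    "patient admitted with chest pain troponin elevated".split(),
    "patient admitted with chest pain troponin elevated repeat ecg".split(),  # copy-paste redundancy
    "follow up visit diabetes insulin adjusted".split(),
]

dictionary = corpora.Dictionary(notes)
bow = [dictionary.doc2bow(tokens) for tokens in notes]
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary, passes=20, random_state=0)

for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])
```

With real records, the copied passages inflate the apparent weight of the duplicated terms, which is the effect Red-LDA is designed to correct.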
Integration of a knowledge-based system and a clinical documentation system via a data dictionary.
Eich, H P; Ohmann, C; Keim, E; Lang, K
1997-01-01
This paper describes the design and realisation of a knowledge-based system and a clinical documentation system linked via a data dictionary. The software was developed as a shell with object-oriented methods and C++ for IBM-compatible PCs and Windows 3.1/95. The data dictionary covers terminology and document objects with relations to external classifications. It controls the terminology in the documentation program, with form-based entry of clinical documents, and in the knowledge-based system, with scores and rules. The software was applied to the clinical field of acute abdominal pain by implementing a data dictionary with 580 terminology objects, 501 document objects, and 2136 links; a documentation module with 8 clinical documents; and a knowledge-based system with 10 scores and 7 sets of rules.
NLPReViz: an interactive tool for natural language processing on clinical text.
Trivedi, Gaurav; Pham, Phuong; Chapman, Wendy W; Hwa, Rebecca; Wiebe, Janyce; Hochheiser, Harry
2018-01-01
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Cavero, Icilio; Crumb, William
2005-05-01
The International Conference on Harmonization (ICH) stems from the initiative of three major world partners (Japan, USA, European Community) who composed a mutually accepted body of regulations concerning the safety, quality and efficacy requirements that new medicines have to meet in order to receive market approval. Documents on non-clinical safety pharmacology already composed by this organisation include two guidelines: the S7A, adopted in 2000, and its companion, the S7B guideline, in draft form since 2001. The S7A guideline deals with general principles and recommendations on safety pharmacology studies designed to protect healthy volunteers and patients from potential drug-induced adverse reactions. The S7B recommends a general non-clinical testing strategy for determining the propensity of non-cardiovascular pharmaceuticals to delay ventricular repolarisation, an effect that at times progresses into life-threatening ventricular arrhythmia. In the most recent version of this document (June 2004), the strategy proposes experimental assays and a critical examination of other pertinent information for applying an 'evidence of risk' label to a compound. Regrettably, the guideline fails to deal satisfactorily with a number of crucial issues, such as scoring the evidence of risk and the clinical consequences of such scoring. However, in the latter case, the S7B relies on the new ICH guideline E14, which is currently in preparation. E14 is the clinical counterpart of the S7B guideline, which states that non-clinical data are a poor predictor of drug-induced repolarisation delay in humans. The present contribution summarises and assesses salient aspects of the S7A guideline, as its founding principles are also applicable to the S7B guideline. The differences in strategies proposed by the various existing drafts of the latter document are critically examined together with some unresolved, crucial problems. The need for extending the objective of the S7B document to characterise the full electrophysiological profile of new pharmaceuticals is argued, as this approach would more extensively assess the non-clinical cardiac safety of a drug. Finally, in order to overcome present difficulties in arriving at the definitive version of the S7B guideline, the Expert Working Group could reflect on the introduction of the S7B guideline recommendations in the S7A document, as originally intended, or on postponing the adoption of a harmonized text until the availability of novel scientific data allows the presently contentious aspects of this and the E14 guidelines to be resolved.
Three-Dimensional Display of Document Set
Lantrip, David B.; Pennock, Kelly A.; Pottier, Marc C.; Schur, Anne; Thomas, James J.; Wise, James A.
2003-06-24
A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
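One plausible, simplified reading of "transformed to a spatial representation" is sketched below: TF-IDF vectors projected to three dimensions with truncated SVD. The documents are invented, and the patented method itself is not specified in this abstract.

```python
# Spatializing a document set: TF-IDF features reduced to 3-D coordinates.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "safety regulation for pressure vessels",
    "archived inspection report on a pressure vessel weld",
    "library catalogue entry for a poetry collection",
    "procedure manual for archived report retention",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
coords = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)  # (n_docs, 3)
for doc, (x, y, z) in zip(docs, coords):
    print(f"({x:+.2f}, {y:+.2f}, {z:+.2f})  {doc}")
```

Documents with similar content land near each other in the 3-D space, which is what makes visual browsing possible without further language processing.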
Three-dimensional display of document set
Lantrip, David B [Oxnard, CA; Pennock, Kelly A [Richland, WA; Pottier, Marc C [Richland, WA; Schur, Anne [Richland, WA; Thomas, James J [Richland, WA; Wise, James A [Richland, WA
2006-09-26
A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Three-dimensional display of document set
Lantrip, David B [Oxnard, CA; Pennock, Kelly A [Richland, WA; Pottier, Marc C [Richland, WA; Schur, Anne [Richland, WA; Thomas, James J [Richland, WA; Wise, James A [Richland, WA
2001-10-02
A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Three-dimensional display of document set
Lantrip, David B [Oxnard, CA; Pennock, Kelly A [Richland, WA; Pottier, Marc C [Richland, WA; Schur, Anne [Richland, WA; Thomas, James J [Richland, WA; Wise, James A [Richland, WA; York, Jeremy [Bothell, WA
2009-06-30
A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Eyrolle, Hélène; Virbel, Jacques; Lemarié, Julie
2008-03-01
Based on previous research in the field of cognitive psychology, highlighting the facilitatory effects of titles on several text-related activities, this paper looks at the extent to which titles reflect text content. An exploratory study of real-life technical documents investigated the content of their Subject lines, which linguistic analyses had led us to regard as titles. The study showed that most of the titles supplied by the writers failed to represent the documents' contents and that most users failed to detect this lack of validity.
Text-line extraction in handwritten Chinese documents based on an energy minimization framework.
Koo, Hyung Il; Cho, Nam Ik
2012-03-01
Text-line extraction in unconstrained handwritten documents remains a challenging problem due to nonuniform character scale, spatially varying text orientation, and the interference between text lines. In order to address these problems, we propose a new cost function that considers the interactions between text lines and the curvilinearity of each text line. Precisely, we achieve this goal by introducing normalized measures for them, which are based on an estimated line spacing. We also present an optimization method that exploits the properties of our cost function. Experimental results on a database consisting of 853 handwritten Chinese document images have shown that our method achieves a detection rate of 99.52% and an error rate of 0.32%, which outperforms conventional methods.
When and how do GPs record vital signs in children with acute infections? A cross-sectional study
Blacklock, Claire; Haj-Hassan, Tanya Ali; Thompson, Matthew J
2012-01-01
Background NICE recommendations and evidence from ambulatory settings promote the use of vital signs in identifying serious infections in children. This appears to differ from usual clinical practice, where GPs report measuring vital signs infrequently. Aim To identify the frequency of vital sign documentation by GPs in the assessment of children with acute infections in primary care. Design and setting Observational study in 15 general practice surgeries in Oxfordshire and Somerset, UK. Method A standardised proforma was used to extract consultation details, including documentation of numerical vital signs and words or phrases used by the GP in assessing vital signs, for 850 children aged 1 month to 16 years presenting with acute infection. Results Of the children presenting with acute infections, 269 (31.6%) had one or more numerical vital signs recorded; however, the GP recording rate improved if free-text proxies were also considered: at least one vital sign was then recorded in over half (54.1%) of children. In those with recorded numerical values for vital signs, the most frequent was temperature (210, 24.7%), followed by heart rate (62, 7.3%), respiratory rate (58, 6.8%), and capillary refill time (36, 4.2%). Words or phrases for vital signs were documented infrequently (temperature 17.6%, respiratory rate 14.6%, capillary refill time 12.5%, and heart rate 0.5%). Text relating to global assessment was documented in 313/850 (36.8%) of consultations. Conclusion GPs record vital signs using words and phrases as well as numerical methods, although overall documentation of vital signs is infrequent in children presenting with acute infections. PMID:23265227
Mapping annotations with textual evidence using an scLDA model.
Jin, Bo; Chen, Vicky; Chen, Lujia; Lu, Xinghua
2011-01-01
Most of the knowledge regarding genes and proteins is stored in biomedical literature as free text. Extracting information from complex biomedical texts demands techniques capable of inferring biological concepts from local text regions and mapping them to controlled vocabularies. To this end, we present a sentence-based correspondence latent Dirichlet allocation (scLDA) model which, when trained with a corpus of PubMed documents with known GO annotations, performs the following tasks: 1) learning major biological concepts from the corpus, 2) inferring the biological concepts existing within text regions (sentences), and 3) identifying the text regions in a document that provide evidence for the observed annotations. When applied to new gene-related documents, a trained scLDA model is capable of predicting GO annotations and identifying text regions as textual evidence supporting the predicted annotations. This study uses GO annotation data as a testbed; the approach can be generalized to other annotated data, such as MeSH and MEDLINE documents.
Cross-mapping clinical notes between hospitals: an application of the LOINC Document Ontology.
Li, Li; Morrey, C Paul; Baorto, David
2011-01-01
Standardization of document titles is essential for management as the volume of electronic clinical notes increases. The two campuses of the New York Presbyterian Hospital have over 2,700 distinct document titles. The LOINC Document Ontology (DO) provides a standard for the naming of clinical documents in a multi-axis structure. We have represented the latest LOINC DO structure in the MED and developed an automated process mapping the clinical documents from both the West (Columbia) and East (Cornell) campuses to the LOINC DO. We find that the LOINC DO can represent the majority of our documents, and about half of the documents map between campuses using the LOINC DO as a reference. We evaluated the possibility of using current LOINC codes in document exchange between different institutions. While there is clear success in the ability of the LOINC DO to represent documents and facilitate exchange, we find there are granularity issues.
Text mining for traditional Chinese medical knowledge discovery: a survey.
Zhou, Xuezhong; Peng, Yonghong; Liu, Baoyan
2010-08-01
Extracting meaningful information and knowledge from free text is the subject of considerable research interest in the machine learning and data mining fields. Text data mining (or text mining) has become one of the most active research sub-fields in data mining. Significant developments in the area of biomedical text mining during the past years have demonstrated its great promise for supporting scientists in developing novel hypotheses and new knowledge from the biomedical literature. Traditional Chinese medicine (TCM) provides a distinct methodology with which to view human life. It is one of the most complete and distinguished traditional medicines, with a history of several thousand years of studying and practicing the diagnosis and treatment of human disease. It has been shown that the TCM knowledge obtained from clinical practice has become a significant complementary source of information for modern biomedical sciences. TCM literature obtained from the historical period and from modern clinical studies has recently been transformed into digital data in the form of relational databases or text documents, which provide an effective platform for information sharing and retrieval. This motivates and facilitates research and development of knowledge discovery approaches to modernize TCM. In order to contribute to this still-growing field, this paper presents (1) a comparative introduction to TCM and modern biomedicine, (2) a survey of the related information sources of TCM, (3) a review and discussion of the state of the art and the development of text mining techniques with applications to TCM, and (4) a discussion of the research issues around TCM text mining and its future directions. Copyright 2010 Elsevier Inc. All rights reserved.
Solt, Illés; Tikk, Domonkos; Gál, Viktor; Kardkovács, Zsolt T.
2009-01-01
Objective Automated and disease-specific classification of textual clinical discharge summaries is of great importance in human life science, as it helps physicians to conduct medical studies by providing statistically relevant data for analysis. This can be further facilitated if, when labeling discharge summaries, semantic labels are also extracted from the text, such as whether a given disease is present, absent, or questionable in a patient, or is unmentioned in the document. The authors present a classification technique that successfully solves the semantic classification task. Design The authors introduce a context-aware rule-based semantic classification technique for use on clinical discharge summaries. The classification is performed in subsequent steps. First, some misleading parts are removed from the text; then the text is partitioned into positive, negative, and uncertain context segments, and a sequence of binary classifiers is applied to assign the appropriate semantic labels. Measurement For evaluation the authors used the documents of the i2b2 Obesity Challenge and adopted its evaluation measures, F1-macro and F1-micro. Results On the two subtasks of the Obesity Challenge (textual and intuitive classification) the system performed very well, achieving an F1-macro of 0.80 for the textual task and 0.67 for the intuitive task, and obtained second place in the textual and first place in the intuitive subtask of the challenge. Conclusions The authors show in the paper that a simple rule-based classifier can tackle the semantic classification task more successfully than machine learning techniques if the training data are limited and some semantic labels are very sparse. PMID:19390101
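A heavily simplified sketch of the context-aware rule-based idea is shown below: cue phrases assign a positive, negative, or uncertain context before the disease label is decided. The cue lists and example sentences are illustrative and much smaller than the rules used in the challenge system.

```python
# Context-aware rule-based labeling of a disease mention in one sentence.
import re

NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.I)
UNCERTAINTY = re.compile(r"\b(possible|probable|questionable|cannot be excluded)\b", re.I)

def classify(sentence: str, disease: str) -> str:
    """Assign a semantic label for one disease in one sentence."""
    if disease.lower() not in sentence.lower():
        return "unmentioned"
    if UNCERTAINTY.search(sentence):
        return "questionable"
    if NEGATION.search(sentence):
        return "absent"
    return "present"

print(classify("The patient denies any history of asthma.", "asthma"))        # absent
print(classify("Possible obesity contributing to apnea.", "obesity"))         # questionable
print(classify("Hypertension is controlled on lisinopril.", "hypertension"))  # present
```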
How to use the WWW to distribute STI
NASA Technical Reports Server (NTRS)
Roper, Donna G.
1994-01-01
This presentation explains how to use the World Wide Web (WWW) to distribute scientific and technical information as hypermedia. WWW clients and servers use the HyperText Transfer Protocol (HTTP) to transfer documents containing links to other text, graphics, video, and sound. The standard language for these documents is the HyperText Markup Language (HTML). These are simply text files with formatting codes that contain layout information and hyperlinks. HTML documents can be created with any text editor or with one of the publicly available HTML editors or converters. HTML can also include links to available image formats. This presentation is available online at http://sti.larc.nasa.gov/demos/workshop/introtext.html.
Creation of structured documentation templates using Natural Language Processing techniques.
Kashyap, Vipul; Turchin, Alexander; Morin, Laura; Chang, Frank; Li, Qi; Hongsermeier, Tonya
2006-01-01
Structured Clinical Documentation is a fundamental component of the healthcare enterprise, linking both clinical (e.g., electronic health record, clinical decision support) and administrative functions (e.g., evaluation and management coding, billing). One of the challenges in creating good quality documentation templates has been the inability to address specialized clinical disciplines and adapt to local clinical practices. A one-size-fits-all approach leads to poor adoption and inefficiencies in the documentation process. On the other hand, the cost associated with manual generation of documentation templates is significant. Consequently there is a need for at least partial automation of the template generation process. We propose an approach and methodology for the creation of structured documentation templates for diabetes using Natural Language Processing (NLP).
Managing the life cycle of electronic clinical documents.
Payne, Thomas H; Graham, Gail
2006-01-01
To develop a model of the life cycle of clinical documents from inception to use in a person's medical record, including workflow requirements from clinical practice, local policy, and regulation. We propose a model for the life cycle of clinical documents as a framework for research on documentation within electronic medical record (EMR) systems. Our proposed model includes three axes: the stages of the document, the roles of those involved with the document, and the actions those involved may take on the document at each stage. The model includes the rules to describe who (in what role) can perform what actions on the document, and at what stages they can perform them. Rules are derived from needs of clinicians, and requirements of hospital bylaws and regulators. Our model encompasses current practices for paper medical records and workflow in some EMR systems. Commercial EMR systems include methods for implementing document workflow rules. Workflow rules that are part of this model mirror functionality in the Department of Veterans Affairs (VA) EMR system where the Authorization/Subscription Utility permits document life cycle rules to be written in English-like fashion. Creating a model of the life cycle of clinical documents serves as a framework for discussion of document workflow, how rules governing workflow can be implemented in EMR systems, and future research of electronic documentation.
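A toy encoding of the three-axis model might look like the following, where each rule is a (stage, role, action) triple and a permission check is a set lookup; the stage, role, and action names are invented rather than taken from the VA utility.

```python
# Toy encoding of the three-axis document life cycle model.
from typing import NamedTuple

class Rule(NamedTuple):
    stage: str    # e.g. draft, signed, amended
    role: str     # e.g. author, attending, HIM staff
    action: str   # e.g. edit, sign, cosign, view, retract

RULES = {
    Rule("draft", "author", "edit"),
    Rule("draft", "author", "sign"),
    Rule("signed", "attending", "cosign"),
    Rule("signed", "HIM staff", "view"),
    Rule("signed", "author", "retract"),
}

def allowed(stage: str, role: str, action: str) -> bool:
    """Return True if the (stage, role, action) triple is permitted."""
    return Rule(stage, role, action) in RULES

print(allowed("draft", "author", "edit"))    # True
print(allowed("signed", "author", "edit"))   # False: signed notes are no longer editable
```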
Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers
ERIC Educational Resources Information Center
Anaya, Leticia H.
2011-01-01
In the Information Age, a proliferation of unstructured text electronic documents exists. Processing these documents by humans is a daunting task as humans have limited cognitive abilities for processing large volumes of documents that can often be extremely lengthy. To address this problem, text data computer algorithms are being developed.…
NASA Astrophysics Data System (ADS)
David, Peter; Hansen, Nichole; Nolan, James J.; Alcocer, Pedro
2015-05-01
The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1-score than a baseline random classifier.
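The classification step can be sketched as below: each detected region is described by simple geometry and appearance features and labeled by a supervised classifier. The feature set, training rows, and choice of random forest are stand-ins, not the authors' configuration.

```python
# Label page regions (title / heading / body) from geometry and appearance features.
from sklearn.ensemble import RandomForestClassifier

# Features per region: [x, y, width, height, mean intensity, relative font size]
X_train = [
    [0.10, 0.02, 0.80, 0.05, 0.20, 2.5],   # wide band near the top  -> title
    [0.10, 0.15, 0.35, 0.03, 0.25, 1.4],   # short prominent band    -> heading
    [0.10, 0.20, 0.80, 0.60, 0.45, 1.0],   # large dense block       -> body
]
y_train = ["title", "heading", "body"]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
print(clf.predict([[0.12, 0.03, 0.75, 0.06, 0.22, 2.3]]))  # likely 'title'
```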
iSMART: Ontology-based Semantic Query of CDA Documents
Liu, Shengping; Ni, Yuan; Mei, Jing; Li, Hanyu; Xie, Guotong; Hu, Gang; Liu, Haifeng; Hou, Xueqiao; Pan, Yue
2009-01-01
The Health Level 7 Clinical Document Architecture (CDA) is widely accepted as the format for electronic clinical documents. With the rich ontological references in CDA documents, ontology-based semantic queries can be performed to retrieve CDA documents. In this paper, we present iSMART (interactive Semantic MedicAl Record reTrieval), a prototype system designed for ontology-based semantic query of CDA documents. The clinical information in CDA documents is extracted into RDF triples by a declarative XML-to-RDF transformer. An ontology reasoner is developed to infer additional information by combining background knowledge from the SNOMED CT ontology. An RDF query engine is then leveraged to enable the semantic queries. This system has been evaluated using real clinical documents collected from a large hospital in southern China. PMID:20351883
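A minimal sketch of the query side, assuming invented namespace URIs and using rdflib, is shown below; triples extracted from a CDA document are loaded into an RDF graph and retrieved with SPARQL, while the SNOMED CT reasoning step is omitted.

```python
# Load CDA-derived triples into an RDF graph and query them with SPARQL (rdflib).
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/cda#")   # hypothetical namespace
g = Graph()
g.add((EX.doc1, EX.hasPatient, EX.patient42))
g.add((EX.doc1, EX.hasDiagnosisCode, Literal("73211009")))  # e.g. diabetes mellitus
g.add((EX.doc2, EX.hasDiagnosisCode, Literal("38341003")))  # e.g. hypertension

q = """
PREFIX ex: <http://example.org/cda#>
SELECT ?doc WHERE { ?doc ex:hasDiagnosisCode "73211009" . }
"""
for row in g.query(q):
    print(row.doc)   # -> http://example.org/cda#doc1
```

In the full system, the reasoner would also return documents whose codes are subsumed by the queried concept, which a plain string match cannot do.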
Retrieving Clinical Evidence: A Comparison of PubMed and Google Scholar for Quick Clinical Searches
Bejaimal, Shayna AD; Sontrop, Jessica M; Iansavichus, Arthur V; Haynes, R Brian; Weir, Matthew A; Garg, Amit X
2013-01-01
Background Physicians frequently search PubMed for information to guide patient care. More recently, Google Scholar has gained popularity as another freely accessible bibliographic database. Objective To compare the performance of searches in PubMed and Google Scholar. Methods We surveyed nephrologists (kidney specialists) and provided each with a unique clinical question derived from 100 renal therapy systematic reviews. Each physician provided the search terms they would type into a bibliographic database to locate evidence to answer the clinical question. We executed each of these searches in PubMed and Google Scholar and compared results for the first 40 records retrieved (equivalent to 2 default search pages in PubMed). We evaluated the recall (proportion of relevant articles found) and precision (ratio of relevant to nonrelevant articles) of the searches performed in PubMed and Google Scholar. Primary studies included in the systematic reviews served as the reference standard for relevant articles. We further documented whether relevant articles were available as free full-texts. Results Compared with PubMed, the average search in Google Scholar retrieved twice as many relevant articles (PubMed: 11%; Google Scholar: 22%; P<.001). Precision was similar in both databases (PubMed: 6%; Google Scholar: 8%; P=.07). Google Scholar provided significantly greater access to free full-text publications (PubMed: 5%; Google Scholar: 14%; P<.001). Conclusions For quick clinical searches, Google Scholar returns twice as many relevant articles as PubMed and provides greater access to free full-text articles. PMID:23948488
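For one search, the two metrics reduce to simple set arithmetic against the reference standard of included primary studies, as in the sketch below (the identifiers are made up).

```python
# Recall and precision of one search against the review's included studies.
def recall_precision(retrieved, relevant, k=40):
    top = set(retrieved[:k])          # only the first k records are scored
    hits = top & set(relevant)
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(top) if top else 0.0
    return recall, precision

retrieved = ["pmid:111", "pmid:222", "pmid:333", "pmid:444"]   # ranked search output
relevant  = ["pmid:222", "pmid:555", "pmid:666"]               # reference standard
print(recall_precision(retrieved, relevant))                   # (0.333..., 0.25)
```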
Integration of clinical research documentation in electronic health records.
Broach, Debra
2015-04-01
Clinical trials of investigational drugs and devices are often conducted within healthcare facilities concurrently with clinical care. With implementation of electronic health records, new communication methods are required to notify nonresearch clinicians of research participation. This article reviews clinical research source documentation, the electronic health record and the medical record, areas in which the research record and electronic health record overlap, and implications for the research nurse coordinator in documentation of the care of the patient/subject. Incorporation of clinical research documentation in the electronic health record will lead to a more complete patient/subject medical record in compliance with both research and medical records regulations. A literature search provided little information about the inclusion of clinical research documentation within the electronic health record. Although regulations and guidelines define both source documentation and the medical record, integration of research documentation in the electronic health record is not clearly defined. At minimum, the signed informed consent(s), investigational drug or device usage, and research team contact information should be documented within the electronic health record. Institutional policies should define a standardized process for this integration in the absence of federal guidance. Nurses coordinating clinical trials are in an ideal position to define this integration.
Cooper, Gregory F.; Miller, Randolph A.
1998-01-01
Objective: A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Pindex (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline. Design: The authors tested the performances of PostDoc, Pindex, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors. Results: PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Pindex output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report. Conclusion: The outputs of PostDoc and Pindex are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms. PMID:9452986
Sutcliffe, Catherine G; Thuma, Philip E; van Dijk, Janneke H; Sinywimaanzi, Kathy; Mweetwa, Sydney; Hamahuwa, Mutinta; Moss, William J
2017-03-08
Early infant diagnosis of HIV infection is challenging in rural sub-Saharan Africa as blood samples are sent to central laboratories for HIV DNA testing, leading to delays in diagnosis and treatment initiation. Simple technologies to rapidly deliver results to clinics and notify mothers of test results would decrease many of these delays. The feasibility of using mobile phones to contact mothers was evaluated. In addition, the first two years of implementation of a national short message service (SMS) reporting system to deliver test results from the laboratory to the clinic were evaluated. The study was conducted in Macha, Zambia from 2013 to 2015 among mothers of HIV-exposed infants. Mothers were interviewed about mobile phone use and willingness to be contacted directly or through their rural health center. Mothers were contacted according to their preferred method of communication when test results were available. Mothers of positive infants were asked to return to the clinic as soon as possible. Dates of sample collection, delivery of test results to the clinic and notification of mothers were documented in addition to test results. Four hundred nineteen mothers and infants were enrolled. Only 30% of mothers had ever used a mobile phone. 96% of mobile phone owners were reached by study staff and 98% of mothers without mobile phones were contacted through their rural health center. Turnaround times for mothers of positive infants were approximately 2 weeks shorter than for mothers of negative infants. Delivery of test results by the national SMS system improved from 2013 to 2014, with increases in the availability of texted results (38 vs. 91%) and arrival of the texted result prior to the hardcopy report (27 vs. 83%). Texted results arriving at the clinic before the hardcopy were received a median of 19 days earlier. Four discrepancies between texted and hardcopy results were identified out of 340 tests. Mobile phone and text messaging technology has the potential to improve early infant diagnosis but challenges to widespread implementation need to be addressed, including low mobile phone ownership, use and coverage in rural areas.
DOE Research and Development Accomplishments Help
A help resource describing how to search, locate, access, and electronically download full-text research and development (R&D) documents; browsing, downloading, viewing, and searching of full-text documents and pages are covered, and the search operates over both the OCRed full text and the bibliographic information.
Text line extraction in free style document
NASA Astrophysics Data System (ADS)
Shen, Xiaolu; Liu, Changsong; Ding, Xiaoqing; Zou, Yanming
2009-01-01
This paper addresses text-line extraction in free-style documents, such as business cards, envelopes, and posters. In free-style documents, global properties such as character size and line direction can hardly be assumed, which exposes a serious limitation of traditional layout analysis. In our bottom-up method, the line is the most prominent and highest-level structure. First, we apply a novel intensity function founded on gradient information to locate text areas, in which gradients within a window have large magnitudes and varied directions, and split such areas into text pieces. We build a probability model of lines consisting of text pieces via statistics on training data. For an input image, we group text pieces into lines using a simulated annealing algorithm with a cost function based on the probability model.
Semantic Theme Analysis of Pilot Incident Reports
NASA Technical Reports Server (NTRS)
Thirumalainambi, Rajkumar
2009-01-01
Pilots report accidents or incidents during take-off, in flight, and on landing to airline authorities as well as to the federal aviation authority. The descriptions in pilot incident reports contain technical terms related to flight instruments and operations. Typical text mining approaches collect keywords from text documents and relate them across documents stored in a database. The present approach extracts specific themes from incident reports and semantically relates a hierarchy of terms, assigning weights to themes. Once theme extraction has been performed for a given document, a unique key can be assigned to that document for cross-linking documents. Semantic linking is used to categorize the documents based on specific rules that can help an end-user analyze certain types of accidents. This presentation outlines a text mining architecture for autonomous categorization of pilot incident reports using semantic theme analysis.
Preparing a collection of radiology examinations for distribution and retrieval.
Demner-Fushman, Dina; Kohli, Marc D; Rosenman, Marc B; Shooshan, Sonya E; Rodriguez, Laritza; Antani, Sameer; Thoma, George R; McDonald, Clement J
2016-03-01
Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05%) showed protected health information. Manual encoding of findings improved retrieval precision. Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/). Published by Oxford University Press on behalf of the American Medical Informatics Association 2015. This work is written by US Government employees and is in the public domain in the US.
Mining protein function from text using term-based support vector machines
Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J
2005-01-01
Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest and quite variable across problems: in many cases we achieved relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data are available and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
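The term-based SVM idea can be sketched as a standard text-classification pipeline, as below; the sentences and the single GO-like class are toy examples, not BioCreAtIvE data.

```python
# Bag-of-terms features plus a linear SVM deciding whether a GO-like class applies.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = [
    "the kinase phosphorylates the receptor and activates signal transduction",
    "protein binds atp and transfers phosphate groups to substrates",
    "structural component of the ribosome large subunit",
    "ribosomal protein involved in translation elongation",
]
labels = [1, 1, 0, 0]   # 1 = kinase-activity-like class, 0 = other

model = make_pipeline(CountVectorizer(), LinearSVC()).fit(docs, labels)
print(model.predict(["novel kinase that phosphorylates tau"]))   # expected: [1]
```

Selecting the supporting passage would then amount to scoring individual sentences of a document with the same classifier, which matches the abstract's observation that short passages are the harder part of the task.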
An automated system for generating program documentation
NASA Technical Reports Server (NTRS)
Hanney, R. J.
1970-01-01
A documentation program was developed in which the emphasis is placed on text content rather than flowcharting. It is keyword oriented, with 26 keywords that control the program. Seventeen of those keywords are recognized by the flowchart generator, three are related to text generation, and three have to do with control card and deck displays. The strongest advantage offered by the documentation program is that it produces the entire document. The document is prepared on 35mm microfilm, which is easy to store, and letter-size reproductions can be made inexpensively on bond paper.
Information Model for Reusability in Clinical Trial Documentation
ERIC Educational Resources Information Center
Bahl, Bhanu
2013-01-01
In clinical research, New Drug Application (NDA) to health agencies requires generation of a large number of documents throughout the clinical development life cycle, many of which are also submitted to public databases and external partners. Current processes to assemble the information, author, review and approve the clinical research documents,…
Lilholt, Lars; Haubro, Camilla Dremstrup; Møller, Jørn Munkhof; Aarøe, Jens; Højen, Anne Randorff; Gøeg, Kirstine Rosenbeck
2013-01-01
It is well-established that to increase acceptance of electronic clinical documentation tools, such as electronic health record (EHR) systems, it is important to have a strong relationship between those who document the clinical encounters and those who reap the benefits of digitized and more structured documentation. [1] Therefore, templates for EHR systems benefit from being closely related to clinical practice with a strong focus on primarily solving clinical problems. Clinical use as a driver for structured documentation has been the focus of the acute-physical-examination template (APET) development in the North Denmark Region. The template was developed through a participatory design in which precision and clarity of documentation were prioritized, as well as fast registration. The resulting template has approximately 700 easily accessible input options and will be evaluated in clinical practice in the first quarter of 2013.
Ontology-based reusable clinical document template production system.
Nam, Sejin; Lee, Sungin; Kim, James G Boram; Kim, Hong-Gee
2012-01-01
Clinical documents embody professional clinical knowledge. This paper shows an effective clinical document template (CDT) production system that uses a clinical description entity (CDE) model, a CDE ontology, and a knowledge management system called STEP that manages ontology-based clinical description entities. The ontology represents CDEs and their inter-relations, and the STEP system stores and manages CDE ontology-based information regarding CDTs. The system also provides Web Services interfaces for search and reasoning over clinical entities. The system was populated with entities and relations extracted from 35 CDTs that were used in admission, discharge, and progress reports, as well as those used in nursing and operation functions. A clinical document template editor is shown that uses STEP.
Forsyth, Alexander W; Barzilay, Regina; Hughes, Kevin S; Lui, Dickson; Lorenz, Karl A; Enzinger, Andrea; Tulsky, James A; Lindvall, Charlotta
2018-06-01
Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped. To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes. The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to predict words indicating an active symptom (positive label), absence of a symptom (negative label), or no symptom at all (neutral label). Sentences labeled by human coders were divided into training, validation, and test data sets. Final model performance was determined on 20% test data unused in model development or tuning. The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes. We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires additional optimization to improve its performance, further model building may yield machine learning methods suitable to be deployed in routine clinical care, quality improvement, and research applications. Copyright © 2018 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
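The conditional random field labeling step can be sketched with the sklearn-crfsuite package; the token features, toy sentence, and three labels below are illustrative placeholders rather than the study's features or data.

```python
# Illustrative CRF sketch for labeling symptom words as positive/negative/neutral.
# Requires the sklearn-crfsuite package; the sentence and labels are toy data.
import sklearn_crfsuite

def token_features(sentence, i):
    word = sentence[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "prev": sentence[i - 1].lower() if i > 0 else "<s>",
        "next": sentence[i + 1].lower() if i < len(sentence) - 1 else "</s>",
    }

sentences = [["Patient", "reports", "severe", "nausea", "but", "denies", "pain"]]
labels = [["neutral", "neutral", "neutral", "positive", "neutral", "neutral", "negative"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```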
A Feature Mining Based Approach for the Classification of Text Documents into Disjoint Classes.
ERIC Educational Resources Information Center
Nieto Sanchez, Salvador; Triantaphyllou, Evangelos; Kraft, Donald
2002-01-01
Proposes a new approach for classifying text documents into two disjoint classes. Highlights include a brief overview of document clustering; a data mining approach called the One Clause at a Time (OCAT) algorithm which is based on mathematical logic; vector space model (VSM); and comparing the OCAT to the VSM. (Author/LRW)
Text Categorization for Multi-Page Documents: A Hybrid Naive Bayes HMM Approach.
ERIC Educational Resources Information Center
Frasconi, Paolo; Soda, Giovanni; Vullo, Alessandro
Text categorization is typically formulated as a concept learning problem where each instance is a single isolated document. This paper is interested in a more general formulation where documents are organized as page sequences, as naturally occurring in digital libraries of scanned books and magazines. The paper describes a method for classifying…
RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials.
Marshall, Iain J; Kuiper, Joël; Wallace, Byron C
2016-01-01
To develop and evaluate RobotReviewer, a machine learning (ML) system that automatically assesses bias in clinical trials. From a (PDF-formatted) trial report, the system should determine risks of bias for the domains defined by the Cochrane Risk of Bias (RoB) tool, and extract supporting text for these judgments. We algorithmically annotated 12,808 trial PDFs using data from the Cochrane Database of Systematic Reviews (CDSR). Trials were labeled as being at low or high/unclear risk of bias for each domain, and sentences were labeled as being informative or not. This dataset was used to train a multi-task ML model. We estimated the accuracy of ML judgments versus humans by comparing trials with two or more independent RoB assessments in the CDSR. Twenty blinded experienced reviewers rated the relevance of supporting text, comparing ML output with equivalent (human-extracted) text from the CDSR. By retrieving the top 3 candidate sentences per document (top3 recall), the best ML text was rated more relevant than text from the CDSR, but not significantly (60.4% ML text rated 'highly relevant' v 56.5% of text from reviews; difference +3.9%, [-3.2% to +10.9%]). Model RoB judgments were less accurate than those from published reviews, though the difference was <10% (overall accuracy 71.0% with ML v 78.3% with CDSR). Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Assisted annotation of medical free text using RapTAT
Gobbel, Glenn T; Garvin, Jennifer; Reeves, Ruth; Cronin, Robert M; Heavirland, Julia; Williams, Jenifer; Weaver, Allison; Jayaramaraja, Shrimalini; Giuse, Dario; Speroff, Theodore; Brown, Steven H; Xu, Hua; Matheny, Michael E
2014-01-01
Objective To determine whether assisted annotation using interactive training can reduce the time required to annotate a clinical document corpus without introducing bias. Materials and methods A tool, RapTAT, was designed to assist annotation by iteratively pre-annotating probable phrases of interest within a document, presenting the annotations to a reviewer for correction, and then using the corrected annotations for further machine learning-based training before pre-annotating subsequent documents. Annotators reviewed 404 clinical notes either manually or using RapTAT assistance for concepts related to quality of care during heart failure treatment. Notes were divided into 20 batches of 19–21 documents for iterative annotation and training. Results The number of correct RapTAT pre-annotations increased significantly and annotation time per batch decreased by ∼50% over the course of annotation. Annotation rate increased from batch to batch for assisted but not manual reviewers. Pre-annotation F-measure increased from 0.5 to 0.6 to >0.80 (relative to both assisted reviewer and reference annotations) over the first three batches and more slowly thereafter. Overall inter-annotator agreement was significantly higher between RapTAT-assisted reviewers (0.89) than between manual reviewers (0.85). Discussion The tool reduced workload by decreasing the number of annotations needing to be added and helping reviewers to annotate at an increased rate. Agreement between the pre-annotations and reference standard, and agreement between the pre-annotations and assisted annotations, were similar throughout the annotation process, which suggests that pre-annotation did not introduce bias. Conclusions Pre-annotations generated by a tool capable of interactive training can reduce the time required to create an annotated document corpus by up to 50%. PMID:24431336
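The train-as-you-annotate loop can be summarized schematically as follows. This is a generic sketch of iterative pre-annotation, not the RapTAT implementation; review() stands in for the human correction step, and the phrases and labels are toy placeholders.

```python
# Schematic of iterative pre-annotation with retraining between batches
# (not the RapTAT implementation; review() is a placeholder for human correction).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy batches of candidate phrases (in practice, drawn from clinical notes).
batches = [["ejection fraction 35%", "no ace inhibitor prescribed"],
           ["ejection fraction 55%", "started ace inhibitor today"]]
gold = {"ejection fraction 35%": "EF", "no ace inhibitor prescribed": "MED",
        "ejection fraction 55%": "EF", "started ace inhibitor today": "MED"}

def review(phrases, suggestions):
    """Stand-in for the human reviewer correcting the pre-annotations."""
    return [gold[p] for p in phrases]

model = make_pipeline(CountVectorizer(), MultinomialNB())
seen_phrases, seen_labels = [], []

for batch in batches:
    # Pre-annotate with the current model (no suggestions before any training).
    suggestions = list(model.predict(batch)) if seen_phrases else ["?"] * len(batch)
    corrected = review(batch, suggestions)
    seen_phrases.extend(batch)
    seen_labels.extend(corrected)
    model.fit(seen_phrases, seen_labels)  # retrain before the next batch
```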
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burchard, Ross L.; Pierson, Kathleen P.; Trumbo, Derek
Tarjetas is used to generate requirements from source documents. These source documents are in a hierarchical XML format that has been produced from PDF documents processed through the “Reframe” software package. The software includes the ability to create Topics and associate text Snippets with those Topics. Requirements are then generated, and text Snippets with their associated Topics are referenced to the requirement. The software maintains traceability from the requirement ultimately to the source document that produced the snippet.
75 FR 55267 - Airspace Designations; Incorporation By Reference
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-10
... airspace listings in FAA Order 7400.9T in full text as proposed rule documents in the Federal Register. Likewise, all amendments of these listings were published in full text as final rules in the Federal... Order 7400.9U in full text as proposed rule documents in the Federal Register. Likewise, all amendments...
76 FR 53328 - Airspace Designations; Incorporation by Reference
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-26
... proposed changes of the airspace listings in FAA Order 7400.9U in full text as proposed rule documents in the Federal Register. Likewise, all amendments of these listings were published in full text as final... the airspace listings in FAA Order 7400.9V in full text as proposed rule documents in the Federal...
ERIC Educational Resources Information Center
Beghtol, Clare
1986-01-01
Explicates a definition and theory of "aboutness" and aboutness analysis developed by text linguist van Dijk; explores implications of text linguistics for bibliographic classification theory; suggests the elements that a theory of the cognitive process of classifying documents needs to encompass; and delineates how people identify…
An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development
Knauff, Markus; Nejasmic, Jelica
2014-01-01
The choice of an efficient document preparation system is an important decision for any academic researcher. To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors. PMID:25526083
Multioriented and curved text lines extraction from Indian documents.
Pal, U; Roy, Partha Pratim
2004-08-01
There are printed artistic documents where text lines of a single page may not be parallel to each other. These text lines may have different orientations, or they may be curved. For the optical character recognition (OCR) of these documents, we need to extract such lines properly. In this paper, we propose a novel scheme, mainly based on the concept of water reservoir analogy, to extract individual text lines from printed Indian documents containing multioriented and/or curved text lines. A reservoir is a metaphor to illustrate the cavity region of a character where water can be stored. In the proposed scheme, at first, connected components are labeled and identified either as isolated or touching. Next, each touching component is classified as either straight type (S-type) or curve type (C-type), depending on the reservoir base-area and envelope points of the component. Based on the type (S-type or C-type) of a component, two candidate points are computed from each touching component. Finally, candidate regions (neighborhoods of the candidate points) of each component are detected, and after analyzing these regions, components are grouped to form individual text lines.
Selecting a clinical intervention documentation system for an academic setting.
Fox, Brent I; Andrus, Miranda; Hester, E Kelly; Byrd, Debbie C
2011-03-10
Pharmacists' clinical interventions have been the subject of a substantial body of literature that focuses on the process and outcomes of establishing an intervention documentation program within the acute care setting. Few reports describe intervention documentation as a component of doctor of pharmacy (PharmD) programs; none describe the process of selecting an intervention documentation application to support the complete array of pharmacy practice and experiential sites. The process that a school of pharmacy followed to select and implement a school-wide intervention system to document the clinical and financial impact of an experiential program is described. Goals included finding a tool that allowed documentation from all experiential sites and the ability to assign dollar savings (hard and soft) to all documented interventions. The paper provides guidance for other colleges and schools of pharmacy in selecting a clinical intervention documentation system for program-wide use.
Gupta, Dilip; Saul, Melissa; Gilbertson, John
2004-02-01
We evaluated a comprehensive deidentification engine at the University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, that uses a complex set of rules, dictionaries, pattern-matching algorithms, and the Unified Medical Language System to identify and replace identifying text in clinical reports while preserving medical information for sharing in research. In our initial data set of 967 surgical pathology reports, the software did not suppress outside (103), UPMC (47), and non-UPMC (56) accession numbers; dates (7); names (9) or initials (25) of case pathologists; or hospital or laboratory names (46). In 150 reports, some clinical information was suppressed inadvertently (overmarking). The engine retained eponymic patient names, eg, Barrett and Gleason. In the second evaluation (1,000 reports), the software did not suppress outside (90) or UPMC (6) accession numbers or names (4) or initials (2) of case pathologists. In the third evaluation, the software removed names of patients, hospitals (297/300), pathologists (297/300), transcriptionists, residents and physicians, dates of procedures, and accession numbers (298/300). By the end of the evaluation, the system was reliably and specifically removing safe-harbor identifiers and producing highly readable deidentified text without removing important clinical information. Collaboration between pathology domain experts and system developers and continuous quality assurance are needed to optimize ongoing deidentification processes.
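The rule- and pattern-based redaction strategy such an engine relies on can be illustrated with a few regular-expression rules. The patterns below are simplified, hypothetical examples, far narrower than the production system's dictionaries and rule set.

```python
# Simplified illustration of pattern-based redaction of report text.
# The patterns are hypothetical examples, not the production engine's rules.
import re

RULES = [
    (re.compile(r"\b[A-Z]{1,3}\d{2}-\d{4,6}\b"), "[ACCESSION]"),  # e.g. S04-12345
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b(?:Dr\.|Pathologist:)\s+[A-Z][a-z]+\b"), "[NAME]"),
]

def deidentify(text: str) -> str:
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text

report = "Specimen S04-12345 received 03/14/2004. Pathologist: Smith reviewed the slides."
print(deidentify(report))
```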
Redd, Andrew M; Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Tran, Le-Thuy; Samore, Matthew H
2017-07-01
Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and applies a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group, for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group was 5.9 and 5, respectively. The grouping method was successful in finding like documents containing templates. Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scale up. Published by Elsevier Inc.
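Under stated assumptions, the two modules can be sketched as follows: documents are grouped by hierarchical clustering over simple word-count features, and a common-subsequence comparison (here via difflib) then recovers the text shared within a group. This is an illustrative reconstruction on toy notes, not the authors' implementation.

```python
# Sketch of template discovery: (1) group notes by word-count features with
# hierarchical clustering, (2) extract shared text with a longest-common-
# subsequence style comparison via difflib. Illustrative only.
from collections import Counter
from difflib import SequenceMatcher
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

notes = [
    "CHIEF COMPLAINT: chest pain HISTORY: 54 yo male presents with ...",
    "CHIEF COMPLAINT: dyspnea HISTORY: 61 yo female presents with ...",
    "Telephone encounter: patient called regarding medication refill ...",
]

# Module 1: word-count features followed by hierarchical clustering.
counts = [Counter(n.lower().split()) for n in notes]
vocab = sorted({w for c in counts for w in c})
X = np.array([[c[w] for w in vocab] for c in counts], dtype=float)
groups = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")

# Module 2: the common token subsequence within a group approximates the template.
def common_template(doc_a, doc_b):
    a, b = doc_a.split(), doc_b.split()
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    return " ".join(" ".join(a[m.a:m.a + m.size]) for m in blocks if m.size)

grouped = [n for n, g in zip(notes, groups) if g == groups[0]]
if len(grouped) >= 2:
    print(common_template(grouped[0], grouped[1]))
```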
Scalable ranked retrieval using document images
NASA Astrophysics Data System (ADS)
Jain, Rajiv; Oard, Douglas W.; Doermann, David
2013-12-01
Despite the explosion of text on the Internet, hard copy documents that have been scanned as images still play a significant role for some tasks. The best method to perform ranked retrieval on a large corpus of document images, however, remains an open research question. The most common approach has been to perform text retrieval using terms generated by optical character recognition. This paper, by contrast, examines whether a scalable segmentation-free image retrieval algorithm, which matches sub-images containing text or graphical objects, can provide additional benefit in satisfying a user's information needs on a large, real world dataset. Results on 7 million scanned pages from the CDIP v1.0 test collection show that content based image retrieval finds a substantial number of documents that text retrieval misses, and that when used as a basis for relevance feedback can yield improvements in retrieval effectiveness.
Global and Local Features Based Classification for Bleed-Through Removal
NASA Astrophysics Data System (ADS)
Hu, Xiangyu; Lin, Hui; Li, Shutao; Sun, Bin
2016-12-01
The text on one side of a historical document often seeps through and appears on the other side, so bleed-through is a common problem in historical document images. It makes the images hard to read and the text difficult to recognize. To improve image quality and readability, the bleed-through has to be removed. This paper proposes a bleed-through removal method based on global and local feature extraction. A Gaussian mixture model is used to obtain global features of the images. Local features are extracted from the patch around each pixel. Then, an extreme learning machine classifier is used to classify the scanned images into the foreground text and the bleed-through component. Experimental results on real document image datasets show that the proposed method outperforms state-of-the-art bleed-through removal methods and preserves the text strokes well.
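As a rough illustration of the global-feature step, pixel intensities can be modeled with a three-component Gaussian mixture and the middle-intensity component treated as bleed-through. The snippet below is a heuristic sketch on synthetic data, not the paper's GMM-plus-patch-features-plus-ELM pipeline.

```python
# Rough sketch: model grayscale pixel intensities with a 3-component Gaussian
# mixture, treating the darkest component as foreground text, the middle one as
# bleed-through, and the lightest as paper background. Heuristic illustration on
# synthetic data; not the paper's GMM + patch features + ELM pipeline.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic page: background ~230, bleed-through ~170, text ~60 (plus noise).
page = rng.choice([230.0, 170.0, 60.0], size=(64, 64), p=[0.8, 0.15, 0.05])
page += rng.normal(0.0, 5.0, page.shape)

gmm = GaussianMixture(n_components=3, random_state=0).fit(page.reshape(-1, 1))
order = np.argsort(gmm.means_.ravel())          # darkest ... lightest component
labels = gmm.predict(page.reshape(-1, 1)).reshape(page.shape)

cleaned = page.copy()
cleaned[labels == order[1]] = gmm.means_.ravel()[order[2]]  # paint bleed-through as background
```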
Fuzzy Document Clustering Approach using WordNet Lexical Categories
NASA Astrophysics Data System (ADS)
Gharib, Tarek F.; Fouad, Mohammed M.; Aref, Mostafa M.
Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. This area is growing rapidly, mainly because of the strong need to analyse the large amount of textual data that resides on internal file systems and the Web. Text document clustering provides an effective navigation mechanism to organize this large amount of data by grouping documents into a small number of meaningful classes. In this paper we propose a fuzzy text document clustering approach using WordNet lexical categories and the fuzzy c-means algorithm. Experiments are performed to compare the efficiency of the proposed approach with recently reported approaches. Experimental results show that fuzzy clustering leads to strong performance. The fuzzy c-means algorithm outperforms classical clustering algorithms such as k-means and bisecting k-means in both clustering quality and running time efficiency.
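The representation and clustering steps can be sketched by mapping words to WordNet lexical categories (lexnames) via NLTK and clustering the resulting category-count vectors with a small, generic fuzzy c-means loop. This is a textbook-style illustration, not the authors' code, and the two example documents are toy data.

```python
# Sketch: represent documents by WordNet lexical-category (lexname) counts and
# cluster them with a small, generic fuzzy c-means loop (illustrative only;
# requires the NLTK WordNet corpus to be downloaded).
import numpy as np
from nltk.corpus import wordnet as wn

docs = ["the bank approved the loan payment",
        "the river bank flooded the nearby field"]

categories = sorted({s.lexname() for d in docs for w in d.split() for s in wn.synsets(w)[:1]})

def lexname_vector(text):
    vec = np.zeros(len(categories))
    for word in text.split():
        synsets = wn.synsets(word)
        if synsets:
            vec[categories.index(synsets[0].lexname())] += 1  # e.g. 'noun.possession'
    return vec

X = np.array([lexname_vector(d) for d in docs])

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))               # soft memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        inv = 1.0 / dist ** (2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

memberships, centers = fuzzy_cmeans(X)
print(np.round(memberships, 2))   # soft cluster assignment per document
```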
Yoo, Illhoi; Hu, Xiaohua; Song, Il-Yeol
2007-11-27
A huge amount of biomedical textual information has been produced and collected in MEDLINE for decades. In order to easily utilize biomedical information in the free text, document clustering and text summarization together are used as a solution to the text information overload problem. In this paper, we introduce a coherent graph-based semantic clustering and summarization approach for biomedical literature. Our extensive experimental results show that the approach achieves a 45% improvement in cluster quality and a 72% improvement in clustering reliability, in terms of misclassification index, over Bisecting K-means, a leading document clustering approach. In addition, our approach provides a concise but rich text summary in key concepts and sentences. Our coherent biomedical literature clustering and summarization approach, which takes advantage of ontology-enriched graphical representations, significantly improves the quality of document clusters and the understandability of documents through summaries.
Text Generation: The State of the Art and the Literature.
ERIC Educational Resources Information Center
Mann, William C.; And Others
This report comprises two documents which describe the state of the art of computer generation of natural language text. Both were prepared by a panel of individuals who are active in research on text generation. The first document assesses the techniques now available for use in systems design, covering all of the technical methods by which…
Embedding the shapes of regions of interest into a Clinical Document Architecture document.
Minh, Nguyen Hai; Yi, Byoung-Kee; Kim, Il Kon; Song, Joon Hyun; Binh, Pham Viet
2015-03-01
Sharing a medical image visually annotated by a region of interest with a remotely located specialist for consultation is a good practice. It may, however, require a special-purpose (and most likely expensive) system to send and view them, which is an unfeasible solution in developing countries such as Vietnam. In this study, we design and implement interoperable methods based on the HL7 Clinical Document Architecture and the eXtensible Markup Language Stylesheet Language for Transformation standards to seamlessly exchange and visually present the shapes of regions of interest using web browsers. We also propose a new integration architecture for a Clinical Document Architecture generator that enables embedding of regions of interest and simultaneous auto-generation of corresponding style sheets. Using the Clinical Document Architecture document and style sheet, a sender can transmit clinical documents and medical images together with coordinate values of regions of interest to recipients. Recipients can easily view the documents and display embedded regions of interest by rendering them in their web browser of choice. © The Author(s) 2014.
DOCU-TEXT: A tool before the data dictionary
NASA Technical Reports Server (NTRS)
Carter, B.
1983-01-01
DOCU-TEXT, a proprietary software package that aids in the production of documentation for a data processing organization and can be installed and operated only on IBM computers, is discussed. In organizing information that ultimately will reside in a data dictionary, DOCU-TEXT proved to be a useful documentation tool in extracting information from existing production jobs, procedure libraries, system catalogs, control data sets and related files. DOCU-TEXT reads these files to derive data that is useful at the system level. The output of DOCU-TEXT is a series of user-selectable reports. These reports can reflect the interactions within a single job stream, a complete system, or all the systems in an installation. Any single report, or group of reports, can be generated in an independent documentation pass.
Data mining of text as a tool in authorship attribution
NASA Astrophysics Data System (ADS)
Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu
2001-03-01
It is common that text documents are characterized and classified by keywords that their authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or part of an extracted, interesting text. This prototype is matched against the document database of the monitored document flow. The new methodology is capable of extracting the meaning of a document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim, two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test, three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was used in turn as a prototype, and the two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database, in which a group of 25 short financial news reports from five different authors are examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, all cases were successful for Shakespeare and Poe; for Shaw, one text was confused with Poe. In the second test, the Reuters-21578 financial news reports were attributed to the correct authors relatively well. The conclusion is that our text mining methodology appears to be capable of authorship attribution.
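Prototype matching of this kind can be approximated as a nearest-neighbour search over text similarity. The sketch below uses TF-IDF vectors and cosine similarity as a stand-in for the authors' representation, with abbreviated toy texts.

```python
# Toy sketch of prototype matching for authorship attribution: each text serves
# in turn as the prototype and its nearest neighbours are reported. TF-IDF plus
# cosine similarity stand in for the authors' representation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = {
    "Shakespeare-1": "thou art more lovely and more temperate ...",
    "Shakespeare-2": "shall I compare thee to a summer's day ...",
    "Poe-1": "once upon a midnight dreary while I pondered weak and weary ...",
}

names = list(texts)
X = TfidfVectorizer().fit_transform(texts.values())
sim = cosine_similarity(X)
np.fill_diagonal(sim, -1.0)                      # ignore self-matches

for i, name in enumerate(names):
    nearest = [names[j] for j in np.argsort(sim[i])[::-1][:2]]
    print(f"{name}: nearest matches -> {nearest}")
```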
Roadmap to a Comprehensive Clinical Data Warehouse for Precision Medicine Applications in Oncology
Foran, David J; Chen, Wenjin; Chu, Huiqi; Sadimin, Evita; Loh, Doreen; Riedlinger, Gregory; Goodell, Lauri A; Ganesan, Shridar; Hirshfield, Kim; Rodriguez, Lorna; DiPaola, Robert S
2017-01-01
Leading institutions throughout the country have established Precision Medicine programs to support personalized treatment of patients. A cornerstone for these programs is the establishment of enterprise-wide Clinical Data Warehouses. Working shoulder-to-shoulder, a team of physicians, systems biologists, engineers, and scientists at Rutgers Cancer Institute of New Jersey have designed, developed, and implemented the Warehouse with information originating from data sources, including Electronic Medical Records, Clinical Trial Management Systems, Tumor Registries, Biospecimen Repositories, Radiology and Pathology archives, and Next Generation Sequencing services. Innovative solutions were implemented to detect and extract unstructured clinical information that was embedded in paper/text documents, including synoptic pathology reports. Supporting important precision medicine use cases, the growing Warehouse enables physicians to systematically mine and review the molecular, genomic, image-based, and correlated clinical information of patient tumors individually or as part of large cohorts to identify changes and patterns that may influence treatment decisions and potential outcomes. PMID:28469389
Jahn, Michelle A; Porter, Brian W; Patel, Himalaya; Zillich, Alan J; Simon, Steven R; Russ, Alissa L
2018-04-01
Web-based patient portals feature secure messaging systems that enable health care providers and patients to communicate information. However, little is known about the usability of these systems for clinical document sharing. This article evaluates the usability of a secure messaging system for providers and patients in terms of its ability to support sharing of electronic clinical documents. We conducted usability testing with providers and patients in a human-computer interaction laboratory at a Midwestern U.S. hospital. Providers sent a medication list document to a fictitious patient via secure messaging. Separately, patients retrieved the clinical document from a secure message and returned it to a fictitious provider. We collected use errors, task completion, task time, and satisfaction. Twenty-nine individuals participated: 19 providers (6 physicians, 6 registered nurses, and 7 pharmacists) and 10 patients. Among providers, 11 (58%) attached and sent the clinical document via secure messaging without requiring assistance, in a median (range) of 4.5 (1.8-12.7) minutes. No patients completed tasks without moderator assistance. Patients accessed the secure messaging system within 3.6 (1.2-15.0) minutes; retrieved the clinical document within 0.8 (0.5-5.7) minutes; and sent the attached clinical document in 6.3 (1.5-18.1) minutes. Although median satisfaction ratings were high, with 5.8 for providers and 6.0 for patients (scale, 0-7), we identified 36 different use errors. Physicians and pharmacists requested additional features to support care coordination via health information technology, while nurses requested features to support efficiency for their tasks. This study examined the usability of clinical document sharing, a key feature of many secure messaging systems. Our results highlight similarities and differences between provider and patient end-user groups, which can inform secure messaging design to improve learnability and efficiency. The observations suggest recommendations for improving the technical aspects of secure messaging for clinical document sharing. Schattauer GmbH Stuttgart.
Developing a system to track meaningful outcome measures in head and neck cancer treatment.
Walters, Ronald S; Albright, Heidi W; Weber, Randal S; Feeley, Thomas W; Hanna, Ehab Y; Cantor, Scott B; Lewis, Carol M; Burke, Thomas W
2014-02-01
The health care industry, including consumers, providers, and payers of health care, recognizes the importance of developing meaningful, patient-centered measures. This article describes our experience using an existing electronic medical record largely based on free text formats without structured documentation, in conjunction with tumor registry abstraction techniques, to obtain and analyze data for use in clinical improvement and public reporting. We performed a retrospective analysis of 2467 previously untreated patients treated with curative intent who presented with laryngeal, pharyngeal, or oral cavity cancer in order to develop a system to monitor and report meaningful outcome metrics of head and neck cancer treatment. Patients treated between 1995 and 2006 were analyzed for the primary outcomes of survival at 1 and 2 years, the ability to speak at 1 year posttreatment, and the ability to swallow at 1 year posttreatment. We encountered significant limitations in clinical documentation because of the lack of standardization of meaningful measures, as well as limitations with data abstraction using a retrospective approach to reporting measures. Almost 5000 person-hours were required for data abstraction, quality review, and reporting, at a cost of approximately $134,000. Our multidisciplinary teams document extensive patient information; however, data is not stored in easily accessible formats for measurement, comparison, and reporting. We recommend identifying measures meaningful to patients, providers, and payers to be documented throughout the patients' entire treatment cycle, and significant investment in improvements to electronic medical records and tumor registry reporting in order to provide meaningful quality measures for the future. Copyright © 2013 Wiley Periodicals, Inc.
The role, responsibilities and status of the clinical medical physicist in AFOMP.
Ng, K H; Cheung, K Y; Hu, Y M; Inamura, K; Kim, H J; Krisanachinda, A; Leung, J; Pradhan, A S; Round, H; van Doomo, T; Wong, T J; Yi, B Y
2009-12-01
This document is the first of a series of policy statements being issued by the Asia-Oceania Federation of Organizations for Medical Physics (AFOMP). The document was developed by the AFOMP Professional Development Committee (PDC) and was endorsed for official release by AFOMP Council in 2006. The main purpose of the document was to give guidance to AFOMP member organizations on the role and responsibilities of clinical medical physicists. A definition of clinical medical physicist has also been provided. This document discusses the following topics: professional aspects of education and training; responsibilities of the clinical medical physicist; status and organization of the clinical medical physics service and the need for clinical medical physics service.
Görg, Carsten; Liu, Zhicheng; Kihm, Jaeyeon; Choo, Jaegul; Park, Haesun; Stasko, John
2013-10-01
Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.
Tank waste remediation system functions and requirements document
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carpenter, K.E
1996-10-03
This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary functions identified in the TWRS Functions and Requirements Document are shown in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and contains the functions and requirements needed to properly define the top three TWRS function levels. The TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.
Nephrology in the Lancisi Medical Dictionary (1672-1720).
Gazzaniga, Valentina; Marinozzi, Silvia
2006-01-01
Giovanni Maria Lancisi (1654-1720) shows a particular interest in urological and nephrological diseases, especially evident in a course of lectures held at Studium Urbis in 1696-97, which reflected his vast knowledge and familiarity with various important texts devoted to urology and nephrology. This interest is further documented in commentaries on articles on nephrological diseases in his Repertorium medicum (a sort of medical dictionary written between 1672 and his death). Lancisi's quoting medical authorities clarifies the clinical answers he gave in some of his unpublished Consulti concerning nephrological pathologies.
NASA Astrophysics Data System (ADS)
Fume, Kosei; Ishitani, Yasuto
2008-01-01
We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category according to their similarity to the model. The main feature of the proposed method is semantics extraction from an input document in two respects: the semantics of terms are extracted by semantic pattern analysis, and implicit meanings of the document substructure are identified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.
39 CFR 3001.10 - Form and number of copies of documents.
Code of Federal Regulations, 2010 CFR
2010-07-01
... service must be printed from a text-based pdf version of the document, where possible. Otherwise, they may... generated in either Acrobat (pdf), Word, or WordPerfect, or Rich Text Format (rtf). [67 FR 67559, Nov. 6...
Use of speech-to-text technology for documentation by healthcare providers.
Ajami, Sima
2016-01-01
Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, Proquest, Springer, SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70, only 42 articles were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Trivedi, Hari; Mesterhazy, Joseph; Laguna, Benjamin; Vu, Thienkhai; Sohn, Jae Ho
2018-04-01
Magnetic resonance imaging (MRI) protocoling can be time- and resource-intensive, and protocols can often be suboptimal dependent upon the expertise or preferences of the protocoling radiologist. Providing a best-practice recommendation for an MRI protocol has the potential to improve efficiency and decrease the likelihood of a suboptimal or erroneous study. The goal of this study was to develop and validate a machine learning-based natural language classifier that can automatically assign the use of intravenous contrast for musculoskeletal MRI protocols based upon the free-text clinical indication of the study, thereby improving efficiency of the protocoling radiologist and potentially decreasing errors. We utilized a deep learning-based natural language classification system from IBM Watson, a question-answering supercomputer that gained fame after challenging the best human players on Jeopardy! in 2011. We compared this solution to a series of traditional machine learning-based natural language processing techniques that utilize a term-document frequency matrix. Each classifier was trained with 1240 MRI protocols plus their respective clinical indications and validated with a test set of 280. Ground truth of contrast assignment was obtained from the clinical record. For evaluation of inter-reader agreement, a blinded second reader radiologist analyzed all cases and determined contrast assignment based on only the free-text clinical indication. In the test set, Watson demonstrated overall accuracy of 83.2% when compared to the original protocol. This was similar to the overall accuracy of 80.2% achieved by an ensemble of eight traditional machine learning algorithms based on a term-document matrix. When compared to the second reader's contrast assignment, Watson achieved 88.6% agreement. When evaluating only the subset of cases where the original protocol and second reader were concordant (n = 251), agreement climbed further to 90.0%. The classifier was relatively robust to spelling and grammatical errors, which were frequent. Implementation of this automated MR contrast determination system as a clinical decision support tool may save considerable time and effort of the radiologist while potentially decreasing error rates, and require no change in order entry or workflow.
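The term-document-matrix baseline mentioned above can be sketched as a TF-IDF representation feeding a small soft-voting ensemble. The indication texts and labels are fabricated placeholders; this is neither the study's ensemble nor the Watson service.

```python
# Sketch of a term-document-matrix baseline for predicting IV contrast from a
# free-text MRI indication (toy data; not the study's models or IBM Watson).
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

indications = [
    "rule out osteomyelitis of the foot",                  # infection -> contrast
    "evaluate suspected soft tissue mass of the thigh",    # tumor -> contrast
    "acute knee injury, evaluate for meniscal tear",       # trauma -> no contrast
    "chronic shoulder pain, possible rotator cuff tear",   # degenerative -> no contrast
]
labels = ["contrast", "contrast", "no_contrast", "no_contrast"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)), ("nb", MultinomialNB())],
        voting="soft",
    ),
)
model.fit(indications, labels)
print(model.predict(["evaluate for soft tissue infection"]))
```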
Wilbanks, Bryan A; Geisz-Everson, Marjorie; Boust, Rebecca R
2016-09-01
Clinical documentation is a critical tool in supporting care provided to patients. Sound documentation provides a picture of clinical events that can be used to improve patient care. However, many other uses for clinical documentation are equally important. Such documentation informs clinical decision support tools, creates a legal record of patient care, assists in financial reimbursement of services, and serves as a repository for secondary data analysis. Conversely, poor documentation can impair patient safety and increase malpractice risk exposure by reflecting poor or inaccurate information that ultimately may guide patient care decisions. Through an examination of anesthesia-related closed claims, a descriptive qualitative study emerged, which explored the antecedents and consequences of documentation quality in the claims reviewed. A secondary data analysis utilized a database generated by the American Association of Nurse Anesthetists Foundation closed claim review team. Four major themes emerged from the analysis. Themes 1, 2, and 4 primarily describe how poor documentation quality can have negative consequences for clinicians. The third theme primarily describes how poor documentation quality can negatively affect patient safety.
A Qualitative Analysis Evaluating The Purposes And Practices Of Clinical Documentation
Ho, Y.-X.; Gadd, C. S.; Kohorst, K.L.; Rosenbloom, S.T.
2014-01-01
Summary Objectives An important challenge for biomedical informatics researchers is determining the best approach for healthcare providers to use when generating clinical notes in settings where electronic health record (EHR) systems are used. The goal of this qualitative study was to explore healthcare providers’ and administrators’ perceptions about the purpose of clinical documentation and their own documentation practices. Methods We conducted seven focus groups with a total of 46 subjects composed of healthcare providers and administrators to collect knowledge, perceptions and beliefs about documentation from those who generate and review notes, respectively. Data were analyzed using inductive analysis to probe and classify impressions collected from focus group subjects. Results We observed that both healthcare providers and administrators believe that documentation serves five primary domains: clinical, administrative, legal, research, education. These purposes are tied closely to the nature of the clinical note as a document shared by multiple stakeholders, which can be a source of tension for all parties who must use the note. Most providers reported using a combination of methods to complete their notes in a timely fashion without compromising patient care. While all administrators reported relying on computer-based documentation tools to review notes, they expressed a desire for a more efficient method of extracting relevant data. Conclusions Although clinical documentation has utility, and is valued highly by its users, the development and successful adoption of a clinical documentation tool largely depends on its ability to be smoothly integrated into the provider’s busy workflow, while allowing the provider to generate a note that communicates effectively and efficiently with multiple stakeholders. PMID:24734130
Annotating Socio-Cultural Structures in Text
2012-10-31
parts of speech (POS) within text, using the Stanford Part of Speech Tagger (Stanford Log-Linear, 2011). The ERDC-CERL taxonomy is then used to...annotated NP/VP Pane: Shows the sentence parsed using the Parts of Speech tagger Document View Pane: Specifies the document (being annotated) in three...first parsed using the Stanford Parts of Speech tagger and converted to an XML document both components which are done through the Import function
Neural networks for data mining electronic text collections
NASA Astrophysics Data System (ADS)
Walker, Nicholas; Truman, Gregory
1997-04-01
The use of neural networks in information retrieval and text analysis has primarily suffered from the issues of adequate document representation, the ability to scale to very large collections, dynamism in the face of new information, and the practical difficulties of basing the design on the use of supervised training sets. Perhaps the most important approach to begin solving these problems is the use of 'intermediate entities', which reduce the dimensionality of document representations and the size of document collections to manageable levels, coupled with the use of unsupervised neural network paradigms. This paper describes these issues; a fully configured neural network-based text analysis system, dataHARVEST, aimed at data mining text collections, which begins this process; and the remaining difficulties and potential ways forward.
Assessing the Readability of Medical Documents: A Ranking Approach.
Zheng, Jiaping; Yu, Hong
2018-03-23
The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. Our objective was to develop a machine learning-based system to assess readability levels of complex documents such as EHR notes. We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. It is not constrained to a predefined set of readability levels, a common design in many machine learning-based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge. ©Jiaping Zheng, Hong Yu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 23.03.2018.
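A ranking-based readability assessor of this general shape can be sketched as a pairwise model: compute simple surface features per document, train a classifier on feature differences of document pairs, and check agreement with a Kendall rank statistic. The paper reports the coefficient of concordance; the sketch below uses Kendall's tau, a related measure. All data are toy values and the features are illustrative.

```python
# Sketch of pairwise readability ranking: surface features per document, a
# logistic model on feature differences of document pairs, and Kendall's tau
# against human difficulty ranks. Toy data; not the authors' system.
from itertools import combinations
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import LogisticRegression

def surface_features(text):
    words = text.split()
    sentences = max(text.count("."), 1)
    return np.array([len(words) / sentences, np.mean([len(w) for w in words])])

docs = [
    "The cat sat. It was warm.",
    "The patient was discharged. Follow up in two weeks.",
    "Echocardiography demonstrated moderately reduced left ventricular systolic function.",
]
human_difficulty = [1, 2, 3]        # hypothetical crowdsourced difficulty ranks

F = np.array([surface_features(d) for d in docs])
pairs, targets = [], []
for i, j in combinations(range(len(docs)), 2):
    harder = int(human_difficulty[i] > human_difficulty[j])
    pairs.extend([F[i] - F[j], F[j] - F[i]])     # both orderings of each pair
    targets.extend([harder, 1 - harder])

ranker = LogisticRegression().fit(pairs, targets)
scores = F @ ranker.coef_.ravel()                # higher score = predicted harder
tau, _ = kendalltau(scores, human_difficulty)
print(f"Kendall tau against human ranks: {tau:.2f}")
```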
Text mining for the biocuration workflow
Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.
2012-01-01
Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
Replacement Attack: A New Zero Text Watermarking Attack
NASA Astrophysics Data System (ADS)
Bashardoost, Morteza; Mohd Rahim, Mohd Shafry; Saba, Tanzila; Rehman, Amjad
2017-03-01
The main objective of zero watermarking methods that are suggested for the authentication of textual properties is to increase the fragility of the produced watermarks against tampering attacks. On the other hand, zero watermarking attacks intend to alter the contents of a document without changing the watermark. In this paper, the Replacement attack is proposed, which focuses on maintaining the location of the words in the document. The proposed text watermarking attack is particularly effective against watermarking approaches that exploit word transitions in the document. The evaluation outcomes show that the tested word-based methods are unable to detect the presence of the Replacement attack in the document. Moreover, the comparison results show that the size of the Replacement attack is estimated less accurately than that of other common types of zero text watermarking attacks.
A study on design and development of enterprise-wide concepts for clinical documentation templates.
Zhou, Li; Gurjar, Rupali; Regier, Rachel; Morgan, Stephen; Meyer, Theresa; Aroy, Teal; Goldman, Debora Scavone; Hongsermeier, Tonya; Middleton, Blackford
2008-11-06
Structured clinical documents are associated with many potential benefits. Underlying terminologies and structure of information are keys to their successful implementation and use. This paper presents a methodology for design and development of enterprise-wide concepts for clinical documentation templates for an ambulatory Electronic Medical Record (EMR) system.
C4ISR Architecture Working Group (AWG), Architecture Framework Version 2.0.
1997-12-18
[Fragment of a data-dictionary table; only partial rows were recovered.] Vision - Name: name/identifier of the document that contains doctrine, goals, or vision; Type: doctrine, goals, or vision; Description: text summary description. Tasking - Type: e.g., organization, directive, order; Description: text summary of tasking. Rules, Criteria, or Conventions - Name: name/identifier of the document that contains rules, criteria, or conventions; Type: one of rules, criteria, or conventions; Description: text summary description of contents.
Simonaitis, Linas; Belsito, Anne; Warvel, Jeff; Hui, Siu; McDonald, Clement J
2006-01-01
Clinicians at Wishard Hospital in Indianapolis print and carry clinical reports called "Pocket Rounds". This paper describes a new process we developed to improve these clinical reports. The heart of our new process is a World Wide Web Consortium standard: Extensible Stylesheet Language Formatting Objects (XSL-FO). Using XSL-FO stylesheets, we generated Portable Document Format (PDF) and PostScript reports with complex formatting: columns, tables, borders, shading, indents, and dividing lines. We observed patterns of clinical report printing during an eight-month study period on three Medicine wards. Usage statistics indicated that clinicians accepted the new system enthusiastically: 78% of 26,418 reports were printed using the new system. We surveyed 67 clinical users. Respondents gave the new reports a rating of 4.2 (on a 5-point scale); they gave the old reports a rating of 3.4. The primary complaint was that it took longer to print the new reports. We believe that XSL-FO is a promising way to transform text data into functional and attractive clinical reports: it is relatively easy to implement and modify.
Extracting and connecting chemical structures from text sources using chemicalize.org.
Southan, Christopher; Stracz, Andras
2013-04-23
Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents) the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions from chemicalize.org as submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions. This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and databases, including structure searches in PubChem, InChIKey searches in Google and the chemicalize.org archive. It has the flexibility to extract text from any internal, external or Web source. It synergizes with other open tools and the application is undergoing continued development. It should thus facilitate progress in medicinal chemistry, chemical biology and other bioactive chemistry domains.
Automated validation of patient safety clinical incident classification: macro analysis.
Gupta, Jaiprakash; Patrick, Jon
2013-01-01
Patient safety is the buzz word in healthcare. The Incident Information Management System (IIMS) is electronic software that stores clinical mishap narratives in places where patients are treated. It is estimated that in one state alone over one million electronic text documents are available in IIMS. In this paper we investigate the data density available in the fields entered to notify an incident and the validity of the built-in classification used by clinicians to categorize the incidents. Waikato Environment for Knowledge Analysis (WEKA) software was used to test the classes. Four statistical classifiers based on the J48, Naïve Bayes (NB), Naïve Bayes Multinomial (NBM) and Support Vector Machine with radial basis function (SVM_RBF) algorithms were used to validate the classes. The data pool was 10,000 clinical incidents drawn from 7 hospitals in one state in Australia. In the first part of the study, 1,000 clinical incidents were selected to determine the type and number of fields worth investigating, and in the second part another 5,448 clinical incidents were randomly selected to validate 13 clinical incident types. Results show 74.6% of the cells were empty and only 23 fields had content over 70% of the time. The percentage of correctly classified instances across the four algorithms ranged from 42% to 49% using the categorical dataset, from 65% to 77% using the free-text dataset, and from 72% to 79% using both datasets. The Kappa statistic ranged from 0.36 to 0.4 for categorical data, from 0.61 to 0.74 for free text, and from 0.67 to 0.77 for both datasets. Similar increases in performance across the 3 experiments were noted for true positive rate, precision, F-measure and area under the curve (AUC) of receiver operating characteristic (ROC) scores. The study demonstrates that only 14 of 73 fields in IIMS have data usable for machine learning experiments. Irrespective of the type of algorithm used, performance was better when both datasets were used. The NBM classifier showed the best performance. We think the classifier can be improved further by reclassifying the most confused classes, and there is scope to apply text mining tools to patient safety classifications.
Contemporary issues in HIM. The application layer--III.
Wear, L L; Pinkert, J R
1993-07-01
We have seen document preparation systems evolve from basic line editors through powerful, sophisticated desktop publishing programs. This component of the application layer is probably one of the most used, and most readily identifiable. Ask grade school children nowadays, and many will tell you that they have written a paper on a computer. Next month will be a "fun" tour through a number of other application programs we find useful. They will range from a simple notebook reminder to a sophisticated photograph processor. Application layer: Software targeted for the end user, focusing on a specific application area, and typically residing in the computer system as distinct components on top of the OS. Desktop publishing: A document preparation program that begins with the text features of a word processor, then adds the ability for a user to incorporate outputs from a variety of graphic programs, spreadsheets, and other applications. Line editor: A document preparation program that manipulates text in a file on the basis of numbered lines. Word processor: A document preparation program that can, among other things, reformat sections of documents, move and replace blocks of text, use multiple character fonts, automatically create a table of contents and index, create complex tables, and combine text and graphics.
Segmentation-driven compound document coding based on H.264/AVC-INTRA.
Zaghetto, Alexandre; de Queiroz, Ricardo L
2007-07-01
In this paper, we explore H.264/AVC operating in intraframe mode to compress a mixed image, i.e., composed of text, graphics, and pictures. Even though mixed contents (compound) documents usually require the use of multiple compressors, we apply a single compressor for both text and pictures. For that, distortion is taken into account differently between text and picture regions. Our approach is to use a segmentation-driven adaptation strategy to change the H.264/AVC quantization parameter on a macroblock by macroblock basis, i.e., we deviate bits from pictorial regions to text in order to keep text edges sharp. We show results of a segmentation driven quantizer adaptation method applied to compress documents. Our reconstructed images have better text sharpness compared to straight unadapted coding, at negligible visual losses on pictorial regions. Our results also highlight the fact that H.264/AVC-INTRA outperforms coders such as JPEG-2000 as a single coder for compound images.
Open Globe Injury Patient Identification in Warfare Clinical Notes
Apostolova, Emilia; White, Helen A.; Morris, Patty A.; Eliason, David A.; Velez, Tom
2017-01-01
The aim of this study is to utilize the Defense and Veterans Eye Injury and Vision Registry clinical data derived from DoD and VA medical systems, which include documentation of care while in combat, and to develop methods for comprehensive and reliable Open Globe Injury (OGI) patient identification. In particular, we focus on the use of free-form clinical notes, since structured data, such as diagnoses or procedure codes, as found in early post-trauma clinical records, may not be a comprehensive and reliable indicator of OGIs. The challenges of the task include a low incidence rate (few positive examples), idiosyncratic military ophthalmology vocabulary, extreme brevity of notes, specialized abbreviations, typos and misspellings. We modeled the problem as a text classification task and utilized a combination of supervised learning (SVMs) and word embeddings learned in an unsupervised manner, achieving a precision of 92.50% and a recall of 89.83%. The described techniques are applicable to patient cohort identification with limited training data and low incidence rates. PMID:29854104
ERIC Educational Resources Information Center
Thomas, Georgelle; Fishburne, Robert P.
Part of the Anthropology Curriculum Project, the document contains a programmed text on evolution and a vocabulary pronunciation guide. The unit is intended for use by students in social studies and science courses in the 5th, 6th, and 7th grades. The bulk of the document, the programmed text, is organized in a question answer format. Students are…
NASA Astrophysics Data System (ADS)
Suzuki, Izumi; Mikami, Yoshiki; Ohsato, Ario
A technique that acquires documents in the same category as a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or the entire Web). The system then adds the marked documents to the training set, learns from the expanded set, and repeats this process until no more documents are marked. Requiring the similarity to increase monotonically as the system learns enables it to (1) detect the correct point at which no more documents remain to be marked and (2) decide the threshold value that the classifier uses. In addition, under the condition that normalization is limited to dividing term weights by a p-norm of the weights, a linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity using English and German documents randomly selected from the Web.
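The looping scheme outlined in the abstract above lends itself to a short illustration. The sketch below is only a rough approximation under assumed details (binary term vectors, a centroid-style linear classifier, a fixed similarity threshold); it is not the authors' exact formulation, and the toy corpus is invented.

```python
# Minimal sketch (not the authors' exact formulation) of the iterative
# acquisition loop: the query text seeds the training set, documents are
# indexed as binary term vectors normalized by a p-norm, and sufficiently
# similar documents are repeatedly added until none remain.
import numpy as np

def binary_vector(tokens, vocab):
    v = np.zeros(len(vocab))
    for t in set(tokens):
        if t in vocab:
            v[vocab[t]] = 1.0
    return v

def acquire(query_tokens, corpus_tokens, p=2, threshold=0.3):
    vocab = {t: i for i, t in enumerate(
        sorted({t for d in corpus_tokens for t in d} | set(query_tokens)))}
    docs = [binary_vector(d, vocab) for d in corpus_tokens]
    training = [binary_vector(query_tokens, vocab)]
    marked = set()
    while True:
        # Linear classifier: centroid of binary training vectors, p-norm normalized.
        w = np.sum(training, axis=0)
        w = w / (np.linalg.norm(w, ord=p) + 1e-12)
        newly = [i for i, d in enumerate(docs)
                 if i not in marked and float(w @ d) >= threshold]
        if not newly:
            break
        for i in newly:
            marked.add(i)
            training.append(docs[i])
    return marked

# Example: documents sharing vocabulary with the query are gradually pulled in.
corpus = [["text", "mining", "web"], ["cooking", "recipes"],
          ["web", "documents", "classification"]]
print(acquire(["text", "classification", "web"], corpus))
```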
Unapparent Information Revelation: Text Mining for Counterterrorism
NASA Astrophysics Data System (ADS)
Srihari, Rohini K.
Unapparent information revelation (UIR) is a special case of text mining that focuses on detecting possible links between concepts across multiple text documents by generating an evidence trail explaining the connection. A traditional search involving, for example, two or more person names will attempt to find documents mentioning both these individuals. This research focuses on a different interpretation of such a query: what is the best evidence trail across documents that explains a connection between these individuals? For example, all may be good golfers. A generalization of this task involves query terms representing general concepts (e.g. indictment, foreign policy). Previous approaches to this problem have focused on graph mining involving hyperlinked documents, and link analysis exploiting named entities. A new robust framework is presented, based on (i) generating concept chain graphs, a hybrid content representation, (ii) performing graph matching to select candidate subgraphs, and (iii) subsequently using graphical models to validate hypotheses using ranked evidence trails. We adapt the DUC data set for cross-document summarization to evaluate evidence trails generated by this approach.
Review and comparison of quality standards, guidelines and regulations for laboratories.
Datema, Tjeerd A M; Oskam, Linda; Klatser, Paul R
2012-01-01
The variety and number of laboratory quality standards, guidelines and regulations (hereafter: quality documents) makes it difficult to choose the most suitable one for establishing and maintaining a laboratory quality management system. There is a need to compare the characteristics, suitability and applicability of quality documents in view of the increasing efforts to introduce quality management in laboratories, especially in clinical diagnostic laboratories in low income and middle income countries. This may provide valuable insights for policy makers developing national laboratory policies, and for laboratory managers and quality officers in choosing the most appropriate quality document for upgrading their laboratories. We reviewed the history of quality document development and then selected a subset based on their current use. We analysed these documents following a framework for comparison of quality documents that was adapted from the Clinical Laboratory Standards Institute guideline GP26, 'Quality management system model for clinical laboratory services'. Differences were identified between national and international, and non-clinical and clinical quality documents. The most salient findings were the absence of provisions on occurrence management and customer service in almost all non-clinical quality documents, a low number of safety requirements aimed at protecting laboratory personnel in international quality documents and no requirements regarding ethical behaviour in almost all quality documents. Each laboratory needs to investigate whether national regulatory standards are present. These are preferred as they most closely suit the needs of laboratories in the country. A laboratory should always use both a standard and a guideline: a standard sums up the requirements for a quality management system, and a guideline describes how quality management can be integrated into the laboratory processes.
Real-time text extraction based on the page layout analysis system
NASA Astrophysics Data System (ADS)
Soua, M.; Benchekroun, A.; Kachouri, R.; Akil, M.
2017-05-01
Several approaches have been proposed to extract text from scanned documents. However, text extraction in heterogeneous documents is still a real challenge. Indeed, text extraction in this context is a difficult task because of the variation of the text due to differences in sizes, styles and orientations, as well as to the complexity of the document region background. Recently, we proposed the improved hybrid binarization based on the Kmeans method (I-HBK) to extract text suitably from heterogeneous documents. In this method, the Page Layout Analysis (PLA), part of the Tesseract OCR engine, is used to identify text and image regions. Afterwards our hybrid binarization is applied separately to each kind of region. On one side, gamma correction is employed before processing image regions. On the other side, binarization is performed directly on text regions. Then, a foreground and background color study is performed to correct inverted region colors. Finally, characters are located in the binarized regions using the PLA algorithm. In this work, we extend the integration of the PLA algorithm within the I-HBK method. In addition, to speed up the text and image separation step, we employ an efficient GPU acceleration. Through the performed experiments, we demonstrate the high F-measure accuracy of the PLA algorithm, reaching 95% on the LRDE dataset. In addition, we compare the sequential and parallel PLA versions. The obtained results give a speedup of 3.7x when comparing the parallel PLA implementation on a GTX 660 GPU to the CPU version.
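As a loose illustration of two of the ingredients mentioned above (gamma correction for image regions and clustering-based binarization for text regions), the sketch below uses a plain K-means split of pixel intensities. It is not the published I-HBK pipeline, and the text/image region split normally produced by the PLA step is assumed to be available already.

```python
# Rough illustration only: gamma correction for image regions and a
# K-means split of pixel intensities for text regions. Parameter values
# (gamma, number of clusters) are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def gamma_correct(gray, gamma=1.5):
    # gray: float array in [0, 1]; hypothetical gamma value
    return np.power(gray, 1.0 / gamma)

def kmeans_binarize(gray):
    # Cluster pixel intensities into two groups and keep the darker one as text.
    pixels = gray.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
    text_cluster = int(np.argmin(km.cluster_centers_.ravel()))
    mask = (km.labels_ == text_cluster).reshape(gray.shape)
    return mask  # True where the pixel is treated as foreground text

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    page = rng.uniform(0.7, 1.0, size=(64, 64))                  # bright background
    page[20:30, 10:50] = rng.uniform(0.0, 0.2, size=(10, 40))    # dark "text" band
    print(kmeans_binarize(gamma_correct(page)).sum(), "foreground pixels")
```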
Improving Text Recall with Multiple Summaries
ERIC Educational Resources Information Center
van der Meij, Hans; van der Meij, Jan
2012-01-01
Background. QuikScan (QS) is an innovative design that aims to improve accessibility, comprehensibility, and subsequent recall of expository text by means of frequent within-document summaries that are formatted as numbered list items. The numbers in the QS summaries correspond to numbers placed in the body of the document where the summarized…
[Development of an ophthalmological clinical information system for inpatient eye clinics].
Kortüm, K U; Müller, M; Babenko, A; Kampik, A; Kreutzer, T C
2015-12-01
In times of increased digitalization in healthcare, departments of ophthalmology are faced with the challenge of introducing electronic clinical health records (EHR); however, specialized software for ophthalmology is not available with most major EHR systems. The aim of this project was to create specific ophthalmological user interfaces for large inpatient eye care providers within a hospital-wide EHR. Additionally, the integration of ophthalmic imaging systems, scheduling and surgical documentation should be achieved. The existing EHR i.s.h.med (Siemens, Germany) was modified using the Advanced Business Application Programming (ABAP) language to create specific ophthalmological user interfaces that reproduce and, moreover, optimize the clinical workflow. A user interface for documentation of ambulatory patients with eight tabs was designed. From June 2013 to October 2014 a total of 61,551 patient contacts were documented. For surgical documentation a separate user interface was set up. User interfaces for digital clinical orders documenting the registration and scheduling of operations were also set up. A direct integration of ophthalmic imaging modalities could be established. An ophthalmologist-orientated EHR for outpatient and surgical documentation for inpatient clinics was created and successfully implemented. By incorporating imaging procedures, the foundation for future smart/big data analyses was laid.
Analysis of line structure in handwritten documents using the Hough transform
NASA Astrophysics Data System (ADS)
Ball, Gregory R.; Kasiviswanathan, Harish; Srihari, Sargur N.; Narayanan, Aswin
2010-01-01
In the analysis of handwriting in documents a central task is that of determining line structure of the text, e.g., number of text lines, location of their starting and end-points, line-width, etc. While simple methods can handle ideal images, real world documents have complexities such as overlapping line structure, variable line spacing, line skew, document skew, noisy or degraded images etc. This paper explores the application of the Hough transform method to handwritten documents with the goal of automatically determining global document line structure in a top-down manner which can then be used in conjunction with a bottom-up method such as connected component analysis. The performance is significantly better than other top-down methods, such as the projection profile method. In addition, we evaluate the performance of skew analysis by the Hough transform on handwritten documents.
Security and Privacy in a DACS.
Delgado, Jaime; Llorente, Silvia; Pàmies, Martí; Vilalta, Josep
2016-01-01
The management of electronic health records (EHR), in general, and clinical documents, in particular, is becoming a key issue in the daily work of Healthcare Organizations (HO). The need for providing secure and private access to, and storage for, clinical documents together with the need for HO to interoperate, raises a number of issues difficult to solve. Many systems are in place to manage EHR and documents. Some of these Healthcare Information Systems (HIS) follow standards in their document structure and communications protocols, but many do not. In fact, they are mostly proprietary and do not interoperate. Our proposal to solve the current situation is the use of a DACS (Document Archiving and Communication System) for providing security, privacy and standardized access to clinical documents.
An electronic regulatory document management system for a clinical trial network.
Zhao, Wenle; Durkalski, Valerie; Pauls, Keith; Dillon, Catherine; Kim, Jaemyung; Kolk, Deneil; Silbergleit, Robert; Stevenson, Valerie; Palesch, Yuko
2010-01-01
A computerized regulatory document management system has been developed as a module in a comprehensive Clinical Trial Management System (CTMS) designed for an NIH-funded clinical trial network in order to more efficiently manage and track regulatory compliance. Within the network, several institutions and investigators are involved in multiple trials, and each trial has regulatory document requirements. Some of these documents are trial specific while others apply across multiple trials. The latter causes a possible redundancy in document collection and management. To address these and other related challenges, a central regulatory document management system was designed. This manuscript shares the design of the system as well as examples of its use in current studies. Copyright (c) 2009 Elsevier Inc. All rights reserved.
Multimedia Health Records: user-centered design approach for a multimedia uploading service.
Plazzotta, Fernando; Mayan, John C; Storani, Fernando D; Ortiz, Juan M; Lopez, Gastón E; Gimenez, Gastón M; Luna, Daniel R
2015-01-01
Multimedia elements add value to text documents by transmitting information that is difficult to express in words. In healthcare, many professionals and services keep these elements in their own repositories. This brings the problem of information fragmentation in different silos, which hinders access by other healthcare professionals. On the other hand, patients have clinical data of their own, in different formats, generated in different healthcare organizations, which is not accessible to professionals within our healthcare network. This paper describes the design, development and implementation processes of a service which allows media elements to be loaded into a patient clinical data repository (CDR), either through an electronic health record (EHR) by professionals or through a personal health record (PHR) by patients, in order to avoid fragmentation of the information.
Document image cleanup and binarization
NASA Astrophysics Data System (ADS)
Wu, Victor; Manmatha, Raghaven
1998-04-01
Image binarization is a difficult task for documents with text over textured or shaded backgrounds, poor contrast, and/or considerable noise. Current optical character recognition (OCR) and document analysis technologies do not handle such documents well. We have developed a simple yet effective algorithm for document image clean-up and binarization. The algorithm consists of two basic steps. In the first step, the input image is smoothed using a low-pass filter. The smoothing operation enhances the text relative to any background texture. This is because background texture normally has higher frequency than text does. The smoothing operation also removes speckle noise. In the second step, the intensity histogram of the smoothed image is computed and a threshold is automatically selected as follows. For black text, the first peak of the histogram corresponds to text. Thresholding the image at the value of the valley between the first and second peaks of the histogram binarizes the image well. In order to reliably identify the valley, the histogram is smoothed by a low-pass filter before the threshold is computed. The algorithm has been applied to some 50 images from a wide variety of sources: digitized video frames, photos, newspapers, advertisements in magazines or sales flyers, personal checks, etc. There are 21,820 characters and 4,406 words in these images. 91 percent of the characters and 86 percent of the words are successfully cleaned up and binarized. A commercial OCR system was applied to the binarized text when it consisted of fonts that were OCR recognizable. The recognition rate was 84 percent for the characters and 77 percent for the words.
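The two-step procedure summarized above (low-pass smoothing, then thresholding at the valley between the first two histogram peaks) can be sketched as follows. Filter sizes and peak-detection settings are illustrative choices, not values taken from the paper.

```python
# Sketch of the described clean-up and binarization steps: smooth the image,
# smooth its intensity histogram, then threshold at the valley between the
# first two histogram peaks (dark text assumed).
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import find_peaks

def cleanup_and_binarize(gray):
    # gray: 2-D array of intensities in [0, 255], dark text on a lighter background
    smoothed = uniform_filter(gray.astype(float), size=3)        # step 1: low-pass filter
    hist, _ = np.histogram(smoothed, bins=256, range=(0, 255))
    hist = uniform_filter(hist.astype(float), size=5)            # smooth the histogram too
    peaks, _ = find_peaks(hist, prominence=hist.max() * 0.05)    # suppress tiny wiggles
    if len(peaks) < 2:
        threshold = smoothed.mean()                              # fallback for flat histograms
    else:
        first, second = peaks[0], peaks[1]
        threshold = first + int(np.argmin(hist[first:second + 1]))  # valley between peaks
    return smoothed <= threshold                                 # True where pixels are text

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.normal(200, 10, size=(80, 80))                     # bright, noisy background
    img[30:40, 10:70] = rng.normal(40, 10, size=(10, 60))        # synthetic dark text line
    print(cleanup_and_binarize(img).sum(), "text pixels")
```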
Leveraging Text Content for Management of Construction Project Documents
ERIC Educational Resources Information Center
Alqady, Mohammed
2012-01-01
The construction industry is a knowledge intensive industry. Thousands of documents are generated by construction projects. Documents, as information carriers, must be managed effectively to ensure successful project management. The fact that a single project can produce thousands of documents and that a lot of the documents are generated in a…
Ontology modularization to improve semantic medical image annotation.
Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul
2011-02-01
Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail, thus making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data contained within should have been previously annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time consuming and labor intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results. Copyright © 2010 Elsevier Inc. All rights reserved.
The Islamic State Battle Plan: Press Release Natural Language Processing
2016-06-01
[Report front-matter fragments; only partial fields were recovered.] Keywords include text mining, corpus, generalized linear model, cascade, R Shiny, leaflet, and data visualization. Abbreviations listed include Terrorism and Responses to Terrorism, TDM (Term Document Matrix), TF (Term Frequency), TF-IDF (Term Frequency-Inverse Document Frequency), and tm (text mining R package). References include the leaflet R package and Feinerer I, Hornik K (2015) Text Mining Package "tm," Version 0.6-2, https://cran.r-project.org/web/packages/tm/tm.pdf
A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification
2016-07-01
financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document
ERIC Educational Resources Information Center
Stahl, Steven A.; And Others
To examine the effects of students reading multiple documents on their perceptions of a historical event, in this case the "discovery" of America by Christopher Columbus, 85 high school freshmen read 3 of 4 different texts (or sets of texts) dealing with Columbus. One text was an encyclopedia article, one a set of articles from…
M68000 RNF text formatter user's manual
NASA Technical Reports Server (NTRS)
Will, R. W.; Grantham, C.
1985-01-01
A powerful, flexible text formatting program, RNF, is described. It is designed to automate many of the tedious elements of typing, including breaking a document into pages with titles and page numbers, formatting chapter and section headings, keeping track of page numbers for use in a table of contents, justifying lines by inserting blanks to give an even right margin, and inserting figures and footnotes at appropriate places on the page. The RNF program greatly facilitates both preparing and modifying a document because it allows you to concentrate your efforts on the content of the document instead of its appearance and because it removes the necessity of retyping text that has not changed.
Tilburt, Jon C; Koller, Kathryn; Tiesinga, James J; Wilson, Robin T; Trinh, Anne C; Hill, Kristin; Hall, Ingrid J; Smith, Judith Lee; Ekwueme, Donatus U; Petersen, Wesley O
2013-11-01
To assess clinical treatment patterns and response times among American Indian/Alaska Native men with a newly elevated PSA. We retrospectively identified men ages 50-80 receiving care in one of three tribally-operated clinics in Northern Minnesota or one medical center in Alaska who had an incident PSA elevation (> 4 ng/ml) in a specified time period. A clinical response was considered timely if it was documented as occurring within 90 days of the incident PSA elevation. Among 82 AI/AN men identified from medical records with an incident PSA elevation, 49 (60%) received a timely clinical response, while 18 (22%) had no documented clinical response. One in five AI/AN men in our study had no documented clinical action following an incident PSA elevation. Although this was a pilot study, these findings suggest the need to improve the documentation, notification, and care following an elevated PSA at clinics serving AI/AN men.
Schwartz, Jennifer A T; Pearson, Steven D
2013-06-24
Despite increasing concerns regarding the cost of health care, the consideration of costs in the development of clinical guidance documents by physician specialty societies has received little analysis. To evaluate the approach to consideration of cost in publicly available clinical guidance documents and methodological statements produced between 2008 and 2012 by the 30 largest US physician specialty societies. Qualitative document review. Whether costs are considered in clinical guidance development, mechanism of cost consideration, and the way that cost issues were used in support of specific clinical practice recommendations. Methodological statements for clinical guidance documents indicated that 17 of 30 physician societies (57%) explicitly integrated costs, 4 (13%) implicitly considered costs, 3 (10%) intentionally excluded costs, and 6 (20%) made no mention. Of the 17 societies that explicitly integrated costs, 9 (53%) consistently used a formal system in which the strength of recommendation was influenced in part by costs, whereas 8 (47%) were inconsistent in their approach or failed to mention the exact mechanism for considering costs. Among the 138 specific recommendations in these guidance documents that included cost as part of the rationale, the most common form of recommendation (50 [36%]) encouraged the use of a specific medical service because of equal effectiveness and lower cost. Slightly more than half of the largest US physician societies explicitly consider costs in developing their clinical guidance documents; among these, approximately half use an explicit mechanism for integrating costs into the strength of recommendations. Many societies remain vague in their approach. Physician specialty societies should demonstrate greater transparency and rigor in their approach to cost consideration in documents meant to influence care decisions.
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Inferring Group Processes from Computer-Mediated Affective Text Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schryver, Jack C; Begoli, Edmon; Jose, Ajith
2011-02-01
Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
Using ontology network structure in text mining.
Berndt, Donald J; McCart, James A; Luther, Stephen L
2010-11-13
Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
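A hedged sketch of the hybrid strategy described above: compute a graph-based importance score for each ontology term (here PageRank via networkx) and use it to reweight term frequencies before a standard bag-of-words classifier. The toy ontology edges below are invented for illustration and are not the i2b2 smoking ontology.

```python
# Graph-derived term importance injected into bag-of-words features.
# The ontology graph and its edges are illustrative assumptions only.
import networkx as nx
from collections import Counter

# Toy ontology graph: nodes are terms, edges are ontology relations.
ontology = nx.DiGraph()
ontology.add_edges_from([
    ("tobacco", "smoking"), ("cigarette", "smoking"),
    ("smoking", "nicotine_dependence"), ("nicotine_dependence", "substance_use"),
])
term_importance = nx.pagerank(ontology)   # PageRank-style term importance

def weighted_features(tokens):
    """Term frequency multiplied by ontology-derived importance (1.0 if unknown)."""
    tf = Counter(tokens)
    return {t: c * term_importance.get(t, 1.0) for t, c in tf.items()}

doc = ["patient", "denies", "smoking", "cigarette", "use"]
print(weighted_features(doc))
```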
Evaluation of PHI Hunter in Natural Language Processing Research.
Redd, Andrew; Pickard, Steve; Meystre, Stephane; Scehnet, Jeffrey; Bolton, Dan; Heavirland, Julia; Weaver, Allison Lynn; Hope, Carol; Garvin, Jennifer Hornung
2015-01-01
We introduce and evaluate a new, easily accessible tool using a common statistical analysis and business analytics software suite, SAS, which can be programmed to remove specific protected health information (PHI) from a text document. Removal of PHI is important because the quantity of text documents used for research with natural language processing (NLP) is increasing. When using existing data for research, an investigator must remove all PHI not needed for the research to comply with human subjects' right to privacy. This process is similar, but not identical, to de-identification of a given set of documents. PHI Hunter removes PHI from free-form text. It is a set of rules to identify and remove patterns in text. PHI Hunter was applied to 473 Department of Veterans Affairs (VA) text documents randomly drawn from a research corpus stored as unstructured text in VA files. PHI Hunter performed well with PHI in the form of identification numbers such as Social Security numbers, phone numbers, and medical record numbers. The most commonly missed PHI items were names and locations. Incorrect removal of information occurred with text that looked like identification numbers. PHI Hunter fills a niche role that is related to but not equal to the role of de-identification tools. It gives research staff a tool to reasonably increase patient privacy. It performs well for highly sensitive PHI categories that are rarely used in research, but still shows possible areas for improvement. More development for patterns of text and linked demographic tables from electronic health records (EHRs) would improve the program so that more precise identifiable information can be removed. PHI Hunter is an accessible tool that can flexibly remove PHI not needed for research. If it can be tailored to the specific data set via linked demographic tables, its performance will improve in each new document set.
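A rule-based scrubber in the spirit of PHI Hunter might look like the sketch below. The real tool is implemented in SAS with a much larger rule set; the regular expressions here cover only a few obvious identifier patterns and are assumptions, not the tool's actual rules.

```python
# Illustrative rule-based PHI scrubbing: each pattern is replaced with a tag.
# Patterns shown are assumptions for demonstration, not PHI Hunter's rule set.
import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),                   # Social Security numbers
    (re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),   # US phone numbers
    (re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE), "[MRN]"),  # medical record numbers
]

def scrub(text):
    for pattern, replacement in PHI_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(scrub("Contact at (555) 123-4567, SSN 123-45-6789, MRN: 00123456."))
```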
Evaluation of PHI Hunter in Natural Language Processing Research
Redd, Andrew; Pickard, Steve; Meystre, Stephane; Scehnet, Jeffrey; Bolton, Dan; Heavirland, Julia; Weaver, Allison Lynn; Hope, Carol; Garvin, Jennifer Hornung
2015-01-01
Objectives We introduce and evaluate a new, easily accessible tool using a common statistical analysis and business analytics software suite, SAS, which can be programmed to remove specific protected health information (PHI) from a text document. Removal of PHI is important because the quantity of text documents used for research with natural language processing (NLP) is increasing. When using existing data for research, an investigator must remove all PHI not needed for the research to comply with human subjects’ right to privacy. This process is similar, but not identical, to de-identification of a given set of documents. Materials and methods PHI Hunter removes PHI from free-form text. It is a set of rules to identify and remove patterns in text. PHI Hunter was applied to 473 Department of Veterans Affairs (VA) text documents randomly drawn from a research corpus stored as unstructured text in VA files. Results PHI Hunter performed well with PHI in the form of identification numbers such as Social Security numbers, phone numbers, and medical record numbers. The most commonly missed PHI items were names and locations. Incorrect removal of information occurred with text that looked like identification numbers. Discussion PHI Hunter fills a niche role that is related to but not equal to the role of de-identification tools. It gives research staff a tool to reasonably increase patient privacy. It performs well for highly sensitive PHI categories that are rarely used in research, but still shows possible areas for improvement. More development for patterns of text and linked demographic tables from electronic health records (EHRs) would improve the program so that more precise identifiable information can be removed. Conclusions PHI Hunter is an accessible tool that can flexibly remove PHI not needed for research. If it can be tailored to the specific data set via linked demographic tables, its performance will improve in each new document set. PMID:26807078
Application of portable CDA for secure clinical-document exchange.
Huang, Kuo-Hsuan; Hsieh, Sung-Huai; Chang, Yuan-Jen; Lai, Feipei; Hsieh, Sheau-Ling; Lee, Hsiu-Hui
2010-08-01
The Health Level Seven (HL7) organization published the Clinical Document Architecture (CDA) for exchanging documents among heterogeneous systems and improving medical quality based on the design method in the CDA. In practice, although the HL7 organization tried to make medical messages exchangeable, it is still hard to exchange medical messages. There are many issues when two hospitals want to exchange clinical documents, such as patient privacy, network security, budget, and the strategies of the hospital. In this article, we propose a method for the exchange and sharing of clinical documents in an offline model based on the CDA: the Portable CDA. This allows the physician to retrieve the patient's medical record stored on a portable device, but not through the Internet in real time. The security and privacy of CDA data will also be considered.
An Introduction to the Extensible Markup Language (XML).
ERIC Educational Resources Information Center
Bryan, Martin
1998-01-01
Describes Extensible Markup Language (XML), a subset of the Standard Generalized Markup Language (SGML) that is designed to make it easy to interchange structured documents over the Internet. Topics include Document Type Definition (DTD), components of XML, the use of XML, text and non-text elements, and uses for XML-coded files. (LRW)
Extracting biomedical events from pairs of text entities
2015-01-01
Background Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers are generated daily. Nowadays, these documents are mainly available in the form of unstructured free texts, which require heavy processing for their registration into organized databases. This organization is instrumental for information retrieval, enabling to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic methods of text-mining that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline. Results We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with lower memory capacity. Compared to usual pipeline approaches, our model passes over a complex intermediate problem, while making a more extensive usage of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of BioNLP challenges yielding the best result reported so far on the 2013 edition. PMID:26201478
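A much-simplified sketch of casting event extraction as multiclass classification over candidate trigger-argument pairs, as outlined above. Real systems use rich joint features between text entities; here each pair is represented only by the TF-IDF of its sentence plus the two entity strings, and the tiny training set is invented for illustration.

```python
# Toy version of pair-of-entities event classification: each candidate
# (trigger, argument) pair is turned into a text string and fed to a
# multiclass linear SVM. Training examples and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def pair_to_text(sentence, trigger, argument):
    return f"{trigger} ||| {argument} ||| {sentence}"

train_pairs = [
    pair_to_text("IL-2 expression is upregulated by NF-kB", "expression", "IL-2"),
    pair_to_text("STAT3 phosphorylation requires JAK2", "phosphorylation", "STAT3"),
    pair_to_text("The protein binds DNA in vitro", "binds", "DNA"),
]
train_labels = ["Gene_expression", "Phosphorylation", "Binding"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_pairs, train_labels)

test = pair_to_text("p53 expression was observed in tumor cells", "expression", "p53")
print(model.predict([test]))
```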
Mining the pharmacogenomics literature—a survey of the state of the art
Cohen, K. Bretonnel; Garten, Yael; Shah, Nigam H.
2012-01-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research. PMID:22833496
Mining the pharmacogenomics literature--a survey of the state of the art.
Hahn, Udo; Cohen, K Bretonnel; Garten, Yael; Shah, Nigam H
2012-07-01
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Determining Multiple Sclerosis Phenotype from Electronic Medical Records.
Nelson, Richard E; Butler, Jorie; LaFleur, Joanne; Knippenberg, Kristin; C Kamauu, Aaron W; DuVall, Scott L
2016-12-01
Multiple sclerosis (MS), a central nervous system disease in which nerve signals are disrupted by scarring and demyelination, is classified into phenotypes depending on the patterns of cognitive or physical impairment progression: relapsing-remitting MS (RRMS), primary-progressive MS (PPMS), secondary-progressive MS (SPMS), or progressive-relapsing MS (PRMS). The phenotype is important in managing the disease and determining appropriate treatment. The ICD-9-CM code 340.0 is uninformative about MS phenotype, which increases the difficulty of studying the effects of phenotype on disease. To identify MS phenotype using natural language processing (NLP) techniques on progress notes and other clinical text in the electronic medical record (EMR). Patients with at least 2 ICD-9-CM codes for MS (340.0) from 1999 through 2010 were identified from nationwide EMR data in the Department of Veterans Affairs. Clinical experts were interviewed for possible keywords and phrases denoting MS phenotype in order to develop a data dictionary for NLP. For each patient, NLP was used to search EMR clinical notes since the first MS diagnosis date for these keywords and phrases. The presence of phenotype-related keywords and phrases was analyzed in context to remove mentions that were negated (e.g., "not relapsing-remitting") or unrelated to MS (e.g., "RR" meaning "respiratory rate"). One thousand mentions of MS phenotype were validated, and all records of 150 patients were reviewed for missed mentions. There were 7,756 MS patients identified by ICD-9-CM code 340.0. MS phenotype was identified for 2,854 (36.8%) patients, with 1,836 (64.3%) of those having just 1 phenotype mentioned in their EMR clinical notes: 1,118 (39.2%) RRMS, 325 (11.4%) PPMS, 374 (13.1%) SPMS, and 19 (0.7%) PRMS. A total of 747 patients (26.2%) had 2 phenotypes, the most common being 459 patients (16.1%) with RRMS and SPMS. A total of 213 patients (7.5%) had 3 phenotypes, and 58 patients (2.0%) had 4 phenotypes mentioned in their EMR clinical notes. Positive predictive value of phenotype identification was 93.8% with sensitivity of 94.0%. Phenotype was documented for slightly more than one third of MS patients, an important but disappointing finding that sets a limit on studying the effects of phenotype on MS in general. However, for cases where the phenotype was documented, NLP accurately identified the phenotypes. Having multiple phenotypes documented is consistent with disease progression. The most common misidentification was because of ambiguity while clinicians were trying to determine phenotype. This study brings attention to the need for care providers to document MS phenotype more consistently and provides a solution for capturing phenotype from clinical text. This study was funded by Anolinx and F. Hoffman-La Roche. Nelson serves as a consultant for Anolinx. Kamauu is owner of Anolinx, which has received multiple research grants from pharmaceutical and biotechnology companies. LaFleur has received a Novartis grant for ongoing work. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the U.S. government. Study concept and design were contributed by Butler, LaFleur, Kamauu, DuVall, and Nelson. DuVall collected the data, and interpretation was performed by Nelson, DuVall, and Kamauu, along with Butler, LaFleur, and Knippenberg.
The manuscript was written primarily by Nelson, along with Knippenberg and assisted by the other authors, and revised by Knippenberg, Nelson, and DuVall, along with the other authors.
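A simplified sketch of the dictionary-plus-context idea used in the study above: search notes for phenotype keywords and discard mentions that appear in a negated context. The keyword and negation lists are illustrative assumptions, not the study's actual data dictionary.

```python
# Keyword search with a crude negation window, in the spirit of the study's
# dictionary-based NLP. Keyword and negation-cue lists are illustrative only.
import re

PHENOTYPE_KEYWORDS = {
    "RRMS": ["relapsing-remitting", "rrms"],
    "PPMS": ["primary-progressive", "ppms"],
    "SPMS": ["secondary-progressive", "spms"],
    "PRMS": ["progressive-relapsing", "prms"],
}
NEGATION_CUES = ["not", "no evidence of", "rather than"]

def phenotypes_in_note(note):
    text = note.lower()
    found = set()
    for phenotype, keywords in PHENOTYPE_KEYWORDS.items():
        for kw in keywords:
            for match in re.finditer(re.escape(kw), text):
                # Look a short distance back for a negation cue (simplification).
                window = text[max(0, match.start() - 30):match.start()]
                if not any(cue in window for cue in NEGATION_CUES):
                    found.add(phenotype)
    return found

print(phenotypes_in_note(
    "Course is relapsing-remitting, not secondary-progressive at this time."))
# -> {'RRMS'}
```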
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.
Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W
2015-06-01
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
Elbogen, Eric B; Tomkins, Alan J; Pothuloori, Antara P; Scalora, Mario J
2003-01-01
Studies have identified risk factors that show a strong association with violent behavior in psychiatric populations. Yet, little research has been conducted on the documentation of violence risk information in actual clinical practice, despite the relevance of such documentation to risk assessment liability and to conducting effective risk management. In this study, the documentation of cues of risk for violence was examined in psychiatric settings. Patient charts (n = 283) in four psychiatric settings were reviewed for documentation of violence risk information summarized in the MacArthur Violence Risk Assessment Study. The results revealed that particular patient and institutional variables influenced documentation practices. The presence of personality disorder, for example, predicted greater documentation of cues of violence risk, regardless of clinical setting. These findings have medicolegal implications for risk assessment liability and clinical implications for optimizing risk management in psychiatric practice.
Clinical Document Architecture integration system to support patient referral and reply letters.
Lee, Sung-Hyun; Song, Joon Hyun; Kim, Il Kon; Kim, Jeong-Whun
2016-06-01
Many Clinical Document Architecture (CDA) referral and reply documents have been accumulated for patients since the deployment of the Health Information Exchange System (HIES) in Korea. Clinical data were scattered across many CDA documents, which took too much time for physicians to read. Physicians in Korea spend only limited time per patient, as insurance in Korea follows a fee-for-service model. Therefore, physicians were not allowed sufficient time for making medical decisions, and follow-up care was hindered. To address this, we developed the CDA Integration Template (CIT) and CDA Integration System (CIS) for the HIES. The clinical items included in the CIT were defined to reflect the Korean Standard for CDA Referral and Reply Letters and requests by physicians. The CIS integrates the CDA documents of a specified patient into a single CDA document following the format of the CIT. Finally, physicians were surveyed after CIT/CIS adoption, and they indicated overall satisfaction. © The Author(s) 2014.
Zhou, Li; Collins, Sarah; Morgan, Stephen J.; Zafar, Neelam; Gesner, Emily J.; Fehrenbach, Martin; Rocha, Roberto A.
2016-01-01
Structured clinical documentation is an important component of electronic health records (EHRs) and plays an important role in clinical care, administrative functions, and research activities. Clinical data elements serve as basic building blocks for composing the templates used for generating clinical documents (such as notes and forms). We present our experience in creating and maintaining data elements for three different EHRs (one home-grown and two commercial systems) across different clinical settings, using flowsheet data elements as examples in our case studies. We identified basic but important challenges (including naming convention, links to standard terminologies, and versioning and change management) and possible solutions to address them. We also discussed more complicated challenges regarding governance, documentation vs. structured data capture, pre-coordination vs. post-coordination, reference information models, as well as monitoring, communication and training. PMID:28269927
Online database for documenting clinical pathology resident education.
Hoofnagle, Andrew N; Chou, David; Astion, Michael L
2007-01-01
Training of clinical pathologists is evolving and must now address the 6 core competencies described by the Accreditation Council for Graduate Medical Education (ACGME), which include patient care. A substantial portion of the patient care performed by the clinical pathology resident takes place while the resident is on call for the laboratory, a practice that provides the resident with clinical experience and assists the laboratory in providing quality service to clinicians in the hospital and surrounding community. Documenting the educational value of these on-call experiences and providing evidence of competence is difficult for residency directors. An online database of these calls, entered by residents and reviewed by faculty, would provide a mechanism for documenting and improving the education of clinical pathology residents. With Microsoft Access we developed an online database that uses Active Server Pages and Secure Sockets Layer encryption to document calls to the clinical pathology resident. Using the data collected, we evaluated the efficacy of 3 interventions aimed at improving resident education. The database facilitated the documentation of more than 4,700 calls in the first 21 months it was online, provided archived resident-generated data to assist in serving clients, and demonstrated that 2 interventions aimed at improving resident education were successful. We have developed a secure online database, accessible from any computer with Internet access, that can be used to easily document clinical pathology resident education and competency.
Web Prep: How to Prepare NAS Reports For Publication on the Web
NASA Technical Reports Server (NTRS)
Walatka, Pamela; Balakrishnan, Prithika; Clucas, Jean; McCabe, R. Kevin; Felchle, Gail; Brickell, Cristy
1996-01-01
This document contains specific advice and requirements for NASA Ames Code IN authors of NAS reports. Much of the information may be of interest to other authors writing for the Web. WebPrep has a graphic Table of Contents in the form of a WebToon, which simulates a discussion between a scientist and a Web publishing consultant. In the WebToon, Frequently Asked Questions about preparing reports for the Web are linked to relevant text in the body of this document. We also provide a text-only Table of Contents. The text for this document is divided into chapters: each chapter corresponds to one frame of the WebToons. The chapter topics are: converting text to HTML, converting 2D graphic images to gif, creating imagemaps and tables, converting movie and audio files to Web formats, supplying 3D interactive data, and (briefly) JAVA capabilities. The last chapter is specifically for NAS staff authors. The Glossary-Index lists web related words and links to topics covered in the main text.
Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study
Yu, Hong
2017-01-01
Background: Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients’ comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. Objective: The objective of our study was to explore the relationship between the aforementioned readability formulas and the laypeople’s perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas’ appropriateness and generalizability on predicting difficulty levels of highly complex technical documents. Methods: We collected 140 Wikipedia articles on diabetes and 242 EHR notes with diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate difficulty levels of the documents. Correlations between laypeople’s perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts of the 2 genres of text. Results: The distributions of both readability formulas’ scores (P<.001) and laypeople’s perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople’s perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P<.001). Word usage in the 2 related genres still differed significantly (P<.001). Conclusions: Our findings suggested that the readability formulas’ predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers’ perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes. PMID:28254738
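The three readability formulas named in the abstract can be computed from simple counts. The sketch below states them for reference, assuming the word, sentence, and syllable counts have already been obtained (syllable counting itself requires a separate tokenizer or dictionary); the example counts are invented.

```python
import math

def fkgl(words, sentences, syllables):
    # Flesch-Kincaid Grade Level
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def smog(polysyllabic_words, sentences):
    # Simple Measure of Gobbledygook; polysyllabic words have 3+ syllables
    return 1.0430 * math.sqrt(polysyllabic_words * (30 / sentences)) + 3.1291

def gfi(words, sentences, complex_words):
    # Gunning-Fog Index; complex words have 3+ syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# toy counts for a short note (hypothetical numbers)
print(fkgl(words=120, sentences=8, syllables=210))
print(smog(polysyllabic_words=18, sentences=8))
print(gfi(words=120, sentences=8, complex_words=18))
```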
Bejan, Cosmin Adrian; Wei, Wei-Qi; Denny, Joshua C
2015-01-01
Objective: To evaluate the contribution of the MEDication Indication (MEDI) resource and SemRep for identifying treatment relations in clinical text. Materials and methods: We first processed clinical documents with SemRep to extract the Unified Medical Language System (UMLS) concepts and the treatment relations between them. Then, we incorporated MEDI into a simple algorithm that identifies treatment relations between two concepts if they match a medication-indication pair in this resource. For better coverage, we expanded MEDI using ontology relationships from RxNorm and UMLS Metathesaurus. We also developed two ensemble methods, which combined the predictions of SemRep and the MEDI algorithm. We evaluated our selected methods on two datasets, a Vanderbilt corpus of 6864 discharge summaries and the 2010 Informatics for Integrating Biology and the Bedside (i2b2)/Veteran's Affairs (VA) challenge dataset. Results: The Vanderbilt dataset included 958 manually annotated treatment relations. A double annotation was performed on 25% of relations with high agreement (Cohen's κ = 0.86). The evaluation consisted of comparing the manually annotated relations with the relations identified by SemRep, the MEDI algorithm, and the two ensemble methods. On the first dataset, the best F1-measure results achieved by the MEDI algorithm and the union of the two resources (78.7 and 80, respectively) were significantly higher than the SemRep results (72.3). On the second dataset, the MEDI algorithm achieved better precision and significantly lower recall values than the best system in the i2b2 challenge. The two systems obtained comparable F1-measure values on the subset of i2b2 relations with both arguments in MEDI. Conclusions: Both SemRep and MEDI can be used to extract treatment relations from clinical text. Knowledge-based extraction with MEDI outperformed use of SemRep alone, but superior performance was achieved by integrating both systems. The integration of knowledge-based resources such as MEDI into information extraction systems such as SemRep and the i2b2 relation extractors may improve treatment relation extraction from clinical text. PMID:25336593
The Development of Clinical Document Standards for Semantic Interoperability in China
Yang, Peng; Pan, Feng; Wan, Yi; Tu, Haibo; Tang, Xuejun; Hu, Jianping
2011-01-01
Objectives: This study is aimed at developing a set of data groups (DGs) to be employed as reusable building blocks for the construction of the eight most common clinical documents used in China's general hospitals in order to achieve their structural and semantic standardization. Methods: The Diagnostics knowledge framework, the related approaches taken from the Health Level Seven (HL7), the Integrating the Healthcare Enterprise (IHE), and the Healthcare Information Technology Standards Panel (HITSP) and 1,487 original clinical records were considered together to form the DG architecture and data sets. The internal structure, content, and semantics of each DG were then defined by mapping each DG data set to a corresponding Clinical Document Architecture data element and matching each DG data set to the metadata in the Chinese National Health Data Dictionary. By using the DGs as reusable building blocks, standardized structures and semantics regarding the clinical documents for semantic interoperability were able to be constructed. Results: Altogether, 5 header DGs, 48 section DGs, and 17 entry DGs were developed. Several issues regarding the DGs, including their internal structure, identifiers, data set names, definitions, length and format, data types, and value sets, were further defined. Standardized structures and semantics regarding the eight clinical documents were structured by the DGs. Conclusions: This approach of constructing clinical document standards using DGs is a feasible standard-driven solution useful in preparing documents possessing semantic interoperability among the disparate information systems in China. These standards need to be validated and refined through further study. PMID:22259722
10 CFR 2.1013 - Use of the electronic docket during the proceeding.
Code of Federal Regulations, 2010 CFR
2010-01-01
... bi-tonal documents. (v) Electronic submissions must be generated in the appropriate PDF output format by using: (A) PDF—Formatted Text and Graphics for textual documents converted from native applications; (B) PDF—Searchable Image (Exact) for textual documents converted from scanned documents; and (C...
VisualUrText: A Text Analytics Tool for Unstructured Textual Data
NASA Astrophysics Data System (ADS)
Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.
2018-05-01
The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence, and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text mining is a well-known technique for discovering interesting patterns and trends, which constitute non-trivial knowledge, from massive unstructured text data. Text mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics, and computational linguistics. This paper discusses the development of a text analytics tool that extracts, processes, and analyzes unstructured text data and visualizes the cleaned text in multiple forms such as a Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
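As a hedged sketch of one of the representations mentioned above, the snippet below builds a small Document Term Matrix and an overall term-frequency list; scikit-learn and the toy sentences are used purely for illustration and are not necessarily what VisualUrText itself uses.

```python
# Illustrative DTM construction with scikit-learn (assumption, not the tool's code).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Unstructured text repositories grow rapidly on the web.",
    "Text mining discovers patterns and trends in unstructured text.",
]
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)                   # sparse documents x terms matrix
terms = vectorizer.get_feature_names_out()
freqs = np.asarray(dtm.sum(axis=0)).ravel()            # overall term frequencies
print(sorted(zip(terms, freqs), key=lambda x: -x[1])[:5])
```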
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dimmick, Ross
This document contains updates to the Supplemental Information Sandia National Laboratories/New Mexico Site-Wide Environmental Impact Statement Source Documents that were developed in 2010. In general, this addendum provides calendar year 2010 data, along with changes or additions to text in the original documents.
Walton, Merrilyn; Harrison, Reema; Burgess, Annette; Foster, Kirsty
2015-10-01
Preventable harm is one of the top six health problems in the developed world. Developing patient safety skills and knowledge among advanced trainee doctors is critical. Clinical supervision is the main form of training for advanced trainees. The use of supervision to develop patient safety competence has not been established. To establish the use of clinical supervision and other workplace training to develop non-technical patient safety competency in advanced trainee doctors. Keywords, synonyms and subject headings were used to search eight electronic databases in addition to hand-searching of relevant journals up to 1 March 2014. Titles and abstracts of retrieved publications were screened by two reviewers and checked by a third. Full-text articles were screened against the eligibility criteria. Data on design, methods and key findings were extracted. Clinical supervision documents were assessed against components common to established patient safety frameworks. Findings from the reviewed articles and document analysis were collated in a narrative synthesis. Clinical supervision is not identified as an avenue for embedding patient safety skills in the workplace and is consequently not evaluated as a method to teach trainees these skills. Workplace training in non-technical patient safety skills is limited, but one-off training courses are sometimes used. Clinical supervision is the primary avenue for learning in postgraduate medical education but the most overlooked in the context of patient safety learning. The widespread implementation of short courses is not matched by evidence of rigorous evaluation. Supporting supervisors to identify teaching moments during supervision and to give weight to non-technical skills and technical skills equally is critical. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Lee, Eunjoo; Noh, Hyun Kyung
2016-01-01
To examine the effects of a web-based nursing process documentation system on the stress and anxiety of nursing students during their clinical practice. A quasi-experimental design was employed. The experimental group (n = 110) used a web-based nursing process documentation program for their case reports as part of assignments for a clinical practicum, whereas the control group (n = 106) used traditional paper-based case reports. Stress and anxiety levels were measured with a numeric rating scale before, 2 weeks after, and 4 weeks after using the web-based nursing process documentation program during a clinical practicum. The data were analyzed using descriptive statistics, t tests, chi-square tests, and repeated-measures analyses of variance. Nursing students who used the web-based nursing process documentation program showed significantly lower levels of stress and anxiety than the control group. A web-based nursing process documentation program could be used to reduce the stress and anxiety of nursing students during clinical practicum, which ultimately would benefit nursing students by increasing satisfaction with and effectiveness of clinical practicum. © 2015 NANDA International, Inc.
ERIC Educational Resources Information Center
Göçer, Ali
2014-01-01
In this study, Turkish text-based written examination questions posed to students in secondary schools were examined. In this research, document analysis method within the framework of the qualitative research approach was used. The data obtained from the documents consisting of written examination papers were analyzed with content analysis…
ERIC Educational Resources Information Center
Stromso, Helge I.; Braten, Ivar; Britt, M. Anne
2010-01-01
In many situations, readers are asked to learn from multiple documents. Many studies have found that evaluating the trustworthiness and usefulness of document sources is an important skill in such learning situations. There has been, however, no direct evidence that attending to source information helps readers learn from and interpret a…
Automatic Identification of Topic Tags from Texts Based on Expansion-Extraction Approach
ERIC Educational Resources Information Center
Yang, Seungwon
2013-01-01
Identifying topics of a textual document is useful for many purposes. We can organize the documents by topics in digital libraries. Then, we could browse and search for the documents with specific topics. By examining the topics of a document, we can quickly understand what the document is about. To augment the traditional manual way of topic…
Document reconstruction by layout analysis of snippets
NASA Astrophysics Data System (ADS)
Kleber, Florian; Diem, Markus; Sablatnig, Robert
2010-02-01
Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can be destroyed either intentionally, to make the printed content unavailable (e.g. tax fraud investigation, business crime), or through time-induced degradation of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper we show how document analysis techniques applied to snippets can support the matching algorithm by providing additional features. This implies a rotational analysis, a color analysis and a line detection. As future work, it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.
Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia
2013-01-01
Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
[The Breast Unit in the European and national policy documents: similarities and differences].
Marcon, Anna; Albertini, Giovanna; Di Gregori, Valentina; Ghirarduzzi, Angelo; Fantini, Maria Pia
2013-11-01
The aim of this study is to assess differences and similarities in official European and Italian Ministry of Health policy documents referring to the subject "Breast Unit". The T-Lab software package for textual analysis was used to analyze the documents. This instrument permits the identification of the most frequently used words and the semantic network associated with "Breast Unit". Results show that the European document gives more emphasis to the concept of "integrated care", delivered by a multi-professional team that meets the clinical, psychological and informational needs of the patient. The Italian document gives more prominence to themes related to the clinical content of the interventions and managerial aspects through the use of clinical guidelines.
Rectification of curved document images based on single view three-dimensional reconstruction.
Kang, Lai; Wei, Yingmei; Jiang, Jie; Bai, Liang; Lao, Songyang
2016-10-01
Since distortions in camera-captured document images significantly affect the accuracy of optical character recognition (OCR), distortion removal plays a critical role for document digitalization systems using a camera for image capturing. This paper proposes a novel framework that performs three-dimensional (3D) reconstruction and rectification of camera-captured document images. While most existing methods rely on additional calibrated hardware or multiple images to recover the 3D shape of a document page, or make a simple but not always valid assumption on the corresponding 3D shape, our framework is more flexible and practical since it only requires a single input image and is able to handle a general locally smooth document surface. The main contributions of this paper include a new iterative refinement scheme for baseline fitting from connected components of text line, an efficient discrete vertical text direction estimation algorithm based on convex hull projection profile analysis, and a 2D distortion grid construction method based on text direction function estimation using 3D regularization. In order to examine the performance of our proposed method, both qualitative and quantitative evaluation and comparison with several recent methods are conducted in our experiments. The experimental results demonstrate that the proposed method outperforms relevant approaches for camera-captured document image rectification, in terms of improvements on both visual distortion removal and OCR accuracy.
Reminder Cards Improve Physician Documentation of Obesity But Not Obesity Counseling.
Shungu, Nicholas; Miller, Marshal N; Mills, Geoffrey; Patel, Neesha; de la Paz, Amanda; Rose, Victoria; Kropa, Jill; Edi, Rina; Levy, Emily; Crenshaw, Margaret; Hwang, Chris
2015-01-01
Physicians frequently fail to document obesity and obesity-related counseling. We sought to determine whether attaching a physical reminder card to patient encounter forms would increase electronic medical record (EMR) assessment of and documentation of obesity and dietary counseling. Reminder cards for obesity documentation were attached to encounter forms for patient encounters over a 2-week intervention period. For visits in the intervention period, the EMR was retrospectively reviewed for BMI, assessment of "obesity" or "morbid obesity" as an active problem, free-text dietary counseling within physician notes, and assessment of "dietary counseling" as an active problem. These data were compared to those collected through a retrospective chart review during a 2-week pre-intervention period. We also compared physician self-report of documentation via reminder cards with EMR documentation. We found significant improvement in the primary endpoint of assessment of "obesity" or "morbid obesity" as an active problem (42.5% versus 28%) compared to the pre-intervention period. There was no significant difference in the primary endpoints of free-text dietary counseling or assessment of "dietary counseling" as an active problem between the groups. Physician self-reporting of assessment of "obesity" or "morbid obesity" as an active problem (77.7% versus 42.5%), free-text dietary counseling on obesity (69.1% versus 35.4%) and assessment of "dietary counseling" as an active problem (54.3% versus 25.2%) were all significantly higher than those reflected in EMR documentation. This study demonstrates that physical reminder cards are a successful means of increasing obesity documentation rates among providers but do not necessarily increase rates of obesity-related counseling or documentation of counseling. Our study suggests that even with such interventions, physicians are likely under-documenting obesity and counseling compared to self-reported rates.
Williamson, Rebecca; Meacham, Lillian; Cherven, Brooke; Hassen-Schilling, Leann; Edwards, Paula; Palgon, Michael; Espinoza, Sofia; Mertens, Ann
2014-09-01
Cancer SurvivorLink™, www.cancersurvivorlink.org, is a patient-controlled communication tool where survivors can electronically store and share documents with healthcare providers. Functionally, SurvivorLink serves as an electronic personal health record: a record of health-related information managed and controlled by the survivor. Recruitment methods to increase registration and the characteristics of registrants who completed each step of using SurvivorLink are described. Pediatric cancer survivors were recruited via mailings, survivor clinic, and community events. Recruitment method and Aflac Survivor Clinic attendance were determined for each registrant. Registration date, registrant type (parent vs. survivor), zip code, creation of a personal health record in SurvivorLink, storage of documents, and document sharing were measured. Logistic regression was used to determine the characteristics that predicted creation of a health record and storage of documents. To date, 275 survivors/parents have completed registration: 63 were recruited via mailing, 99 from clinic, 56 from community events, and 57 via other methods. Overall, 66.9% of registrants created a personal health record and 45.7% of those stored a health document. There were no significant predictors for creating a personal health record. Attending a survivor clinic was the strongest predictor of document storage (p < 0.01). Of those with a document stored, 21.4% shared with a provider. Having attended survivor clinic is the biggest predictor of registering and using SurvivorLink. Many survivors must advocate for their survivorship care. SurvivorLink provides educational material and supports the dissemination of survivor-specific follow-up recommendations to facilitate shared clinical care decision making.
Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.
ERIC Educational Resources Information Center
Chen, Z.
1993-01-01
Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)
30 CFR 285.115 - Documents incorporated by reference.
Code of Federal Regulations, 2011 CFR
2011-07-01
... incorporating by reference the documents listed in the table in paragraph (e) of this section. The Director of...: ER29AP09.104 (e) This paragraph lists documents incorporated by reference. To easily reference text of the...
75 FR 28594 - Ready-to-Learn Television Program
Federal Register 2010, 2011, 2012, 2013, 2014
2010-05-21
... Federal Register. Free Internet access to the official edition of the Federal Register and the Code of... Access to This Document: You can view this document, as well as all other documents of this Department published in the Federal Register, in text or Adobe Portable Document Format (PDF) on the Internet at the...
Text-mining analysis of mHealth research.
Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun
2017-01-01
In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic clustering, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions, (V) Research Design, (VI) Infrastructure, (VII) Applications, (VIII) Research and Innovation in Health Technologies, (IX) Sensor-based Devices and Measurement Algorithms, (X) Survey-based Research. Third, the trend analyses indicated that the Infrastructure cluster was the most heavily researched area until 2014. The Research and Innovation in Health Technologies cluster experienced the largest increase in numbers of publications in recent years, especially after 2014. This study is unique because it is the only known study utilizing text-mining analyses to reveal the streams and trends in mHealth research. The fast growth in mobile technologies is expected to lead to higher numbers of studies focusing on mHealth and its implications for various healthcare outcomes. Findings of this study can be utilized by researchers in identifying areas for future studies.
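The DTM, SVD, and clustering steps described above follow a generic pipeline; a minimal sketch of that pipeline is given below, using scikit-learn only as a stand-in (the study itself used the Text Explorer module of JMP Pro), with toy abstracts invented for the example.

```python
# Generic DTM -> SVD -> clustering sketch; library and data are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

abstracts = [
    "mobile phone application for diabetes self management",
    "smartphone app supporting community health workers",
    "sensor based measurement algorithms for physical activity",
    "survey of mhealth infrastructure and research design",
]
X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # SVD step
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(labels)  # cluster assignment per abstract
```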
Impact of registration on clinical trials on infection risk in pediatric acute myeloid leukemia.
Dix, David; Aplenc, Richard; Bowes, Lynette; Cellot, Sonia; Ethier, Marie-Chantal; Feusner, Jim; Gillmeister, Biljana; Johnston, Donna L; Lewis, Victor; Michon, Bruno; Mitchell, David; Portwine, Carol; Price, Victoria; Silva, Mariana; Stobart, Kent; Yanofsky, Rochelle; Zelcer, Shayna; Beyene, Joseph; Sung, Lillian
2016-04-01
Little is known about the impact of enrollment in therapeutic clinical trials on adverse event rates. The primary objective was to describe the impact of clinical trial registration on sterile site microbiologically documented infection for children with newly diagnosed acute myeloid leukemia (AML). We conducted a multicenter cohort study that included children aged ≤18 years with de novo AML. The primary outcome was microbiologically documented sterile site infection. Infection rates were compared between those registered and not registered on clinical trials. Five hundred seventy-four children with AML were included, of which 198 (34.5%) were registered on a therapeutic clinical trial. Overall, 400 (69.7%) had at least one sterile site microbiologically documented infection. In multiple regression, registration on clinical trials was independently associated with a higher risk of microbiologically documented sterile site infection [adjusted odds ratio (OR) 1.24, 95% confidence interval (CI) 1.01-1.53; p = 0.040] and viridans group streptococcal infection (OR 1.46, 95% CI 1.08-1.98; p = 0.015). Registration on trials was not associated with Gram-negative or invasive fungal infections. Children with newly diagnosed AML enrolled on clinical trials have a higher risk of microbiologically documented sterile site infection. This information may have an impact on supportive care practices in pediatric AML. © 2015 UICC.
Who is teaching what, when? An evolving online tool to manage dental curricula.
Walton, Joanne N
2014-03-01
There are numerous issues in the documentation and ongoing development of health professions curricula. It seems that curriculum information falls quickly out of date between accreditation cycles, while students and faculty members struggle in the meantime with the "hidden curriculum" and unintended redundancies and gaps. Beyond knowing what is in the curriculum lies the frustration of timetabling learning in a transparent way while allowing for on-the-fly changes and improvements. The University of British Columbia Faculty of Dentistry set out to develop a curriculum database to answer the simple but challenging question "who is teaching what, when?" That tool, dubbed "OSCAR," has evolved to not only document the dental curriculum, but as a shared instrument that also holds the curricula and scheduling detail of the dental hygiene degree and clinical graduate programs. In addition to providing documentation ranging from reports for accreditation to daily information critical to faculty administrators and staff, OSCAR provides faculty and students with individual timetables and pushes updates via text, email, and calendar changes. It incorporates reminders and session resources for students and can be updated by both faculty members and staff. OSCAR has evolved into an essential tool for tracking, scheduling, and improving the school's curricula.
Audit of Endotracheal Tube Suction in a Pediatric Intensive Care Unit.
Davies, Kylie; Bulsara, Max K; Ramelet, Anne-Sylvie; Monterosso, Leanne
2017-02-01
We report outcomes of a clinical audit examining criteria used in clinical practice to rationalize endotracheal tube (ETT) suction, and the extent to which these matched the criteria in the Endotracheal Suction Assessment Tool (ESAT)©. A retrospective audit of patient notes (N = 292) and analyses of criteria documented by pediatric intensive care nurses to rationalize ETT suction were undertaken. The median number of documented respiratory and ventilation status criteria per ETT suction event that matched the ESAT© criteria was 2 [Interquartile Range (IQR) 1-6]. All criteria listed within the ESAT© were documented within the reviewed notes. A direct link was established between criteria used for current clinical practice of ETT suction and the ESAT©. The ESAT©, therefore, reflects documented clinical decision making and could be used as both a clinical and educational guide for inexperienced pediatric critical care nurses. Modification to the ESAT© requires "preparation for extubation" to be added.
[Preventing dependency in the elderly].
Gómez Pavón, J; Martín Lesende, I; Baztán Cortés, J J; Regato Pajares, P; Formiga Pérez, F; Segura Benedito, A; Abizanda Soler, P; de Pedro Cuesta, J
2008-01-01
Dependency, i.e. the need to depend on another person to perform activities of daily living, is the main concern and cause of suffering and poor quality of life in the elderly. The prevalence of dependency increases with age and is related to the presence of prior disease and fragility. Dependency is associated with increased morbidity, mortality and institutionalization, as well as with greater health and social resource utilization, all of which increases health costs. The aim was to create a consensus document on the main health recommendations for the prevention of dependency in the elderly, based on the scientific evidence available to date, with the collaboration of scientific societies and public health administrations (the Spanish Ministry of Health, Autonomous Communities and Cities). The methodology was as follows: a) a preliminary consensus document was drafted by an expert group composed of representatives of various scientific societies and health administrations. This document was based on a review of the recommendations and guidelines published by the main organizations involved in health promotion and the prevention of disease, functional deterioration and dependency in the elderly; b) the consensus document was reviewed by the remaining experts assigned by the scientific societies and central and autonomous administrations; c) the final document was approved after a session in which the text was discussed and reviewed by all the experts participating in the working group (including the academic committee); d) the document was presented and discussed in the First National Conference on Prevention and Health Promotion in Clinical Practice in Spain. All participating experts signed a conflicts of interest statement. The document provides recommendations, with their grades of evidence, grouped into the following three categories: a) health promotion and disease prevention, with specific preventive activities for the elderly, including prevention of geriatric syndromes; b) prevention of functional deterioration, with clinical recommendations that can be applied in primary and specialized care; c) prevention of iatrogenesis (drug prescription, inappropriate use of diagnostic and therapeutic modalities and healthcare). These recommendations were tailored to the characteristics of the older person (OP), categorized into five groups: healthy OP, OP with chronic disease, fragile or at risk OP, dependent OP, and OP at the end of life. These recommendations should be implemented by public health administrations to improve strategies for the prevention of dependency in the elderly in the 21st century.
On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency.
ERIC Educational Resources Information Center
Ellis, David; And Others
1994-01-01
Describes a study in which several different sets of hypertext links are inserted by different people in full-text documents. The degree of similarity between the sets is measured using coefficients and topological indices. As in comparable studies of inter-indexer consistency, the sets of links used by different people showed little similarity.…
Helios: Understanding Solar Evolution Through Text Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Randazzese, Lucien
This proof-of-concept project focused on developing, testing, and validating a range of bibliometric, text analytic, and machine-learning based methods to explore the evolution of three photovoltaic (PV) technologies: Cadmium Telluride (CdTe), Dye-Sensitized solar cells (DSSC), and Multi-junction solar cells. The analytical approach to the work was inspired by previous work by the same team to measure and predict the scientific prominence of terms and entities within specific research domains. The goal was to create tools that could assist domain-knowledgeable analysts in investigating the history and path of technological developments in general, with a focus on analyzing step-function changes in performance, or "breakthroughs," in particular. The text-analytics platform developed during this project was dubbed Helios. The project relied on computational methods for analyzing large corpora of technical documents. For this project we ingested technical documents from the following sources into Helios: Thomson Scientific Web of Science (papers), the U.S. Patent & Trademark Office (patents), the U.S. Department of Energy (technical documents), the U.S. National Science Foundation (project funding summaries), and a hand curated set of full-text documents from Thomson Scientific and other sources.
"What is relevant in a text document?": An interpretable machine learning approach
Arras, Leila; Horn, Franziska; Montavon, Grégoire; Müller, Klaus-Robert
2017-01-01
Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting a text’s category accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications. PMID:28800619
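For the bag-of-words linear model mentioned above, per-word contributions to a prediction can be read off directly as coefficient times term count. The sketch below illustrates only that simpler case and is not the paper's LRP procedure for the CNN; the training documents and labels are invented.

```python
# Word-level contributions for a linear bag-of-words classifier (illustrative only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

train_docs = ["the match ended in a draw", "stocks fell sharply today",
              "the striker scored twice", "markets rallied on earnings"]
labels = ["sport", "finance", "sport", "finance"]

vec = CountVectorizer()
clf = LinearSVC().fit(vec.fit_transform(train_docs), labels)

test = "the striker scored in the second half"
x = vec.transform([test]).toarray()[0]
contrib = x * clf.coef_[0]          # positive values push toward clf.classes_[1]
terms = vec.get_feature_names_out()
for i in x.nonzero()[0]:
    print(terms[i], round(contrib[i], 3))
```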
Schizophrenia Patient or Spiritually Advanced Personality? A Qualitative Case Analysis.
Bhargav, Hemant; Jagannathan, Aarti; Raghuram, Nagarathna; Srinivasan, T M; Gangadhar, Bangalore N
2015-10-01
Many aspects of spiritual experience are similar in form and content to symptoms of psychosis. Both spiritually advanced people and patients suffering from psychopathology experience alterations in their sense of 'self.' Psychotic experiences originate from derangement of the personality, whereas spiritual experiences involve systematic thinning out of the selfish ego, allowing individual consciousness to merge into universal consciousness. Documented instances and case studies suggest possible confusion between the spiritually advanced and schizophrenia patients. Clinical practice contains no clear guidelines on how to distinguish them. Here we use a case presentation to help tabulate clinically useful points distinguishing spiritually advanced persons from schizophrenia patients. A 34-year-old unmarried male reported to our clinic with four main complaints: lack of sense of self since childhood; repeated thoughts questioning whether he existed or not; social withdrawal; and inability to continue in any occupation. Qualitative case analysis and discussions using descriptions from ancient texts and modern psychology led to the diagnosis of schizophrenia rather than spiritual advancement.
Robust keyword retrieval method for OCRed text
NASA Astrophysics Data System (ADS)
Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu
2011-01-01
Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, robustness against character segmentation errors is achieved by allowing noise characters to be inserted into, and characters to be dropped from, the keyword during retrieval; robustness against character recognition errors is achieved by allowing each keyword character to be substituted with an OCR recognition candidate or any other character. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
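A generic way to tolerate OCR errors during keyword retrieval is approximate substring matching under an edit-distance threshold. The sketch below shows only that baseline idea and is not the authors' candidate-based method; the sample OCR text and threshold are invented.

```python
# Generic approximate-substring search over OCRed text (illustrative baseline).
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def fuzzy_find(keyword, text, max_dist=1):
    hits, k = [], len(keyword)
    for start in range(len(text) - k + 1):
        for w in (k - 1, k, k + 1):          # allow dropped or inserted characters
            window = text[start:start + w]
            if edit_distance(keyword, window) <= max_dist:
                hits.append((start, window))
                break
    return hits

ocr_text = "docurnent management systerns store scanned docoments"
print(fuzzy_find("document", ocr_text, max_dist=2))
```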
Fu, Xiao; Batista-Navarro, Riza; Rak, Rafal; Ananiadou, Sophia
2015-01-01
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Extending information retrieval methods to personalized genomic-based studies of disease.
Ye, Shuyun; Dawson, John A; Kendziorski, Christina
2014-01-01
Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
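The "patient as document" representation can be illustrated with ordinary, unsupervised LDA; the sketch below shows only that representation and does not implement the survival-supervised extension (survLDA). The clinical and genomic tokens are invented for the example.

```python
# Patients-as-documents with plain LDA (illustrative; not survLDA).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

patient_docs = [
    "TP53_mutation platinum_resistant ascites relapse",
    "BRCA1_mutation platinum_sensitive complete_response",
    "TP53_mutation ascites progression platinum_resistant",
    "BRCA1_mutation complete_response long_remission",
]
counts = CountVectorizer(token_pattern=r"\S+").fit_transform(patient_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))   # per-patient topic mixtures
```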
Natural Language Processing in Radiology: A Systematic Review.
Pons, Ewoud; Braun, Loes M M; Hunink, M G Myriam; Kors, Jan A
2016-05-01
Radiological reporting has generated large quantities of digital content within the electronic health record, which is potentially a valuable source of information for improving clinical care and supporting research. Although radiology reports are stored for communication and documentation of diagnostic imaging, harnessing their potential requires efficient and automated information extraction: they exist mainly as free-text clinical narrative, from which it is a major challenge to obtain structured data. Natural language processing (NLP) provides techniques that aid the conversion of text into a structured representation, and thus enables computers to derive meaning from human (ie, natural language) input. Used on radiology reports, NLP techniques enable automatic identification and extraction of information. By exploring the various purposes for their use, this review examines how radiology benefits from NLP. A systematic literature search identified 67 relevant publications describing NLP methods that support practical applications in radiology. This review takes a close look at the individual studies in terms of tasks (ie, the extracted information), the NLP methodology and tools used, and their application purpose and performance results. Additionally, limitations, future challenges, and requirements for advancing NLP in radiology will be discussed. (©) RSNA, 2016 Online supplemental material is available for this article.
Text-image alignment for historical handwritten documents
NASA Astrophysics Data System (ADS)
Zinger, S.; Nerbonne, J.; Schomaker, L.
2009-01-01
We describe our work on text-image alignment in context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of handwritten text, we need a training set - images of words with their transcriptions. We present our results on aligning words from the images of handwritten lines and their corresponding text transcriptions. Alignment based on the longest spaces between portions of handwriting is a baseline. We then show that relative lengths, i.e. proportions of words in their lines, can be used to improve the alignment results considerably. To take into account the relative word length, we define the expressions for the cost function that has to be minimized for aligning text words with their images. We apply right to left alignment as well as alignment based on exhaustive search. The quality assessment of these alignments shows correct results for 69% of words from 100 lines, or 90% of partially correct and correct alignments combined.
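The use of relative word lengths can be illustrated by splitting a line image's width among the transcribed words in proportion to their character counts. The sketch below shows only this proportional baseline, not the cost-function minimization or exhaustive search described above, and the example transcription is invented.

```python
# Length-proportional word-to-image alignment (baseline sketch only).
def proportional_alignment(line_width_px, transcription):
    words = transcription.split()
    total_chars = sum(len(w) for w in words)
    spans, x = [], 0.0
    for w in words:
        width = line_width_px * len(w) / total_chars
        spans.append((w, int(round(x)), int(round(x + width))))
        x += width
    return spans  # (word, start_px, end_px) per word

print(proportional_alignment(800, "Anno 1745 den 12 Februarij"))
```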
Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use
ERIC Educational Resources Information Center
White, Sheida
2012-01-01
This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…
Audit cycle of documentation in laser hair removal.
Cohen, S N; Lanigan, S W
2005-09-01
Lasercare clinics are one of the largest providers of skin laser treatment in the United Kingdom, in both private sector and National Health Service. Laser hair removal is performed by trained nurses following written protocols. Choice of laser and fluence is tailored to Fitzpatrick skin type. We audited and re-audited documentation of six criteria in patients receiving laser hair removal (signed consent, Fitzpatrick skin type, use of appropriate laser, appropriate fluence, patient satisfaction and objective assessment) across 13 clinics at different points in time. Data were obtained on 772 treatments. Overall findings revealed excellent documentation of consent, use of appropriate laser and fluence (median 100%), good documentation of skin type (median 90%) and poor documentation of patient satisfaction and objective assessment (median 67% and 53%, respectively). Comparison between baseline and repeat audit at 6-8 months (nine clinics) showed significant improvement across clinics in these latter two criteria [patient satisfaction: odds ratio (OR) 0.38, 95% confidence interval (CI) 0.15-0.78, P=0.01; objective assessment: OR 0.23, 95% CI 0.07-0.50, P=0.0003 (Mantel-Haenszel weighted odds ratios)]. We conclude that quality of documentation was generally and consistently high in multiple clinics and that re-auditing led to significant improvement in poor scores. This simple measure could easily be implemented more widely across many disciplines.
Assessing semantic similarity of texts - Methods and algorithms
NASA Astrophysics Data System (ADS)
Rozeva, Anna; Zerkova, Silvia
2017-12-01
Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps, which yield a structured, representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, is presented.
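A minimal LSA pipeline of the kind described above (TF-IDF vectors, SVD-based dimensionality reduction, and cosine similarity in the reduced latent space) can be sketched as follows; the library choice and toy documents are assumptions of this illustration, not the paper's implementation.

```python
# Illustrative LSA pipeline: TF-IDF -> truncated SVD -> cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the student submitted an essay on renewable energy",
    "solar and wind power are renewable energy sources",
    "the exam tested knowledge of classical literature",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # latent concept space
print(cosine_similarity(Z[0:1], Z[1:2]))  # semantic similarity of docs 0 and 1
```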
Relevance popularity: A term event model based feature selection scheme for text classification.
Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
2017-01-01
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), numerical experiments with two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperforms representative feature selection methods.
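As a hedged illustration of scoring terms from a term-event (term-frequency) naive Bayes model, the sketch below ranks terms by a smoothed class-conditional log-probability ratio; this is a simplified stand-in, not the factorized matching-score measurement derived in the paper.

```python
# Illustrative term scoring from term-frequency counts: each term is scored by
# the largest class-conditional log-probability ratio under a smoothed
# multinomial model. Data and labels are toy examples.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cheap pills online", "meeting agenda attached",
        "buy cheap pills now", "project meeting tomorrow"]
labels = np.array([1, 0, 1, 0])                     # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()               # term-frequency counts

def term_scores(X, y, alpha=1.0):
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        tc = X[y == c].sum(axis=0) + alpha          # smoothed counts in class c
        to = X[y != c].sum(axis=0) + alpha          # smoothed counts elsewhere
        ratio = np.log((tc / tc.sum()) / (to / to.sum()))
        scores = np.maximum(scores, ratio)
    return scores

scores = term_scores(X, labels)
for term, s in sorted(zip(vec.get_feature_names_out(), scores),
                      key=lambda p: -p[1])[:5]:
    print(f"{term}: {s:.2f}")
```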
Patient reminder systems and asthma medication adherence: a systematic review.
Tran, Nancy; Coffman, Janet M; Sumino, Kaharu; Cabana, Michael D
2014-06-01
One of the most common reasons for medication non-adherence for asthma patients is forgetfulness. Daily medication reminder system interventions in the form of text messages, automated phone calls and audiovisual reminder devices can potentially address this problem. The aim of this review was to assess the effectiveness of reminder systems on patient daily asthma medication adherence. We conducted a systematic review of the literature to identify randomized controlled trials (RCTs) which assessed the effect of reminder systems on daily asthma medication adherence. We searched all English-language articles in PubMed (MEDLINE), CINAHL, EMBASE, PsycINFO and the Cochrane Library through May 2013. We abstracted data on the year of study publication, location, inclusion and exclusion criteria, patient characteristics, reminder system characteristics, effect on patient adherence rate and other outcomes measured. Descriptive statistics were used to summarize the characteristics and results of the studies. Five RCTs and one pragmatic RCT were included in the analysis. Median follow-up time was 16 weeks. All of the six studies suggested that the reminder system intervention was associated with greater levels of participant asthma medication adherence compared to those participants in the control group. None of the studies documented a change in asthma-related quality of life or clinical asthma outcomes. All studies in our analysis suggest that reminder systems increase patient medication adherence, but none documented improved clinical outcomes. Further studies with longer intervention durations are needed to assess effects on clinical outcomes, as well as the sustainability of effects on patient adherence.
Peissig, Peggy L; Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B
2012-01-01
There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.
Standard Information Models for Representing Adverse Sensitivity Information in Clinical Documents.
Topaz, M; Seger, D L; Goss, F; Lai, K; Slight, S P; Lau, J J; Nandigam, H; Zhou, L
2016-01-01
Adverse sensitivity (e.g., allergy and intolerance) information is a critical component of any electronic health record system. While several standards exist for structured entry of adverse sensitivity information, many clinicians record this data as free text. This study aimed to 1) identify and compare the existing common adverse sensitivity information models, and 2) evaluate the coverage of the adverse sensitivity information models for representing allergy information on a subset of inpatient and outpatient adverse sensitivity clinical notes. We compared four common adverse sensitivity information models: Health Level 7 Allergy and Intolerance Domain Analysis Model, HL7-DAM; the Fast Healthcare Interoperability Resources, FHIR; the Consolidated Continuity of Care Document, C-CDA; and OpenEHR, and evaluated their coverage on a corpus of inpatient and outpatient notes (n = 120). We found that allergy specialists' notes had the highest frequency of adverse sensitivity attributes per note, whereas emergency department notes had the fewest attributes. Overall, the models had many similarities in the central attributes, which covered between 75% and 95% of adverse sensitivity information contained within the notes. However, representations of some attributes (especially the value-sets) were not well aligned between the models, which is likely to present an obstacle for achieving data interoperability. Also, adverse sensitivity exceptions were not well represented among the information models. Although we found that common adverse sensitivity models cover a significant portion of relevant information in the clinical notes, our results highlight areas that need to be reconciled between the standards for data interoperability.
Rihal, Charanjit S; Naidu, Srihari S; Givertz, Michael M; Szeto, Wilson Y; Burke, James A; Kapur, Navin K; Kern, Morton; Garratt, Kirk N; Goldstein, James A; Dimas, Vivian; Tu, Thomas
2015-06-01
This article provides a brief summary of the relevant recommendations and references related to percutaneous mechanical circulatory support. The goal was to provide the clinician with concise, evidence-based contemporary recommendations, and the supporting documentation to encourage their application. The full text includes disclosure of all relevant relationships with industry for each writing committee member. A fundamental aspect of all expert consensus statements is that these carefully developed, evidence-based documents can neither encompass all clinical circumstances, nor replace the judgment of individual physicians in management of each patient. The science of medicine is rooted in evidence, and the art of medicine is based on the application of this evidence to the individual patient. This expert consensus statement has adhered to these principles for optimal management of patients requiring percutaneous mechanical circulatory support. © 2015 by The Society for Cardiovascular Angiography and Interventions, The American College of Cardiology Foundation, the Heart Failure Society of America, and The Society for Thoracic Surgery.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-02
... published in the Federal Register. Free Internet access to the official edition of the Federal Register and.... Electronic Access to This Document: You can view this document, as well as all other documents of this Department published in the Federal Register, in text or Adobe Portable Document Format (PDF) on the Internet...
Topic detection using paragraph vectors to support active learning in systematic reviews.
Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia
2016-08-01
Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all relevant articles to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space, and cluster the documents into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). The results obtained demonstrate that our method achieves high sensitivity for eligible studies and a significantly reduced manual annotation cost when compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
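A rough sketch of the described pipeline, assuming gensim 4.x and scikit-learn, appears below; the toy corpus, vector size, and cluster count are assumptions, and cosine similarity to the centroids stands in for the paper's topic-mixture representation.

```python
# Documents are embedded with a paragraph-vector (Doc2Vec) model, clustered
# with k-means, and each document is re-expressed by its similarity to the
# cluster centroids, which stand in for latent topics.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

abstracts = ["randomised trial of statin therapy",
             "screening strategies for colorectal cancer",
             "cost effectiveness of vaccination programmes",
             "lipid lowering drugs and cardiovascular outcomes"]

tagged = [TaggedDocument(a.split(), [i]) for i, a in enumerate(abstracts)]
model = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50, seed=1)

vectors = [model.dv[i] for i in range(len(abstracts))]
centroids = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors).cluster_centers_

# Each document expressed relative to the latent "topics" (cluster centroids).
topic_mixture = cosine_similarity(vectors, centroids)
print(topic_mixture)
```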
System of HPC content archiving
NASA Astrophysics Data System (ADS)
Bogdanov, A.; Ivashchenko, A.
2017-12-01
This work aims to develop a system that effectively solves the problem of storing and analyzing files containing text data, using modern software development tools, techniques and approaches. The main challenges of storing a large number of text documents, defined at the problem formulation stage, have to be addressed with functionality such as full-text search and document clustering based on their contents. The main system features can be described in terms of a distributed multilevel architecture and the flexibility and interchangeability of components, achieved by encapsulating standard functionality in independent executable modules.
Kwedza, Ruyamuro K; Larkins, Sarah; Johnson, Julie K; Zwar, Nicholas
2017-10-01
Definitions of clinical governance are varied and there is no one agreed model. This paper explored the perspectives of rural and remote primary healthcare services, located in North Queensland, Australia, on the meaning and goals of clinical governance. The study followed an embedded multiple case study design with semi-structured interviews, document analysis and non-participant observation. Participants included clinicians, non-clinical support staff, managers and executives. Similarities and differences in the understanding of clinical governance between health centre and committee case studies were evident. Almost one-third of participants were unfamiliar with the term or were unsure of its meaning; alongside limited documentation of a definition. Although most cases linked the concept of clinical governance to key terms, many lacked a comprehensive understanding. Similarities between cases included viewing clinical governance as a management and administrative function. Differences included committee members' alignment of clinical governance with corporate governance and frontline staff associating clinical governance with staff safety. Document analysis offered further insight into these perspectives. Clinical governance is well-documented as an expected organisational requirement, including in rural and remote areas where geographic, workforce and demographic factors pose additional challenges to quality and safety. However, in reality, it is not clearly, similarly or comprehensively understood by all participants.
Ford, Stephen; Illich, Stan; Smith, Lisa; Franklin, Arthur
2006-01-01
To describe the use of personal digital assistants (PDAs) in documenting pharmacists' clinical interventions. Evans Army Community Hospital (EACH), a 78-bed military treatment facility, in Colorado Springs. Pharmacists on staff at EACH. All pharmacists at EACH used PDAs with the pilot software to record interventions for 1 month. The program underwent final design changes and then became the sole source for recording pharmacist interventions. The results of this project are being evaluated every 3 months for the first year and yearly thereafter. Visual CE (Syware Inc. Cambridge, Mass.) software was selected to develop fields for the documentation tool. This software is simple and easy to use, and users can retrieve reports of interventions from both inpatient and outpatient sections. The software needed to be designed so that data entry would only take a few minutes and ad hoc reports could be produced easily. Number of pharmacist interventions reported, time spent in clinical interventions, and outcome of clinical intervention. Implementing a PDA-based system for documenting pharmacist interventions across ambulatory, inpatient, and clinical services dramatically increased reporting during the first 6 months after implementation (August 2004-February 2005). After initial fielding, clinical pharmacists in advanced practice settings (such as disease management clinic, anticoagulation clinic) recognized a need to tailor the program to their specific activities, which resulted in a spin-off program unique to their practice roles. A PDA-based system for documenting clinical interventions at a military treatment facility increased reporting of interventions across all pharmacy points of service. Pharmacy leadership used these data to document the impact of pharmacist interventions on safety and quality of pharmaceutical care provided.
ERIC Educational Resources Information Center
Stadtler, Marc; Scharrer, Lisa; Brummernhenrich, Benjamin; Bromme, Rainer
2013-01-01
Past research has shown that readers often fail to notice conflicts in text. In our present study we investigated whether accessing information from multiple documents instead of a single document might alleviate this problem by motivating readers to integrate information. We further tested whether this effect would be moderated by source…
Automatic document classification of biological literature
Chen, David; Müller, Hans-Michael; Sternberg, Paul W
2006-01-01
Background Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusion We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. PMID:16893465
Transcript mapping for handwritten English documents
NASA Astrophysics Data System (ADS)
Jose, Damien; Bharadwaj, Anurag; Govindaraju, Venu
2008-01-01
Transcript mapping or text alignment with handwritten documents is the automatic alignment of words in a text file with word images in a handwritten document. Such a mapping has several applications in fields ranging from machine learning, where large quantities of truth data are required for evaluating handwriting recognition algorithms, to data mining, where word image indexes are used in ranked retrieval of scanned documents in a digital library. The alignment also aids "writer identity" verification algorithms. Interfaces which display scanned handwritten documents may use this alignment to highlight manuscript tokens when a person examines the corresponding transcript word. We propose an adaptation of the True DTW dynamic programming algorithm for English handwritten documents. Our primary contribution is the integration of dissimilarity scores from a word-model word recognizer and the Levenshtein distance between the recognized word and the lexicon word as a cost metric in the DTW algorithm, leading to fast and accurate alignment. The results provided confirm the effectiveness of our approach.
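The sketch below illustrates the idea under stated assumptions: the recognizer outputs and per-word dissimilarity scores are stubs, and the equal weighting of recognizer dissimilarity and normalized Levenshtein distance is an assumption rather than the authors' tuned cost metric.

```python
# A hedged sketch of transcript mapping by dynamic programming: word images
# (represented here only by a stub recognizer output) are aligned with
# transcript words using a cost that mixes recognizer dissimilarity and the
# Levenshtein distance between the recognizer's guess and the transcript word.
import numpy as np

def levenshtein(a, b):
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a), len(b)]

def align(recognized, transcript, dissim, w=0.5):
    """DTW over recognized word guesses and transcript words; returns total cost."""
    n, m = len(recognized), len(transcript)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            lev = levenshtein(recognized[i - 1], transcript[j - 1])
            lev /= max(len(recognized[i - 1]), len(transcript[j - 1]), 1)
            cost = w * dissim[i - 1] + (1 - w) * lev
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Stub recognizer output for three word images, with per-word dissimilarity scores.
recognized = ["tne", "quick", "brwn"]
dissim = [0.4, 0.1, 0.3]
print(align(recognized, ["the", "quick", "brown"], dissim))
```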
Automated detection of follow-up appointments using text mining of discharge records.
Ruud, Kari L; Johnson, Matthew G; Liesinger, Juliette T; Grafft, Carrie A; Naessens, James M
2010-06-01
To determine whether text mining can accurately detect specific follow-up appointment criteria in free-text hospital discharge records. Cross-sectional study. Mayo Clinic Rochester hospitals. Inpatients discharged from general medicine services in 2006 (n = 6481). Textual hospital dismissal summaries were manually reviewed to determine whether the records contained specific follow-up appointment arrangement elements: date, time and either physician or location for an appointment. The data set was evaluated for the same criteria using SAS Text Miner software. The two assessments were compared to determine the accuracy of text mining for detecting records containing follow-up appointment arrangements. Agreement of text-mined appointment findings with the gold standard (manual abstraction) included sensitivity, specificity, positive predictive and negative predictive values (PPV and NPV). About 55.2% (3576) of discharge records contained all criteria for follow-up appointment arrangements according to the manual review, 3.2% (113) of which were missed through text mining. Text mining incorrectly identified 3.7% (107) follow-up appointments that were not considered valid through manual review. Therefore, the text mining analysis concurred with the manual review in 96.6% of the appointment findings. Overall sensitivity and specificity were 96.8 and 96.3%, respectively; and PPV and NPV were 97.0 and 96.1%, respectively. Analysis of individual appointment criteria resulted in accuracy rates of 93.5% for date, 97.4% for time, 97.5% for physician and 82.9% for location. Text mining of unstructured hospital dismissal summaries can accurately detect documentation of follow-up appointment arrangement elements, thus saving considerable resources for performance assessment and quality-related research.
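For illustration only (the study used SAS Text Miner, not hand-written rules), a simple regular-expression check for the three appointment elements might look like the following; the patterns and the example note are assumptions.

```python
# Flag whether a dismissal summary mentions a date, a time, and either a
# physician or a clinic location for follow-up.
import re

PATTERNS = {
    "date": r"\b(?:\d{1,2}/\d{1,2}/\d{2,4}|(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2})\b",
    "time": r"\b\d{1,2}:\d{2}\s*(?:am|pm)?\b",
    "physician_or_location": r"\b(?:dr\.\s*\w+|clinic|department of \w+)\b",
}

def has_followup_arrangement(note: str) -> bool:
    text = note.lower()
    return all(re.search(p, text) for p in PATTERNS.values())

note = "Follow up with Dr. Smith on 7/14/2006 at 10:30 am in the cardiology clinic."
print(has_followup_arrangement(note))   # True
```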
Text grouping in patent analysis using adaptive K-means clustering algorithm
NASA Astrophysics Data System (ADS)
Shanie, Tiara; Suprijadi, Jadi; Zulhanif
2017-03-01
Patents are one form of intellectual property. Analyzing patents is necessary for understanding the development of technology in each country and worldwide. This study uses patent documents about green tea obtained from the Espacenet server. Patent documents related to tea technology are widespread, which makes information retrieval (IR) difficult for users. Therefore, efforts are needed to categorize documents into specific groups according to the related terms they contain. This study applies statistical text-mining methods to green tea patent title data in two phases: a data preparation stage and a data analysis stage. The data preparation phase uses text-mining methods and the data analysis stage is performed statistically. The statistical analysis in this study uses a cluster analysis algorithm, the adaptive K-means clustering algorithm. Results show that, based on the maximum silhouette value, the method generates 87 clusters, associated with fifteen terms, that can be utilized in the information retrieval process.
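A hedged sketch of the clustering step is shown below, assuming scikit-learn; the patent titles, the TF-IDF representation, and the range of k are illustrative, and only the silhouette-based choice of k reflects the described approach.

```python
# Patent titles are vectorised with TF-IDF and k-means is run for a range of k,
# keeping the k with the highest silhouette score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

titles = ["green tea extraction process", "tea polyphenol composition",
          "beverage packaging apparatus", "green tea fermentation method",
          "packaging film for beverages", "catechin purification from tea"]

X = TfidfVectorizer().fit_transform(titles)
best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(best_k, best_score, best_labels)
```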
Building Background Knowledge through Reading: Rethinking Text Sets
ERIC Educational Resources Information Center
Lupo, Sarah M.; Strong, John Z.; Lewis, William; Walpole, Sharon; McKenna, Michael C.
2018-01-01
To increase reading volume and help students access challenging texts, the authors propose a four-dimensional framework for text sets. The quad text set framework is designed around a target text: a challenging content area text, such as a canonical literary work, research article, or historical primary source document. The three remaining…
SureChEMBL: a large-scale, chemically annotated patent document database.
Papadatos, George; Davies, Mark; Dedman, Nathan; Chambers, Jon; Gaulton, Anna; Siddle, James; Koks, Richard; Irvine, Sean A; Pettersson, Joe; Goncharoff, Nicko; Hersey, Anne; Overington, John P
2016-01-04
SureChEMBL is a publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature according to an automated text and image-mining pipeline on a daily basis. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented with sophisticated combined structure and keyword-based search capabilities against the compound repository and patent document corpus; given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. Currently, the database contains 17 million compounds extracted from 14 million patent documents. Access is available through a dedicated web-based interface and data downloads at: https://www.surechembl.org/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Kok, Maaike; van der Werff, Gertruud F M; Geerling, Jenske I; Ruivenkamp, Jaap; Groothoff, Wies; van der Velden, Annette W G; Thoma, Monique; Talsma, Jaap; Costongs, Louk G P; Gans, Reinold O B; de Graeff, Pauline; Reyners, Anna K L
2018-05-24
Advance Care Planning (ACP) and its documentation, accessible to healthcare professionals regardless of where patients are staying, can improve palliative care. ACP is usually performed by trained facilitators. However, ACP conversations would be more tailored to a patient's specific situation if held by a patient's clinical healthcare team. This study assesses the feasibility of ACP by a patient's clinical healthcare team, and analyses the documented information including current and future problems within the palliative care domains. This multicentre study was conducted at the three Groningen Palliative Care Network hospitals in the Netherlands. Patients discharged from hospital with a terminal care indication received an ACP document from clinical staff (non-palliative care trained staff at hospitals I and II; specialist palliative care nurses at hospital III) after they had held ACP conversations. An anonymised copy of this ACP document was analysed. Documentation rates of patient and contact details were investigated, and documentation of current and future problems were analysed both quantitatively and qualitatively. One hundred sixty ACP documents were received between April 2013 and December 2014, with numbers increasing for each consecutive 3-month time period. Advance directives were frequently documented (82%). Documentation rates of current problems in the social (24%), psychological (27%) and spiritual (16%) domains were low compared to physical problems (85%) at hospital I and II, but consistently high (> 85%) at hospital III. Of 545 documented anticipated problems, 92% were physical or care related in nature, 2% social, 5% psychological, and < 1% spiritual. Half of the anticipated non-physical problems originated from hospital III. Hospital-initiated ACP documentation by a patient's clinical healthcare team is feasible: the number of documents received per time period increased throughout the study period, and overall, documentation rates were high. Nonetheless, symptom documentation predominantly regards physical symptoms. With the involvement of specialist palliative care nurses, psychological and spiritual problems are addressed more frequently. Whether palliative care education for non-palliative care experts will improve identification and documentation of non-physical problems remains to be investigated.
Samal, Lipika; D'Amore, John D; Bates, David W; Wright, Adam
2017-11-01
Clinical decision support tools for risk prediction are readily available, but typically require workflow interruptions and manual data entry so are rarely used. Due to new data interoperability standards for electronic health records (EHRs), other options are available. As a clinical case study, we sought to build a scalable, web-based system that would automate calculation of kidney failure risk and display clinical decision support to users in primary care practices. We developed a single-page application, web server, database, and application programming interface to calculate and display kidney failure risk. Data were extracted from the EHR using the Consolidated Clinical Document Architecture interoperability standard for Continuity of Care Documents (CCDs). EHR users were presented with a noninterruptive alert on the patient's summary screen and a hyperlink to details and recommendations provided through a web application. Clinic schedules and CCDs were retrieved using existing application programming interfaces to the EHR, and we provided a clinical decision support hyperlink to the EHR as a service. We debugged a series of terminology and technical issues. The application was validated with data from 255 patients and subsequently deployed to 10 primary care clinics where, over the course of 1 year, 569 533 CCD documents were processed. We validated the use of interoperable documents and open-source components to develop a low-cost tool for automated clinical decision support. Since Consolidated Clinical Document Architecture-based data extraction extends to any certified EHR, this demonstrates a successful modular approach to clinical decision support. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Mihara, Naoki; Ueda, Kanayo; Manabe, Shirou; Takeda, Toshihiro; Shimai, Yoshie; Horishima, Hiroyuki; Murata, Taizo; Fujii, Ayumi; Matsumura, Yasushi
2015-01-01
Recently, patients often receive care from several hospitals at around the same time. When a patient visits a new hospital, the new hospital's physician tries to obtain patient information from the previous hospital, so patient information is frequently exchanged between them. Many types of healthcare facilities have implemented an electronic medical record (EMR) system, but in Japan healthcare information exchange is often still done on paper. In other words, after a clinical doctor prints a referral document and sends it to another hospital's physician, the receiving doctor scans it to store it in his own hospital's EMR system. This is a wasteful way to exchange healthcare information about a patient. In order to solve this problem, we have developed a cross-institutional document exchange system using the clinical document architecture (CDA) with a virtual printing method.
Adaptive removal of background and white space from document images using seam categorization
NASA Astrophysics Data System (ADS)
Fillion, Claude; Fan, Zhigang; Monga, Vishal
2011-03-01
Document images are obtained regularly by rasterization of document content and as scans of printed documents. Resizing via background and white space removal is often desired for better consumption of these images, whether on displays or in print. While white space and background are easy to identify in images, existing methods such as naïve removal and content aware resizing (seam carving) each have limitations that can lead to undesirable artifacts, such as uneven spacing between lines of text or poor arrangement of content. An adaptive method based on image content is hence needed. In this paper we propose an adaptive method to intelligently remove white space and background content from document images. Document images are different from pictorial images in structure. They typically contain objects (text letters, pictures and graphics) separated by uniform background, which include both white paper space and other uniform color background. Pixels in uniform background regions are excellent candidates for deletion if resizing is required, as they introduce less change in document content and style, compared with deletion of object pixels. We propose a background deletion method that exploits both local and global context. The method aims to retain the document structural information and image quality.
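As a point of reference, the naive removal the paper improves upon can be sketched as dropping fully uniform rows and columns from a grayscale page; the tolerance and toy image below are assumptions, and this is not the adaptive, seam-based method proposed in the paper.

```python
# Naive white-space removal: rows and columns of a grayscale document image
# whose pixels are all close to the background value are dropped.
import numpy as np

def remove_uniform_rows_cols(img, background=255, tol=5):
    rows_keep = np.any(np.abs(img.astype(int) - background) > tol, axis=1)
    cols_keep = np.any(np.abs(img.astype(int) - background) > tol, axis=0)
    return img[rows_keep][:, cols_keep]

# Toy "page": white background with a small dark block of "text".
page = np.full((10, 12), 255, dtype=np.uint8)
page[3:5, 2:7] = 20
print(remove_uniform_rows_cols(page).shape)   # (2, 5)
```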
Discovery of novel biomarkers and phenotypes by semantic technologies
2013-01-01
Background Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, thus constituting an important aspect of modern pharmaceutical research and development. More and more, the discovery of relevant biomarkers is aided by in silico techniques based on applying data mining and computational chemistry on large molecular databases. However, there is an even larger source of valuable information available that can potentially be tapped for such discoveries: repositories constituted by research documents. Results This paper reports on a pilot experiment to discover potential novel biomarkers and phenotypes for diabetes and obesity by self-organized text mining of about 120,000 PubMed abstracts, public clinical trial summaries, and internal Merck research documents. These documents were directly analyzed by the InfoCodex semantic engine, without prior human manipulations such as parsing. Recall and precision against established, but different, benchmarks lie in ranges up to 30% and 50%, respectively. Retrieval of known entities missed by other traditional approaches could be demonstrated. Finally, the InfoCodex semantic engine was shown to discover new diabetes and obesity biomarkers and phenotypes. Amongst these were many interesting candidates with high potential, although noticeable noise (uninteresting or obvious terms) was generated. Conclusions The reported approach of employing autonomous self-organising semantic engines to aid biomarker discovery, supplemented by appropriate manual curation processes, shows promise; conservatively, it offers a faster alternative to vocabulary curation processes that depend on humans having to read and analyze all the texts. More optimistically, it could impact pharmaceutical research, for example by shortening the time-to-market of novel drugs, or speeding up early recognition of dead ends and adverse reactions. PMID:23402646
49 CFR 1104.2 - Document specifications.
Code of Federal Regulations, 2014 CFR
2014-10-01
... to facilitate automated processing in document sheet feeders, original documents of more than one... textual submissions. Use of color in filings is limited to images such as graphs, maps and photographs. To facilitate automated processing of color pages, color pages may not be inserted among pages containing text...
49 CFR 1104.2 - Document specifications.
Code of Federal Regulations, 2010 CFR
2010-10-01
... to facilitate automated processing in document sheet feeders, original documents of more than one... textual submissions. Use of color in filings is limited to images such as graphs, maps and photographs. To facilitate automated processing of color pages, color pages may not be inserted among pages containing text...
49 CFR 1104.2 - Document specifications.
Code of Federal Regulations, 2012 CFR
2012-10-01
... to facilitate automated processing in document sheet feeders, original documents of more than one... textual submissions. Use of color in filings is limited to images such as graphs, maps and photographs. To facilitate automated processing of color pages, color pages may not be inserted among pages containing text...
49 CFR 1104.2 - Document specifications.
Code of Federal Regulations, 2011 CFR
2011-10-01
... to facilitate automated processing in document sheet feeders, original documents of more than one... textual submissions. Use of color in filings is limited to images such as graphs, maps and photographs. To facilitate automated processing of color pages, color pages may not be inserted among pages containing text...
49 CFR 1104.2 - Document specifications.
Code of Federal Regulations, 2013 CFR
2013-10-01
... to facilitate automated processing in document sheet feeders, original documents of more than one... textual submissions. Use of color in filings is limited to images such as graphs, maps and photographs. To facilitate automated processing of color pages, color pages may not be inserted among pages containing text...
Enhancement of Text Representations Using Related Document Titles.
ERIC Educational Resources Information Center
Salton, G.; Zhang, Y.
1986-01-01
Briefly reviews various methodologies for constructing enhanced document representations, discusses their general lack of usefulness, and describes a method of document indexing which uses title words taken from bibliographically related items. Evaluation of this process indicates that it is not sufficiently reliable to warrant incorporation into…
Machine learning-based coreference resolution of concepts in clinical documents
Ware, Henry; Mullett, Charles J; El-Rawas, Oussama
2012-01-01
Objective Coreference resolution of concepts, although a very active area in the natural language processing community, has not yet been widely applied to clinical documents. Accordingly, the 2011 i2b2 competition focusing on this area is a timely and useful challenge. The objective of this research was to collate coreferent chains of concepts from a corpus of clinical documents. These concepts are in the categories of person, problems, treatments, and tests. Design A machine learning approach based on graphical models was employed to cluster coreferent concepts. Features selected were divided into domain independent and domain specific sets. Training was done with the i2b2 provided training set of 489 documents with 6949 chains. Testing was done on 322 documents. Results The learning engine, using the un-weighted average of three different measurement schemes, resulted in an F measure of 0.8423 where no domain specific features were included and 0.8483 where the feature set included both domain independent and domain specific features. Conclusion Our machine learning approach is a promising solution for recognizing coreferent concepts, which in turn is useful for practical applications such as the assembly of problem and medication lists from clinical documents. PMID:22582205
ANMCO/SIC Consensus Document: cardiology networks for outpatient heart failure care
Gulizia, Michele Massimo; Di Lenarda, Andrea; Mortara, Andrea; Battistoni, Ilaria; De Maria, Renata; Gabriele, Michele; Iacoviello, Massimo; Navazio, Alessandro; Pini, Daniela; Di Tano, Giuseppe; Marini, Marco; Ricci, Renato Pietro; Alunni, Gianfranco; Radini, Donatella; Metra, Marco; Romeo, Francesco
2017-01-01
Changing demographics and an increasing burden of multiple chronic comorbidities in Western countries dictate refocusing of heart failure (HF) services from acute in-hospital care to better support the long inter-critical out-of-hospital phases of HF. In Italy, as well as in other countries, the needs of the HF population are not adequately addressed by current HF outpatient services, as documented by differences in age, gender, comorbidities and recommended therapies between patients discharged for acute hospitalized HF and those followed up at HF clinics. The Italian Working Group on Heart Failure has drafted a guidance document for the organisation of a national HF care network. The aims of the document are to describe tasks and requirements of the different health system points of contact for HF patients, and to define how diagnosis, management and care processes should be documented and shared among health-care professionals. The document classifies HF outpatient clinics in three groups: (i) community HF clinics, devoted to the management of stable patients in strict liaison with primary care, periodic re-evaluation of emerging clinical needs and prompt treatment of impending destabilizations; (ii) hospital HF clinics, which target both new onset and chronic HF patients for diagnostic assessment, treatment planning and early post-discharge follow-up, and act as the main referral for general internal medicine units and community clinics; and (iii) advanced HF clinics, directed at patients with severe disease or persistent clinical instability, candidates for advanced treatment options such as heart transplant or mechanical circulatory support. These different types of HF clinics are integrated in a dedicated network for the management of HF patients on a regional basis, according to geographic features. By sharing predefined protocols and communication systems, these HF networks integrate multi-professional providers to ensure continuity of care and patient empowerment. In conclusion, this guidance document details the roles and interactions of cardiology specialists, so as to best exploit the added value of their input in the care of HF patients, and is intended to promote a more efficient and effective organization of HF services. PMID:28751837
Simple-random-sampling-based multiclass text classification algorithm.
Liu, Wuying; Wang, Lin; Yi, Mianzhu
2014-01-01
Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of these algorithms is a major concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. The experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time requirements.
Communication pitfalls of traditional history and physical write-up documentation
Brown, Jeffrey L
2017-01-01
Background An unofficial standardized “write-up” outline is commonly used for documenting history and physical examinations, giving oral presentations, and teaching clinical skills. Despite general acceptance, there is an apparent discrepancy between the way clinical encounters are conducted and how they are documented. Methods Fifteen medical school websites were randomly selected from search-engine generated lists. One example of a history and physical write-up from each of six sites, one teaching outline from each of nine additional sites, and recommendations for documentation made in two commonly used textbooks were compared for similarities and differences. Results Except for minor variations in documenting background information, all sampled materials utilized the same standardized format. When the examiners’ early perceptions of the patients’ degree of illness or level of distress were described, they were categorized as “general appearance” within the physical findings. Contrary to clinical practice, none of the examples or recommendations documented these early perceptions before chief concerns and history were presented. Discussion An examiner’s initial perceptions of a patient’s affect, degree of illness, and level of distress can influence the content of the history, triage decisions, and prioritization of likely diagnoses. When chief concerns and history are shared without benefit of this information, erroneous assumptions and miscommunications can result. Conclusion This survey confirms common use of a standardized outline for documenting, communicating, and teaching history-taking and physical examination protocol. The present outline shares early observations out of clinical sequence and may provide inadequate context for accurate interpretation of chief concerns and history. Corrective actions include modifying the documentation sequence to conform to clinical practice and teaching contextual methodology for sharing patient information. PMID:28096709
Basic test framework for the evaluation of text line segmentation and text parameter extraction.
Brodić, Darko; Milivojević, Dragan R; Milivojević, Zoran
2010-01-01
Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is a key step because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in the measurement and evaluation of text segmentation algorithm quality, some basic set of measurement methods is required. Currently, there is no commonly accepted one and all algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross-linked. In the end, its suitability for different types of letters and languages, as well as its adaptability, are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
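For context, a minimal projection-profile line segmenter of the kind such a framework would evaluate is sketched below; the ink threshold and toy binary image are assumptions, and this is an illustration of the task, not part of the proposed test framework.

```python
# Rows whose ink count exceeds a threshold are grouped into text lines.
import numpy as np

def segment_lines(binary_img, min_ink=1):
    """binary_img: 2-D array with 1 for ink, 0 for background.
    Returns (start_row, end_row) pairs for each detected text line."""
    profile = binary_img.sum(axis=1)
    lines, start = [], None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r
        elif ink < min_ink and start is not None:
            lines.append((start, r - 1))
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines

img = np.zeros((12, 20), dtype=int)
img[2:4, 3:15] = 1      # first "line" of ink
img[7:9, 2:18] = 1      # second "line"
print(segment_lines(img))   # [(2, 3), (7, 8)]
```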
Using natural language processing to identify problem usage of prescription opioids.
Carrell, David S; Cronkite, David; Palmer, Roy E; Saunders, Kathleen; Gross, David E; Masters, Elizabeth T; Hylan, Timothy R; Von Korff, Michael
2015-12-01
Accurate and scalable surveillance methods are critical to understand widespread problems associated with misuse and abuse of prescription opioids and for implementing effective prevention and control measures. Traditional diagnostic coding incompletely documents problem use. Relevant information for each patient is often obscured in vast amounts of clinical text. We developed and evaluated a method that combines natural language processing (NLP) and computer-assisted manual review of clinical notes to identify evidence of problem opioid use in electronic health records (EHRs). We used the EHR data and text of 22,142 patients receiving chronic opioid therapy (≥70 days' supply of opioids per calendar quarter) during 2006-2012 to develop and evaluate an NLP-based surveillance method and compare it to traditional methods based on International Classification of Diseases, Ninth Revision (ICD-9) codes. We developed a 1288-term dictionary for clinician mentions of opioid addiction, abuse, misuse or overuse, and an NLP system to identify these mentions in unstructured text. The system distinguished affirmative mentions from those that were negated or otherwise qualified. We applied this system to 7,336,445 electronic chart notes of the 22,142 patients. Trained abstractors using a custom computer-assisted software interface manually reviewed 7751 chart notes (from 3156 patients) selected by the NLP system and classified each note as to whether or not it contained textual evidence of problem opioid use. Traditional diagnostic codes for problem opioid use were found for 2240 (10.1%) patients. NLP-assisted manual review identified an additional 728 (3.1%) patients with evidence of clinically diagnosed problem opioid use in clinical notes. Inter-rater reliability among pairs of abstractors reviewing notes was high, with kappa=0.86 and 97% agreement for one pair, and kappa=0.71 and 88% agreement for another pair. Scalable, semi-automated NLP methods can efficiently and accurately identify evidence of problem opioid use in vast amounts of EHR text. Incorporating such methods into surveillance efforts may increase prevalence estimates by as much as one-third relative to traditional methods. Copyright © 2015. Published by Elsevier Ireland Ltd.
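A much-simplified sketch of dictionary-based mention detection with negation handling is given below; the term list, negation cues, and sentence-level scoping are illustrative assumptions and do not reproduce the authors' 1288-term dictionary or their NLP system.

```python
# Detect dictionary terms in a clinical note and mark a mention as negated if a
# negation cue appears earlier in the same sentence.
import re

TERMS = ["opioid abuse", "opioid misuse", "opioid addiction", "opioid overuse"]
NEGATION_CUES = ["no evidence of", "denies", "without", "no history of"]

def find_mentions(note: str):
    mentions = []
    for sentence in re.split(r"[.!?]", note.lower()):
        for term in TERMS:
            for m in re.finditer(re.escape(term), sentence):
                prefix = sentence[:m.start()]
                negated = any(cue in prefix for cue in NEGATION_CUES)
                mentions.append((term, "negated" if negated else "affirmed"))
    return mentions

print(find_mentions("Patient denies opioid misuse. Concern for opioid addiction raised."))
# [('opioid misuse', 'negated'), ('opioid addiction', 'affirmed')]
```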
Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor
2015-01-01
Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested from the individual corpus providers.
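One simple way to combine systems' outputs into a silver standard is majority voting over identical (span, concept) annotations, sketched below with made-up annotations; the paper's actual harmonisation procedure is not reproduced here.

```python
# Keep an annotation if at least `min_votes` systems propose the same
# (start, end, concept_id) triple.
from collections import Counter

def silver_standard(system_annotations, min_votes=2):
    """system_annotations: list of sets of (start, end, concept_id), one per system."""
    votes = Counter()
    for annotations in system_annotations:
        votes.update(annotations)
    return {ann for ann, n in votes.items() if n >= min_votes}

# Illustrative annotations from three hypothetical systems.
ctakes  = {(0, 12, "C0011849"), (20, 28, "C0020538")}
ncbo    = {(0, 12, "C0011849"), (35, 44, "C0027051")}
metamap = {(20, 28, "C0020538")}

print(silver_standard([ctakes, ncbo, metamap]))
# {(0, 12, 'C0011849'), (20, 28, 'C0020538')}
```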
Quality of outpatient clinical notes: a stakeholder definition derived through qualitative research.
Hanson, Janice L; Stephens, Mark B; Pangaro, Louis N; Gimbel, Ronald W
2012-11-19
There are no empirically-grounded criteria or tools to define or benchmark the quality of outpatient clinical documentation. Outpatient clinical notes document care, communicate treatment plans and support patient safety, medical education, medico-legal investigations and reimbursement. Accurately describing and assessing quality of clinical documentation is a necessary improvement in an increasingly team-based healthcare delivery system. In this paper we describe the quality of outpatient clinical notes from the perspective of multiple stakeholders. Using purposeful sampling for maximum diversity, we conducted focus groups and individual interviews with clinicians, nursing and ancillary staff, patients, and healthcare administrators at six federal health care facilities between 2009 and 2011. All sessions were audio-recorded, transcribed and qualitatively analyzed using open, axial and selective coding. The 163 participants included 61 clinicians, 52 nurse/ancillary staff, 31 patients and 19 administrative staff. Three organizing themes emerged: 1) characteristics of quality in clinical notes, 2) desired elements within the clinical notes and 3) system supports to improve the quality of clinical notes. We identified 11 codes to describe characteristics of clinical notes, 20 codes to describe desired elements in quality clinical notes and 11 codes to describe clinical system elements that support quality when writing clinical notes. While there was substantial overlap between the aspects of quality described by the four stakeholder groups, only clinicians and administrators identified ease of translation into billing codes as an important characteristic of a quality note. Only patients rated prioritization of their medical problems as an aspect of quality. Nurses included care and education delivered to the patient, information added by the patient, interdisciplinary information, and infection alerts as important content. Perspectives of these four stakeholder groups provide a comprehensive description of quality in outpatient clinical documentation. The resulting description of characteristics and content necessary for quality notes provides a research-based foundation for assessing the quality of clinical documentation in outpatient health care settings.
North, Frederick; Richards, Debra D; Bremseth, Kimberly A; Lee, Mary R; Cox, Debra L; Varkey, Prathibha; Stroebel, Robert J
2014-03-20
Clinical decision support (CDS) has been shown to be effective in improving medical safety and quality but there is little information on how telephone triage benefits from CDS. The aim of our study was to compare triage documentation quality associated with the use of a clinical decision support tool, ExpertRN©. We examined 50 triage documents before and after a CDS tool was used in nursing triage. To control for the effects of CDS training we had an additional control group of triage documents created by nurses who were trained in the CDS tool, but who did not use it in selected notes. The CDS intervention cohort of triage notes was compared to both the pre-CDS notes and the CDS-trained (but not using CDS) cohort. Cohorts were compared using the documentation standards of the American Academy of Ambulatory Care Nursing (AAACN). We also compared triage note content (documentation of associated positive and negative features relating to the symptoms, self-care instructions, and warning signs to watch for), and documentation defects pertinent to triage safety. Three of five AAACN documentation standards were significantly improved with CDS. There was a mean of 36.7 symptom features documented in triage notes for the CDS group but only 10.7 symptom features in the pre-CDS cohort (p < 0.0001) and 10.2 for the cohort that was CDS-trained but not using CDS (p < 0.0001). The difference between the mean of 10.7 symptom features documented in the pre-CDS cohort and the mean of 10.2 symptom features documented in the CDS-trained but not using CDS cohort was not statistically significant (p = 0.68). CDS significantly improves triage note documentation quality. CDS-aided triage notes had significantly more information about symptoms, warning signs and self-care. The changes in triage documentation appeared to be the result of the CDS alone and not due to any CDS training that came with the CDS intervention. Although this study shows that CDS can improve documentation, further study is needed to determine if it results in improved care.
Literature evidence in open targets - a target validation platform.
Kafkas, Şenay; Dunham, Ian; McEntyre, Johanna
2017-06-06
We present the Europe PMC literature component of Open Targets - a target validation platform that integrates various types of evidence to aid drug target identification and validation. The component identifies target-disease associations in documents from the Europe PMC literature database and ranks the documents by confidence, using rules that incorporate expert-provided heuristic information. The confidence score of a given document represents how valuable the document is for validating a given target-disease association, taking into account the credibility of the association based on the properties of the text. The component has served the platform with up-to-date data since December 2015. Currently, there are a total of 1,168,365 distinct target-disease associations text mined from >26 million PubMed abstracts and >1.2 million Open Access full-text articles. Our comparative analyses of the evidence currently available in the platform revealed that 850,179 of these associations are identified exclusively by literature mining. This component helps the platform's users by providing the most relevant literature hits for a given target and disease. The text mining evidence, along with the other types of evidence, can be explored visually through https://www.targetvalidation.org and all the evidence data are available for download in JSON format from https://www.targetvalidation.org/downloads/data .
Documenting clinical pharmacist intervention before and after the introduction of a web-based tool.
Nurgat, Zubeir A; Al-Jazairi, Abdulrazaq S; Abu-Shraie, Nada; Al-Jedai, Ahmed
2011-04-01
To develop a database for documenting pharmacist interventions through a web-based application. The secondary endpoint was to determine whether the new, web-based application provides any benefits with regard to documentation compliance by clinical pharmacists and ease of calculating cost savings compared with our previous method of documenting pharmacist interventions. A tertiary care hospital in Saudi Arabia. The documentation of interventions using a web-based documentation application was retrospectively compared with previous methods of documentation of clinical pharmacists' interventions (multi-user PC software). The number and types of interventions recorded by pharmacists, data mining of archived data, efficiency, cost savings, and the accuracy of the data generated. The number of documented clinical interventions increased from 4,926, using the multi-user PC software, to 6,840 for the web-based application. On average, we observed 653 interventions per clinical pharmacist using the web-based application, an increase compared with an average of 493 interventions using the old multi-user PC software. However, using a paired Student's t-test, there was no statistically significant difference between the two means (P = 0.201). Using a χ² test, which captured management level and the type of system used, we found a strong effect of management level (P < 2.2 × 10⁻¹⁶) on the number of documented interventions. We also found a moderately significant relationship between educational level and the number of interventions documented (P = 0.045). The mean ± SD time required to document an intervention using the web-based application was 66.55 ± 8.98 s. Using the web-based application, 29.06% of documented interventions resulted in cost savings, while using the multi-user PC software only 4.75% of interventions did so. The majority of cost savings across both platforms resulted from the discontinuation of unnecessary drugs and a change in dosage regimen. Data collection using the web-based application was consistently more complete than with the multi-user PC software. The web-based application is an efficient system for documenting pharmacist interventions. Its flexibility and accessibility, as well as its detailed report functionality, make it a useful tool that will hopefully encourage other primary and secondary care facilities to adopt similar applications.
40 CFR Appendix A to Part 66 - Technical Support Document
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 15 2010-07-01 2010-07-01 false Technical Support Document A Appendix A to Part 66 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS...—Technical Support Document Note: For text of appendix A see appendix A to part 67. ...
[Development and integration of the Oncological Documentation System ODS].
Raab, G; van Den Bergh, M
2001-08-01
To simplify clinical routine and to improve medical quality without exceeding existing resources, and to intensify communication and cooperation between all institutions involved in patients' health care. The large amount of documentation work required of physicians can no longer be managed without modern tools for paperless data processing. The development of ODS was a close cooperation between physicians and technicians, which resulted in mutual understanding and led to a high level of user convenience. At present, all cases of gynecology, especially gynecologic oncology, can be documented and processed with ODS. Users will adopt the system easily, as data entry within different program areas follows the same rules. In addition, users can choose between individual input of data and assistants guiding them through highly specific areas of documentation. ODS is a modern, modularly structured, and very fast multiuser database environment for inpatient and outpatient documentation. It automatically generates many reports for day-to-day clinical business. Statistical routines help users reflect on their work and its quality. Documentation of clinical trials according to the GCP guidelines can be done with ODS using the internet or offline data sharing. As ODS is the synthesis of a computer-based patient administration system and an oncological documentation database, it represents the basis for the construction of the electronic patient chart as well as the digital documentation of clinical trials. The introduction of this new technology to physicians and nurses has to be done slowly and carefully in order to increase motivation and to improve the results.
Pace, Valerio; Farooqi, Omar; Kennedy, James; Park, Chang; Cowan, Joseph
2018-05-01
As a tertiary referral centre for spinal surgery, the Royal National Orthopaedic Hospital (RNOH) handles hundreds of spinal cases a year, often with complex pathology and complex care needs. Despite this, issues were raised at the RNOH following a lack of sufficient documentation of preoperative and postoperative clinical findings in spinal patients undergoing major surgery. This is not in keeping with guidelines provided by the Royal College of Surgeons. The authors believe that a standardised clerking pro forma for surgical spinal patients admitted to the RNOH would improve the quality of care provided. Therefore, the use of a standard clerking pro forma for all surgical spinal patients could be a useful tool enabling improvements in patient care and safety in keeping with General Medical Council/National Institute for Health and Care Excellence guidelines. An audit (with closure of the loop) looking into the quality of the preoperative and postoperative clinical documentation for surgical spinal patients was carried out at the RNOH in 2016 (a retrospective case note audit comparing preintervention and postintervention documentation standards). Our standardised pro forma allows clinicians to make the best use of their time and standardises examination findings so that they can be compared over time during the patient's admission and care. It is the authors' understanding that this work is a unique study looking at the quality of admission clerking for surgical spinal patients. Evidently, there remains work to be done for the widespread utilisation of the pro forma. Early results suggest that such a pro forma can significantly improve the documentation in admission clerking, with improvements in the quality of care for patients. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Implementation of the common phrase index method on the phrase query for information retrieval
NASA Astrophysics Data System (ADS)
Fatmawati, Triyah; Zaman, Badrus; Werdiningsih, Indah
2017-08-01
With the development of technology, finding information in news text has become easy, because news text is distributed not only in print media, such as newspapers, but also in electronic media that can be accessed using a search engine. In the process of finding relevant documents with a search engine, a phrase is often used as a query. The number of words that make up the phrase query and their positions clearly affect the relevance of the documents produced, and consequently the accuracy of the information obtained. Based on this problem, the purpose of this research was to analyze the implementation of the common phrase index method for information retrieval. The research was conducted on English news text and implemented in a prototype to determine the relevance level of the documents produced. The system is built with the stages of pre-processing, indexing, term weighting calculation, and cosine similarity calculation. The system then displays the document search results in order of cosine similarity. System testing was conducted using 100 documents and 20 queries, and the results were used for the evaluation stage: first, relevant documents were determined using the kappa statistic; second, the system success rate was determined using precision, recall, and F-measure. In this research, the kappa statistic was 0.71, so the relevant documents were eligible for the system evaluation. The calculation of precision, recall, and F-measure produced a precision of 0.37, a recall of 0.50, and an F-measure of 0.43. From these results, it can be said that the success rate of the system in producing relevant documents is low.
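To make the ranking and evaluation steps above concrete, here is a minimal Python sketch (not the authors' prototype) of cosine-similarity ranking over TF-IDF weights followed by precision, recall, and F-measure against hypothetical relevance judgements; the corpus, query, and judgements are invented for illustration.

    # Illustrative sketch: rank documents by cosine similarity over TF-IDF
    # vectors, then score the top results with precision, recall, F-measure.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    documents = [
        "stock markets fell sharply after the announcement",
        "the football team won the championship final",
        "markets rallied as investors returned",
    ]
    query = "stock markets"

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])

    # Rank documents by descending cosine similarity to the query.
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    ranking = scores.argsort()[::-1]

    # Evaluate the top-k result set against hypothetical relevance judgements.
    relevant = {0, 2}                      # indices judged relevant (invented)
    retrieved = set(ranking[:2])           # top-2 returned by the system
    precision = len(relevant & retrieved) / len(retrieved)
    recall = len(relevant & retrieved) / len(relevant)
    f_measure = 2 * precision * recall / (precision + recall)
    print(ranking, precision, recall, f_measure)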
TERMTrial--terminology-based documentation systems for cooperative clinical trials.
Merzweiler, A; Weber, R; Garde, S; Haux, R; Knaup-Gregori, P
2005-04-01
Within cooperative groups of multi-center clinical trials a standardized documentation is a prerequisite for communication and sharing of data. Standardizing documentation systems means standardizing the underlying terminology. The management and consistent application of terminology systems is a difficult and fault-prone task, which should be supported by appropriate software tools. Today, documentation systems for clinical trials are often implemented as so-called Remote-Data-Entry-Systems (RDE-systems). Although there are many commercial systems, which support the development of RDE-systems there is none offering a comprehensive terminological support. Therefore, we developed the software system TERMTrial which consists of a component for the definition and management of terminology systems for cooperative groups of clinical trials and two components for the terminology-based automatic generation of trial databases and terminology-based interactive design of electronic case report forms (eCRFs). TERMTrial combines the advantages of remote data entry with a comprehensive terminological control.
Design and realization of the compound text-based test questions library management system
NASA Astrophysics Data System (ADS)
Shi, Lei; Feng, Lin; Zhao, Xin
2011-12-01
The test questions library management system is an essential part of an on-line examination system. Its basic requirements are to handle compound text, including information such as images and formulae, and to create the corresponding Word documents. After comparing the two current solutions for creating documents, this paper presents a design proposal for a Word Automation mechanism based on OLE/COM technology, discusses the application of Word Automation in detail, and finally provides the operating results of the system, which have high reference value for improving the efficiency of generating project documents and report forms.
Semantic Clinical Guideline Documents
Eriksson, Henrik; Tu, Samson W.; Musen, Mark
2005-01-01
Decision-support systems based on clinical practice guidelines can support physicians and other health-care personnel in the process of following best practice consistently. A knowledge-based approach to represent guidelines makes it possible to encode computer-interpretable guidelines in a formal manner, perform consistency checks, and use the guidelines directly in decision-support systems. Decision-support authors and guideline users require guidelines in human-readable formats in addition to computer-interpretable ones (e.g., for guideline review and quality assurance). We propose a new document-oriented information architecture that combines knowledge-representation models with electronic and paper documents. The approach integrates decision-support modes with standard document formats to create a combined clinical-guideline model that supports on-line viewing, printing, and decision support. PMID:16779037
Introducing Text Analytics as a Graduate Business School Course
ERIC Educational Resources Information Center
Edgington, Theresa M.
2011-01-01
Text analytics refers to the process of analyzing unstructured data from documented sources, including open-ended surveys, blogs, and other types of web dialog. Text analytics has enveloped the concept of text mining, an analysis approach influenced heavily by data mining. While text mining has been covered extensively in various computer…
Carrell, David S; Cronkite, David J; Malin, Bradley A; Aberdeen, John S; Hirschman, Lynette
2016-08-05
Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators, and annotation is costly. As such, the marginal benefit of incorporating additional annotators has not been well characterized. This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size. Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation. Recall rates for all PII types ranged from 0.90 to 0.98 for individual annotators to 0.998 to 1.0 for teams of three, when measured against the gold standard. Median cost per PII instance discovered during corpus annotation ranged from $0.71 for an individual annotator to $377 for annotations discovered only by a fourth annotator. Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.
Workneh, Gelane; Scherzer, Leah; Kirk, Brianna; Draper, Heather R; Anabwani, Gabriel; Wanless, R Sebastian; Jibril, Haruna; Gaetsewe, Neo; Thuto, Boitumelo; Tolle, Michael A
2013-01-01
Clinical mentoring by providers skilled in HIV management has been identified as a cornerstone of scaling-up antiretroviral treatment in Africa, particularly in settings where expertise is limited. However, little data exist on its effectiveness and impact on improving the quality-of-care and clinical outcomes, especially for HIV-infected children. Since 2008, the Botswana-Baylor Children's Clinical Centre of Excellence (COE) has operated an outreach mentoring programme at clinical sites around Botswana. This study is a retrospective review of 374 paediatric charts at four outreach mentoring sites (Mochudi, Phutadikobo, Molepolole and Thamaga) evaluating the effectiveness of the programme as reflected in a number of clinically-relevant areas. Charts from one visit prior to initiation of mentoring and from one visit after approximately one year of mentoring were assessed for statistically-significant differences (p<0.05) in the documentation of clinically-relevant indicators. Mochudi showed notable improvements in all indicators analysed, with particular improvements in documentation of pill count, viral load (VL) results, correct laboratory monitoring and correct antiretroviral therapy (ART) dosing (p<0.0001, p<0.0001, p<0.0001 and p<0.0001, respectively). Broad and substantial improvements were also seen in Molepolole, with the most improvement in disclosure documentation of all four sites. At Thamaga, improvements were restricted to CD4 documentation (p<0.001), recent VL and documented pill count (p<0.05 and p<0.05, respectively). Phuthadikobo showed the least amount of improvement across indicators, with only VL documentation and correct ART dosing showing statistically-significant improvements (p<0.05 and p<0.0001, respectively). These findings suggest that clinical mentoring may assist improvements in a number of important areas, including ART dosing and monitoring; adherence assessment and assurance; and disclosure. Clinical mentoring may be a valuable tool in scale-up of quality paediatric HIV care-and-treatment outside specialised centres. Further study will help refine approaches to clinical mentoring, including assuring mentoring translates into improved clinical outcomes for HIV-infected children.
Jacobs, Carmel; Graham, Ian D; Makarski, Julie; Chassé, Michaël; Fergusson, Dean; Hutton, Brian; Clemons, Mark
2014-01-01
Consensus statements and clinical practice guidelines are widely available for enhancing the care of cancer patients. Despite subtle differences in their definition and purpose, these terms are often used interchangeably. We systematically assessed the methodological quality of consensus statements and clinical practice guidelines published in three commonly read, geographically diverse, cancer-specific journals. Consensus statements and clinical practice guidelines published between January 2005 and September 2013 in Current Oncology, European Journal of Cancer and Journal of Clinical Oncology were evaluated. Each publication was assessed using the Appraisal of Guidelines for Research and Evaluation II (AGREE II) rigour of development and editorial independence domains. For assessment of transparency of document development, 7 additional items were taken from the Institute of Medicine's standards for practice guidelines and the Journal of Clinical Oncology guidelines for authors of guidance documents. Thirty-four consensus statements and 67 clinical practice guidelines were evaluated. The rigour of development score for consensus statements over the three journals was 32% lower than that of clinical practice guidelines. The editorial independence score was 15% lower for consensus statements than for clinical practice guidelines. One journal scored consistently lower than the others over both domains. No journal adhered to all the items related to the transparency of document development. One journal's consensus statements endorsed a product made by the sponsoring pharmaceutical company in 64% of cases. Guidance documents are an essential part of oncology care and should be subjected to a rigorous and validated development process. Consensus statements had lower methodological quality than clinical practice guidelines using AGREE II. At a minimum, journals should ensure that all consensus statements and clinical practice guidelines adhere to AGREE II criteria. Journals should consider explicitly requiring guidelines to declare pharmaceutical company sponsorship and to identify the sponsor's product to enhance transparency.
Detection of figure and caption pairs based on disorder measurements
NASA Astrophysics Data System (ADS)
Faure, Claudie; Vincent, Nicole
2010-01-01
Figures inserted in documents convey a kind of information for which the visual modality is more appropriate than text. A complete understanding of a figure often requires reading its caption or establishing a relationship with the main text through a numbered figure identifier that is replicated in the caption and in the main text. A figure and its caption are closely related; together they constitute a single multimodal component (FC-pair) that Document Image Analysis cannot extract with text and graphics segmentation alone. We propose a method that goes further than graphics and text segmentation in order to extract FC-pairs without performing a full labelling of the page components. Horizontal and vertical text lines are detected in the pages. The graphics are associated with selected text lines to initiate the detector of FC-pairs. Spatial and visual disorders are introduced to define a layout model in terms of properties, making it possible to cope with most of the numerous spatial arrangements of graphics and text lines. The detector of FC-pairs performs operations to eliminate the layout disorder and assigns a quality value to each FC-pair. The processed documents were collected in medic@, the digital historical collection of the BIUM (Bibliothèque InterUniversitaire Médicale). A first set of 98 pages constitutes the design set. Then 298 pages were collected to evaluate the system. The reported performance is the result of the full process, from the binarisation of the digital images to the detection of FC-pairs.
Stieglitz, Rolf-Dieter; Haug, Achim; Fähndrich, Erdmann; Rösler, Michael; Trabert, Wolfgang
2017-01-01
The documentation of psychopathology is core to the clinical practice of the psychiatrist and clinical psychologist. However, both in initial as well as further training and specialization in their fields, this particular aspect of their work receives scanty attention only. Yet, for the past 50 years, the Association for Methodology and Documentation in Psychiatry (AMDP) System has been in existence and available as a tool to serve precisely the purpose of offering a systematic introduction to the terminology and documentation of psychopathology. The motivation for its development was based on the need for an assessment procedure for the reliable documentation of the effectiveness of newly developed psychopharmacological substances. Subsequently, the AMDP-System began to be applied in the context of investigations into a number of methodological issues in psychiatry (e.g., the frequency and specificity of particular symptoms, the comparison of rating scales). The System then became increasingly important also in clinical practice and, today, represents the most used instrument for the documentation of psychopathology in the German-speaking countries of Europe. This paper intends to offer an overview of the AMDP-System, its origins, design, and functionality. After an initial account of the history and development of the AMDP-System, the discussion will in turn focus on the System’s underlying methodological principles, the transfer of clinical skills and competencies in its practical application, and its use in research and clinical practice. Finally, potential future areas of development in relation to the AMDP-System are explored. PMID:28439242
Abboud, Salim E; Soriano, Stephanie; Abboud, Rayan; Patel, Indravadan; Davidson, Jon; Azar, Nami R; Nakamoto, Dean A
Preprocedural evaluation of patients in an interventional radiology (IR) clinic is a complex synthesis of physical examination and imaging findings, and as IR transitions to an independent clinical specialty, such evaluations will become an increasingly critical component of a successful IR practice and quality patient care. Prior research suggests that preprocedural evaluations increase patients' perceived quality of care and may improve procedural technical success rates. Appropriate documentation of a preprocedural evaluation in the medical record is also paramount for an interventional radiologist to add value and function as an effective member of a larger IR service and multidisciplinary health care team. The purpose of this study is to examine the quality of radiology resident notes for patients seen in an outpatient IR clinic at a single academic medical center before and after the adoption of a clinic note template with reminders to include platelet count, international normalized ratio, glomerular filtration rate, and a plan for periprocedural coagulation status. Before adoption of the template, platelet count, international normalized ratio, glomerular filtration rate, and an appropriate plan for periprocedural coagulation status were documented in 72%, 82%, 42%, and 33% of patients, respectively. After adoption of the template, appropriate documentation of platelet count, international normalized ratio, and glomerular filtration rate increased to 96%, and an appropriate plan for periprocedural coagulation status was documented in 83% of patients. Patient evaluation and clinical documentation skills may not be adequately practiced during radiology residency, and tools such as templates may help increase the quality of documentation by radiology residents. Copyright © 2017 Elsevier Inc. All rights reserved.
Effect of an obesity best practice alert on physician documentation and referral practices.
Fitzpatrick, Stephanie L; Dickins, Kirsten; Avery, Elizabeth; Ventrelle, Jennifer; Shultz, Aaron; Kishen, Ekta; Rothschild, Steven
2017-12-01
The Centers for Medicare & Medicaid Services Electronic Health Record Meaningful Use Incentive Program requires physicians to document body mass index (BMI) and a follow-up treatment plan for adult patients with BMI ≥ 25. To examine the effect of a best practice alert on physician documentation of obesity-related care and referrals to weight management treatment, in a cluster-randomized design, 14 primary care clinics at an academic medical center were randomized to best practice alert intervention (n = 7) or comparator (n = 7). The alert was triggered when both height and weight were entered and BMI was ≥30. Both intervention and comparator clinics could document meaningful use by selecting a nutrition education handout within the alert. Intervention clinics could also select a referral option from the list of clinic and community-based weight management programs embedded in the alert. Main outcomes were proportion of eligible patients with (1) obesity-related documentation and (2) referral. There were 26,471 total primary care encounters with 12,981 unique adult patients with BMI ≥ 30 during the 6-month study period. Documentation doubled (17 to 33%) with implementation of the alert. However, intervention clinics were not significantly more likely to refer patients to weight management than comparator clinics (2.8 vs. 1.3%, p = 0.07). Although the alert was associated with increased physician meaningful use compliance, it was not an effective strategy for improving patient access to weight management services. Further research is needed to understand system-level characteristics that influence obesity management in primary care.
Whittenburg, Luann; Meetim, Aunchisa
2016-01-01
An innovative nursing documentation project conducted at Bumrungrad International Hospital in Bangkok, Thailand demonstrated patient care continuity between nursing patient assessments and nursing Plans of Care using the Clinical Care Classification System (CCC). The project developed a new generation of interactive nursing Plans of Care using the six steps of the American Nurses Association (ANA) Nursing process and the MEDCIN® clinical knowledgebase to present CCC coded concepts as a natural by-product of a nurse's documentation process. The MEDCIN® clinical knowledgebase is a standardized point-of-care terminology intended for use in electronic health record systems. The CCC is an ANA recognized nursing terminology.
Readability Formulas and User Perceptions of Electronic Health Records Difficulty: A Corpus Study.
Zheng, Jiaping; Yu, Hong
2017-03-02
Electronic health records (EHRs) are a rich resource for developing applications to engage patients and foster patient activation, thus holding a strong potential to enhance patient-centered care. Studies have shown that providing patients with access to their own EHR notes may improve the understanding of their own clinical conditions and treatments, leading to improved health care outcomes. However, the highly technical language in EHR notes impedes patients' comprehension. Numerous studies have evaluated the difficulty of health-related text using readability formulas such as Flesch-Kincaid Grade Level (FKGL), Simple Measure of Gobbledygook (SMOG), and Gunning-Fog Index (GFI). They conclude that the materials are often written at a grade level higher than common recommendations. The objective of our study was to explore the relationship between the aforementioned readability formulas and the laypeople's perceived difficulty on 2 genres of text: general health information and EHR notes. We also validated the formulas' appropriateness and generalizability on predicting difficulty levels of highly complex technical documents. We collected 140 Wikipedia articles on diabetes and 242 EHR notes with diabetes International Classification of Diseases, Ninth Revision code. We recruited 15 Amazon Mechanical Turk (AMT) users to rate difficulty levels of the documents. Correlations between laypeople's perceived difficulty levels and readability formula scores were measured, and their difference was tested. We also compared word usage and the impact of medical concepts of the 2 genres of text. The distributions of both readability formulas' scores (P<.001) and laypeople's perceptions (P=.002) on the 2 genres were different. Correlations of readability predictions and laypeople's perceptions were weak. Furthermore, despite being graded at similar levels, documents of different genres were still perceived with different difficulty (P<.001). Word usage in the 2 related genres still differed significantly (P<.001). Our findings suggested that the readability formulas' predictions did not align with perceived difficulty in either text genre. The widely used readability formulas were highly correlated with each other but did not show adequate correlation with readers' perceived difficulty. Therefore, they were not appropriate to assess the readability of EHR notes. ©Jiaping Zheng, Hong Yu. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 02.03.2017.
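For reference, the three readability formulas named above can be computed from simple surface counts. The following Python sketch is illustrative only: the syllable counter is a crude vowel-group heuristic, and the constants follow the commonly published forms of FKGL, SMOG, and GFI rather than the exact implementation used in this study.

    # Rough sketch of the readability formulas discussed above (FKGL, SMOG, GFI).
    # The syllable counter is a crude vowel-group heuristic; production tools use
    # dictionaries or more careful rules, so treat the numbers as approximations.
    import re

    def count_syllables(word):
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def readability(text):
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        complex_words = sum(1 for w in words if count_syllables(w) >= 3)
        n_sent, n_words = len(sentences), len(words)

        fkgl = 0.39 * (n_words / n_sent) + 11.8 * (syllables / n_words) - 15.59
        smog = 1.0430 * (complex_words * 30 / n_sent) ** 0.5 + 3.1291
        gfi = 0.4 * ((n_words / n_sent) + 100 * complex_words / n_words)
        return {"FKGL": fkgl, "SMOG": smog, "GFI": gfi}

    print(readability("The patient was discharged home. Follow up in two weeks."))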
New Framework for Cross-Domain Document Classification
2011-03-01
classification. The following paragraphs will introduce these related works in more detail. Wang et al. attempted to improve the accuracy of text document...of using Wikipedia to develop a thesaurus [20]. Gabrilovich et al. had an approach that is more elaborate in its use of Wikipedia text [21]. The...did show a modest improvement when it is performed using the Wikipedia information. Wang et al. improved on the results of co-clustering algorithm [24
Enhancement of the Shared Graphics Workspace.
1987-12-31
participants to share videodisc images and computer graphics displayed in color and text and facsimile information displayed in black on amber. They...could annotate the information in up to five colors and print the annotated version at both sites, using a standard fax machine. The SGWS also used a fax...system to display a document, whether text or photo, the camera scans the document, digitizes the data, and sends it via direct memory access (DMA) to
Essie: A Concept-based Search Engine for Structured Biomedical Text
Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina
2007-01-01
This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729
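A minimal sketch of the kind of concept/synonym query expansion the abstract describes is shown below, assuming a tiny hypothetical synonym table; Essie's actual expansion uses UMLS-derived knowledge and phrase handling that are far richer than this.

    # Toy concept-style query expansion: each query phrase is expanded with
    # synonyms from a small, hypothetical thesaurus before matching documents.
    SYNONYMS = {  # hypothetical entries, not Essie's actual data
        "cancer": ["neoplasm", "tumor", "malignancy"],
        "heart attack": ["myocardial infarction", "mi"],
    }

    def expand_query(query):
        terms = [query.lower()]
        for phrase, alternatives in SYNONYMS.items():
            if phrase in query.lower():
                terms.extend(alternatives)
        return terms

    def matches(document, query):
        # Naive substring matching, purely for illustration.
        doc = document.lower()
        return any(term in doc for term in expand_query(query))

    docs = ["History of myocardial infarction in 2010.",
            "No evidence of malignancy on biopsy."]
    print([d for d in docs if matches(d, "heart attack")])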
An IR-Based Approach Utilizing Query Expansion for Plagiarism Detection in MEDLINE.
Nawab, Rao Muhammad Adeel; Stevenson, Mark; Clough, Paul
2017-01-01
The identification of duplicated and plagiarized passages of text has become an increasingly active area of research. In this paper, we investigate methods for plagiarism detection that aim to identify potential sources of plagiarism from MEDLINE, particularly when the original text has been modified through the replacement of words or phrases. A scalable approach based on Information Retrieval is used to perform candidate document selection, that is, the identification of a subset of potential source documents given a suspicious text, from MEDLINE. Query expansion is performed using the UMLS Metathesaurus to deal with situations in which original documents are obfuscated. Various approaches to Word Sense Disambiguation are investigated to deal with cases where there are multiple Concept Unique Identifiers (CUIs) for a given term. Results using the proposed IR-based approach outperform a state-of-the-art baseline based on Kullback-Leibler Distance.
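As an illustration of the Kullback-Leibler baseline mentioned above, the sketch below ranks candidate source documents by the KL divergence between smoothed unigram models of the suspicious text and each candidate; the texts are invented and the add-alpha smoothing is an assumption, not taken from the paper.

    # Rank candidate sources by KL divergence from the suspicious text
    # (lower divergence = more plausible source under this baseline).
    import math
    from collections import Counter

    def unigram_model(text, vocab, alpha=0.1):
        counts = Counter(text.lower().split())
        total = sum(counts.values()) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl_divergence(p, q):
        return sum(p[w] * math.log(p[w] / q[w]) for w in p)

    suspicious = "the replacement of words or phrases in the original text"
    candidates = ["replacement of words and phrases in an original abstract",
                  "randomized trial of a new antihypertensive agent"]

    vocab = set((suspicious + " " + " ".join(candidates)).lower().split())
    p = unigram_model(suspicious, vocab)
    ranked = sorted(candidates,
                    key=lambda c: kl_divergence(p, unigram_model(c, vocab)))
    print(ranked[0])  # most plausible source under the KL baseline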
Min, Yul Ha; Park, Hyeoun-Ae; Chung, Eunja; Lee, Hyunsook
2013-12-01
The purpose of this paper is to describe the components of a next-generation electronic nursing records system ensuring full semantic interoperability and integrating evidence into the nursing records system. A next-generation electronic nursing records system based on detailed clinical models and clinical practice guidelines was developed at Seoul National University Bundang Hospital in 2013. This system has two components, a terminology server and a nursing documentation system. The terminology server manages nursing narratives generated from entity-attribute-value triplets of detailed clinical models using a natural language generation system. The nursing documentation system provides nurses with a set of nursing narratives arranged around the recommendations extracted from clinical practice guidelines. An electronic nursing records system based on detailed clinical models and clinical practice guidelines was successfully implemented in a hospital in Korea. The next-generation electronic nursing records system can support nursing practice and nursing documentation, which in turn will improve data quality.
Automatic indexing of scanned documents: a layout-based approach
NASA Astrophysics Data System (ADS)
Esser, Daniel; Schuster, Daniel; Muthmann, Klemens; Berger, Michael; Schill, Alexander
2012-01-01
Archiving official written documents such as invoices, reminders, and account statements is becoming more and more important in both business and private settings. Creating appropriate index entries for document archives, such as the sender's name, creation date, or document number, is tedious manual work. We present a novel approach to automatic indexing of documents based on generic positional extraction of index terms. For this purpose we apply the knowledge of document templates stored in a common full-text search index to find index positions that were successfully extracted in the past.
ERIC Educational Resources Information Center
Sheehan, Kathleen M.
2015-01-01
The "TextEvaluator"® text analysis tool is a fully automated text complexity evaluation tool designed to help teachers, curriculum specialists, textbook publishers, and test developers select texts that are consistent with the text complexity guidelines specified in the Common Core State Standards.This paper documents the procedure used…
76 FR 27309 - Committee on Measures of Student Success
Federal Register 2010, 2011, 2012, 2013, 2014
2011-05-11
... version of this document is the document published in the Federal Register. Free Internet access to the... text or Adobe Portable Document Format (PDF) on the Internet at the following site: http://www.ed.gov/news/fed-register/index.html . To use PDF you must have Adobe Acrobat Reader, which is available free...
76 FR 50198 - Committee on Measures of Student Success
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-12
...: The official version of this document is the document published in the Federal Register. Free Internet... Federal Register, in text or Adobe Portable Document Format (PDF) on the Internet at the following site... is available free at this site. If you have questions about using PDF, call the U.S. Government...
10 CFR 2.304 - Formal requirements for documents; signatures; acceptance for filing.
Code of Federal Regulations, 2010 CFR
2010-01-01
... documents. In addition to the requirements in this part, paper documents must be stapled or bound on the left side; typewritten, printed, or otherwise reproduced in permanent form on good unglazed paper of... not less than one inch. Text must be double-spaced, except that quotations may be single-spaced and...
Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.
ERIC Educational Resources Information Center
Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald
2001-01-01
Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…
Combining approaches to on-line handwriting information retrieval
NASA Astrophysics Data System (ADS)
Peña Saldarriaga, Sebastián; Viard-Gaudin, Christian; Morin, Emmanuel
2010-01-01
In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query; therefore, significant improvements in retrieval performance can be expected. The first approach is based on information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while the second approach is recognition-free, using a word spotting algorithm. Results show that for texts having a word error rate (WER) lower than 23%, the performance obtained with the combined system is close to the performance obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an improvement of nearly 17% can be observed with respect to the best available baseline method.
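One simple way to combine two retrieval runs, sketched below, is score fusion over normalized scores (CombSUM); this is illustrative only, as the paper's own combination scheme may differ, and the run scores here are invented.

    # CombSUM fusion of two retrieval runs, e.g. a recognition-based text run
    # and a recognition-free word-spotting run, over min-max normalized scores.
    def normalize(run):
        lo, hi = min(run.values()), max(run.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}

    def combsum(run_a, run_b):
        a, b = normalize(run_a), normalize(run_b)
        docs = set(a) | set(b)
        return sorted(docs, key=lambda d: a.get(d, 0.0) + b.get(d, 0.0), reverse=True)

    text_run = {"doc1": 12.4, "doc2": 7.1, "doc3": 3.3}       # noisy-text IR scores
    spotting_run = {"doc2": 0.91, "doc3": 0.88, "doc4": 0.52} # word-spotting scores
    print(combsum(text_run, spotting_run))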
Human Rights Texts: Converting Human Rights Primary Source Documents into Data
Fariss, Christopher J.; Linder, Fridolin J.; Jones, Zachary M.; Crabtree, Charles D.; Biek, Megan A.; Ross, Ana-Sophia M.; Kaur, Taranamol; Tsai, Michael
2015-01-01
We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability. PMID:26418817
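The document-term matrices described above can be reproduced from raw text with standard tooling; the sketch below is illustrative, using a two-sentence toy corpus rather than the released reports, and builds one with scikit-learn.

    # Build a document-term matrix: rows are documents, columns are unique
    # terms, cells are word counts, as in the datasets released with the corpus.
    from sklearn.feature_extraction.text import CountVectorizer

    reports = [
        "arbitrary detention and torture were reported",
        "freedom of assembly was restricted and detention continued",
    ]
    vectorizer = CountVectorizer()
    dtm = vectorizer.fit_transform(reports)        # sparse documents-by-terms matrix
    print(vectorizer.get_feature_names_out())      # the unique terms (columns)
    print(dtm.toarray())                           # per-document word counts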
National Survey of Patients’ Bill of Rights Statutes
Jacob, Dan M.; Hochhauser, Mark; Parker, Ruth M.
2009-01-01
BACKGROUND Despite vigorous national debate between 1999–2001 the federal patients’ bill of rights (PBOR) was not enacted. However, states have enacted legislation and the Joint Commission defined an accreditation standard to present patients with their rights. Because such initiatives can be undermined by overly complex language, we surveyed the readability of hospital PBOR documents as well as texts mandated by state law. METHODS State Web sites and codes were searched to identify PBOR statutes for general patient populations. The rights addressed were compared with the 12 themes presented in the American Hospital Association’s (AHA) PBOR text of 2002. In addition, we obtained PBOR texts from a sample of hospitals in each state. Readability was evaluated using Prose, a software program which reports an average of eight readability formulas. RESULTS Of 23 states with a PBOR statute for the general public, all establish a grievance policy, four protect a private right of action, and one stipulates fines for violations. These laws address an average of 7.4 of the 12 AHA themes. Nine states’ statutes specify PBOR text for distribution to patients. These documents have an average readability of 15th grade (range, 11.6, New York, to 17.0, Minnesota). PBOR documents from 240 US hospitals have an average readability of 14th grade (range, 8.2 to 17.0). CONCLUSIONS While the average U.S. adult reads at an 8th grade reading level, an advanced college reading level is routinely required to read PBOR documents. Patients are not likely to learn about their rights from documents they cannot read. PMID:19189192
Richardson, Karen J; Sengstack, Patricia; Doucette, Jeffrey N; Hammond, William E; Schertz, Matthew; Thompson, Julie; Johnson, Constance
2016-02-01
The primary aim of this performance improvement project was to determine whether the electronic health record implementation of stroke-specific nursing documentation flowsheet templates and clinical decision support alerts improved the nursing documentation of eligible stroke patients in seven stroke-certified emergency departments. Two system enhancements were introduced into the electronic record in an effort to improve nursing documentation: disease-specific documentation flowsheets and clinical decision support alerts. Using a pre-post design, project measures included six stroke management goals as defined by the National Institute of Neurological Disorders and Stroke and three clinical decision support measures based on entry of orders used to trigger documentation reminders for nursing: (1) the National Institutes of Health's Stroke Scale, (2) neurological checks, and (3) dysphagia screening. Data were reviewed 6 months prior (n = 2293) and 6 months following the intervention (n = 2588). Fisher exact test was used for statistical analysis. Statistical significance was found for documentation of five of the six stroke management goals, although effect sizes were small. Customizing flowsheets to meet the needs of nursing workflow showed improvement in the completion of documentation. The effects of the decision support alerts on the completeness of nursing documentation were not statistically significant (likely due to lack of order entry). For example, an order for the National Institutes of Health Stroke Scale was entered only 10.7% of the time, which meant no alert would fire for nursing in the postintervention group. Future work should focus on decision support alerts that trigger reminders for clinicians to place relevant orders for this population.
The Ecological Approach to Text Visualization.
ERIC Educational Resources Information Center
Wise, James A.
1999-01-01
Presents both theoretical and technical bases on which to build a "science of text visualization." The Spatial Paradigm for Information Retrieval and Exploration (SPIRE) text-visualization system, which images information from free-text documents as natural terrains, serves as an example of the "ecological approach" in its visual metaphor, its…
Academic Journal Embargoes and Full Text Databases.
ERIC Educational Resources Information Center
Brooks, Sam
2003-01-01
Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…
Text Classification for Organizational Researchers
Kobayashi, Vladimer B.; Mol, Stefan T.; Berkers, Hannah A.; Kismihók, Gábor; Den Hartog, Deanne N.
2017-01-01
Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. PMID:29881249
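The tutorial accompanying the article is written in R; a rough Python analogue of the same sequence of steps (preprocessing and transformation via TF-IDF, a classification technique, and validation) is sketched below on a toy labelled corpus, not the authors' job-vacancy data.

    # Toy text classification pipeline: TF-IDF features, logistic regression,
    # and cross-validation as the validation step.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    texts = ["seeking nurse for night shifts",
             "software engineer with java skills",
             "registered nurse icu experience",
             "backend developer python api"]
    labels = ["nursing", "engineering", "nursing", "engineering"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(clf, texts, labels, cv=2)   # validation step
    clf.fit(texts, labels)
    print(scores.mean(), clf.predict(["java developer wanted"]))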
Review of the role of NICE in promoting the adoption of innovative cardiac technologies.
Groves, Peter H; Pomfrett, Chris; Marlow, Mirella
2018-05-17
The National Institute for Health and Care Excellence (NICE) Medical Technologies Evaluation Programme (MTEP) promotes the adoption of innovative diagnostic and therapeutic technologies into National Health Service (NHS) clinical practice through the publication of guidance and briefing documents. Since the inception of the programme in 2009, there have been 7 medical technologies guidance, 3 diagnostics guidance and 23 medtech innovation briefing documents published that are relevant to the heart and circulation. Medical technologies guidance is published by NICE for selected single technologies if they offer plausible additional benefits to patients and the healthcare system. Diagnostics guidance is published for diagnostic technologies if they have the potential to improve health outcomes, but if their introduction may be associated with an increase in overall cost to the NHS. Medtech innovation briefings provide evidence-based advice to those considering the implementation of new medical devices or diagnostic technologies. This review provides references to all of the guidance and briefing documents that NICE has published that are relevant to the heart and circulation and reflects on their diverse recommendations. The interaction of MTEP with other NICE programmes is integral to its effectiveness, and the means by which consistency is ensured across the different NICE programmes is described. The importance of the input of clinical experts from the cardiovascular professional community and the engagement by NICE with cardiovascular professional societies is highlighted as being fundamental to ensuring the quality of guidance outputs as well as to promoting their implementation and adoption. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Using complex networks for text classification: Discriminating informative and imaginative documents
NASA Astrophysics Data System (ADS)
de Arruda, Henrique F.; Costa, Luciano da F.; Amancio, Diego R.
2016-01-01
Statistical methods have been widely employed in recent years to grasp many language properties. The application of such techniques has allowed improvements in several linguistic applications, such as machine translation and document classification. In the latter, many approaches have emphasised the semantic content of texts, as is the case with bag-of-words language models. These approaches have certainly yielded reasonable performance. However, some potential features, such as the structural organization of texts, have been used in only a few studies. In this context, we probe how features derived from textual structure analysis can be effectively employed in a classification task. More specifically, we performed a supervised classification aiming at discriminating informative from imaginative documents. Using a networked model that describes the local topological/dynamical properties of function words, we achieved an accuracy rate of up to 95%, which is much higher than similar networked approaches. A systematic analysis of feature relevance revealed that symmetry and accessibility measurements are among the most prominent network measurements. Our results suggest that these measurements could be used in related language applications, as they play a complementary role in characterising texts.
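A word-adjacency network of the general kind used in such studies can be built as sketched below; the graph measurements shown (degree, clustering) are simple stand-ins, since the paper's symmetry and accessibility measurements are more elaborate, and the example sentence is invented.

    # Build a word-adjacency network: consecutive words become linked nodes,
    # then read simple topological measurements off the graph.
    import networkx as nx

    text = "the cat sat on the mat and the dog sat on the rug"
    tokens = text.split()

    g = nx.Graph()
    g.add_edges_from(zip(tokens, tokens[1:]))   # link adjacent words

    features = {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "avg_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
        "clustering": nx.average_clustering(g),
    }
    print(features)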
Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B
2012-01-01
Objective There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries. PMID:22319176
Automated encoding of clinical documents based on natural language processing.
Friedman, Carol; Shagina, Lyudmila; Lussier, Yves; Hripcsak, George
2004-01-01
The aim of this study was to develop a method based on natural language processing (NLP) that automatically maps an entire clinical document to codes with modifiers and to quantitatively evaluate the method. An existing NLP system, MedLEE, was adapted to automatically generate codes. The method involves matching of structured output generated by MedLEE consisting of findings and modifiers to obtain the most specific code. Recall and precision applied to Unified Medical Language System (UMLS) coding were evaluated in two separate studies. Recall was measured using a test set of 150 randomly selected sentences, which were processed using MedLEE. Results were compared with a reference standard determined manually by seven experts. Precision was measured using a second test set of 150 randomly selected sentences from which UMLS codes were automatically generated by the method and then validated by experts. Recall of the system for UMLS coding of all terms was .77 (95% CI .72-.81), and for coding terms that had corresponding UMLS codes recall was .83 (.79-.87). Recall of the system for extracting all terms was .84 (.81-.88). Recall of the experts ranged from .69 to .91 for extracting terms. The precision of the system was .89 (.87-.91), and precision of the experts ranged from .61 to .91. Extraction of relevant clinical information and UMLS coding were accomplished using a method based on NLP. The method appeared to be comparable to or better than six experts. The advantage of the method is that it maps text to codes along with other related information, rendering the coded output suitable for effective retrieval.
NASA Astrophysics Data System (ADS)
Chen, Andrew A.; Meng, Frank; Morioka, Craig A.; Churchill, Bernard M.; Kangarloo, Hooshang
2005-04-01
Managing pediatric patients with neurogenic bladder (NGB) involves regular laboratory, imaging, and physiologic testing. Using input from domain experts and current literature, we identified specific data points from these tests to develop the concept of an electronic disease vector for NGB. An information extraction engine was used to extract the desired data elements from free-text and semi-structured documents retrieved from the patient's medical record. Finally, a Java-based presentation engine created graphical visualizations of the extracted data. After precision, recall, and timing evaluation, we conclude that these tools may enable clinically useful, automatically generated, and diagnosis-specific visualizations of patient data, potentially improving compliance and, ultimately, outcomes.
Lin, Ching-Heng; Wu, Nai-Yuan; Lai, Wei-Shao; Liou, Der-Ming
2015-01-01
Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents. Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines. The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p<0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines. The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Hosseini, Masoud; Jones, Josette; Faiola, Anthony; Vreeman, Daniel J; Wu, Huanmei; Dixon, Brian E
2017-10-01
Due to the nature of information generation in health care, clinical documents contain duplicate and sometimes conflicting information. Recent implementation of Health Information Exchange (HIE) mechanisms in which clinical summary documents are exchanged among disparate health care organizations can proliferate duplicate and conflicting information. To reduce information overload, a system to automatically consolidate information across multiple clinical summary documents was developed for an HIE network. The system receives any number of Continuity of Care Documents (CCDs) and outputs a single, consolidated record. To test the system, a randomly sampled corpus of 522 CCDs representing 50 unique patients was extracted from a large HIE network. The automated methods were compared to manual consolidation of information for three key sections of the CCD: problems, allergies, and medications. Manual consolidation of 11,631 entries was completed in approximately 150 h. The same data were automatically consolidated in 3.3 min. The system successfully consolidated 99.1% of problems, 87.0% of allergies, and 91.7% of medications. Almost all of the inaccuracies were caused by issues involving the use of standardized terminologies within the documents to represent individual information entries. This study represents a novel, tested tool for de-duplication and consolidation of CDA documents, which is a major step toward improving information access and the interoperability among information systems. While more work is necessary, automated systems like the one evaluated in this study will be necessary to meet the informatics needs of providers and health systems in the future. Copyright © 2017 Elsevier Inc. All rights reserved.
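To make the consolidation idea above concrete, here is a minimal Python sketch that collapses duplicate entries by their terminology code. The entry fields, the example codes, and the rule of keeping the most recently dated entry are illustrative assumptions for this sketch, not the schema or logic of the HIE system described in the abstract.

```python
# Minimal sketch of code-based consolidation across clinical summary documents.
# Field names ("system", "code", "text", "date"), the SNOMED codes, and the
# "keep the newest entry" rule are assumptions made only for illustration.
def consolidate(entries):
    """Collapse entries that share a terminology code, keeping the newest one."""
    merged = {}
    for entry in entries:
        key = (entry["system"], entry["code"])
        if key not in merged or entry["date"] > merged[key]["date"]:
            merged[key] = entry
    return list(merged.values())

problems = [
    {"system": "SNOMED", "code": "38341003", "text": "Hypertension", "date": "2016-01-03"},
    {"system": "SNOMED", "code": "38341003", "text": "Hypertensive disorder", "date": "2017-06-10"},
    {"system": "SNOMED", "code": "44054006", "text": "Type 2 diabetes", "date": "2015-11-20"},
]
print(consolidate(problems))  # the duplicate hypertension entries collapse to one
```

Note how entries that lack a shared code would never merge under this rule, which is consistent with the abstract's observation that most inaccuracies stemmed from terminology usage within the documents.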
Overview of Historical Earthquake Document Database in Japan and Future Development
NASA Astrophysics Data System (ADS)
Nishiyama, A.; Satake, K.
2014-12-01
In Japan, damage and disasters from large historical earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not used effectively by researchers, because they include low-reliability historical records and are difficult to search by keyword or date. To overcome these problems and to promote historical earthquake studies in Japan, construction of text databases started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its compilers examined the source books and original texts of the historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of when it was written. Another project compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast of Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed text and seismic intensity databases. These are now available on the web (in Japanese only). However, only about 9% of the earthquake source books have been digitized so far. We therefore plan to digitize all of the remaining historical documents through a research program that started in 2014; the specification of this database will be similar to that of the previous ones. We also plan to combine this database with a liquefaction-trace database, to be constructed by another research program, by adding the location information described in the historical documents. The resulting database could be used to estimate the distributions of seismic intensities and tsunami heights.
Kleczka, Bernadette; Musiega, Anita; Rabut, Grace; Wekesa, Phoebe; Mwaniki, Paul; Marx, Michael; Kumar, Pratap
2018-06-01
The United Nations' Sustainable Development Goal #3.8 targets 'access to quality essential healthcare services'. Clinical practice guidelines are an important tool for ensuring quality of clinical care, but many challenges prevent their use in low-resource settings. Monitoring the use of guidelines relies on cumbersome clinical audits of paper records, and electronic systems face financial and other limitations. Here we describe a unique approach to generating digital data from paper using guideline-based templates, rubber stamps and mobile phones. The Guidelines Adherence in Slums Project targeted ten private sector primary healthcare clinics serving informal settlements in Nairobi, Kenya. Each clinic was provided with rubber stamp templates to support documentation and management of commonly encountered outpatient conditions. Participatory design methods were used to customize templates to the workflows and infrastructure of each clinic. Rubber stamps were used to print templates into paper charts, providing clinicians with checklists for use during consultations. Templates used bubble format data entry, which could be digitized from images taken on mobile phones. Besides rubber stamp templates, the intervention included booklets of guideline compilations, one Android phone for digitizing images of templates, and one data feedback/continuing medical education session per clinic each month. In this paper we focus on the effect of the intervention on documentation of three non-communicable diseases in one clinic. Seventy charts of patients enrolled in the chronic disease program (hypertension/diabetes, n=867; chronic respiratory diseases, n=223) at one of the ten intervention clinics were sampled. Documentation of each individual patient encounter in the pre-intervention (January-March 2016) and post-intervention period (May-July) was scored for information in four dimensions - general data, patient assessment, testing, and management. Control criteria included information with no counterparts in templates (e.g. notes on presenting complaints, vital signs). Documentation scores for each patient were compared between both pre- and post-intervention periods and between encounters documented with and without templates (post-intervention only). The total number of patient encounters in the pre-intervention (282) and post-intervention periods (264) did not differ. Mean documentation scores increased significantly in the post-intervention period on average by 21%, 24% and 17% for hypertension, diabetes and chronic respiratory diseases, respectively. Differences were greater (47%, 43% and 27%, respectively) when documentation with and without templates was compared. Changes between pre- vs. post-intervention, and with vs. without template, varied between individual dimensions of documentation. Overall, documentation improved more for general data and patient assessment than in testing or management. The use of templates improves paper-based documentation of patient care, a first step towards improving the quality of care. Rubber stamps provide a simple and low-cost method to print templates on demand. In combination with ubiquitously available mobile phones, information entered on paper can be easily and rapidly digitized. This 'frugal innovation' in m-Health can empower small, private sector facilities, where large numbers of urban patients seek healthcare, to generate digital data on routine outpatient care.
These data can form the basis for evidence-based quality improvement efforts at large scale, and help deliver on the SDG promise of quality essential healthcare services for all. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, Hui; Wang, Deqing; Wu, Wenjun; Hu, Hongping
2012-11-01
In today's business environment, enterprises are increasingly under pressure to process the vast amount of data produced every day within the organisation. One approach is to focus on business intelligence (BI) applications and to increase commercial added value through such business analytics activities. Term weighting, which converts documents into vectors in the term space, is a vital task in enterprise Information Retrieval (IR), text categorisation, text analytics, etc. When determining the weight of a term in a document, the traditional TF-IDF scheme considers only the term's occurrence frequency within the document and across the entire document set, so some meaningful terms cannot receive an appropriate weight. In this article, we propose a new term weighting scheme called Term Frequency - Function of Document Frequency (TF-FDF) to address this issue. Instead of using a monotonically decreasing function such as Inverse Document Frequency, FDF uses a convex function that dynamically adjusts weights according to the significance of the words in a document set. This function can be manually tuned based on the distribution of the most meaningful words that semantically represent the document set. Our experiments show that TF-FDF achieves higher Normalised Discounted Cumulative Gain in IR than TF-IDF and its variants, and improves the accuracy of relevance ranking of the IR results.
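The contrast with TF-IDF can be sketched in a few lines of Python. The abstract does not give the exact FDF function, so the fdf-style weight below is only an illustrative stand-in: a tunable bump that peaks at a chosen relative document frequency df0 rather than decreasing monotonically with document frequency.

```python
import math
from collections import Counter

# Hedged sketch contrasting classical TF-IDF with a TF-FDF-style weight.
# tf_fdf() is NOT the authors' function; it is an assumed stand-in that simply
# shows a non-monotonic, tunable alternative to IDF.
def tf_idf(tf, df, n_docs):
    return tf * math.log(n_docs / (1 + df))

def tf_fdf(tf, df, n_docs, df0=0.4, width=0.5):
    rel_df = df / n_docs
    return tf * max(0.0, 1.0 - ((rel_df - df0) / width) ** 2)

docs = [["price", "rise", "market"],
        ["market", "share", "report"],
        ["weather", "report", "sunny"]]
df = Counter(term for doc in docs for term in set(doc))
n = len(docs)
for term in ("market", "report", "weather"):
    print(term, round(tf_idf(1, df[term], n), 3), round(tf_fdf(1, df[term], n), 3))
```

On this toy corpus, terms that appear in most documents get near-zero TF-IDF weight, while the FDF-style weight can be tuned so that such mid- to high-frequency but meaningful terms are not suppressed.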
Kostelec, Pablo; Emanuele Garbelli, Pietro
2017-01-01
On-call weekends in medicine can be a busy and stressful time for junior doctors, as they are responsible for a larger pool of patients, most of whom they would have never met. Clinical handover to the weekend team is extremely important and any communication errors may have a profound impact on patient care, potentially even resulting in avoidable harm or death. Several senior clinical bodies have issued guidelines on best practice in written and verbal handover. These include: standardisation, use of pro forma documents prompting doctors to document vital information (such as ceiling of care/resuscitation status) and prioritisation according to clinical urgency. These guidelines were not consistently followed in our hospital site at the onset of 2014 and junior doctors were becoming increasingly dissatisfied with the handover processes. An initial audit of handover documents used across the medical division on two separate weekends in January 2014, revealed high variability in compliance with documentation of key information. For example, ceiling of care was documented for only 14-42% of patients and resuscitation status in 26-72% of patients respectively. Additionally, each ward used their own self-designed pro forma and patients were not prioritised by clinical urgency. Within six months from the introduction of a standardised, hospital-wide weekend handover pro forma across the medical division and following initial improvements to its layout, ceiling of therapy and resuscitation status were documented in approximately 80% of patients (with some minor variability). Moreover, 100% of patients in acute medicine and 75% of those in general medicine were prioritised by clinical urgency and all wards used the same handover pro forma.
Makam, Anil N; Lanham, Holly J; Batchelor, Kim; Samal, Lipika; Moran, Brett; Howell-Stampley, Temple; Kirk, Lynne; Cherukuri, Manjula; Santini, Noel; Leykum, Luci K; Halm, Ethan A
2013-08-09
Despite considerable financial incentives for adoption, there is little evidence available about providers' use and satisfaction with key functions of electronic health records (EHRs) that meet "meaningful use" criteria. We surveyed primary care providers (PCPs) in 11 general internal medicine and family medicine practices affiliated with 3 health systems in Texas about their use and satisfaction with performing common tasks (documentation, medication prescribing, preventive services, problem list) in the Epic EHR, a common commercial system. Most practices had greater than 5 years of experience with the Epic EHR. We used multivariate logistic regression to model predictors of being a structured documenter, defined as using electronic templates or prepopulated dot phrases to document at least two of the three note sections (history, physical, assessment and plan). 146 PCPs responded (70%). The majority used free text to document the history (51%) and assessment and plan (54%) and electronic templates to document the physical exam (57%). Half of PCPs were structured documenters (55%) with family medicine specialty (adjusted OR 3.3, 95% CI, 1.4-7.8) and years since graduation (nonlinear relationship with youngest and oldest having lowest probabilities) being significant predictors. Nearly half (43%) reported spending at least one extra hour beyond each scheduled half-day clinic completing EHR documentation. Three-quarters were satisfied with documenting completion of pneumococcal vaccinations and half were satisfied with documenting cancer screening (57% for breast, 45% for colorectal, and 46% for cervical). Fewer were satisfied with reminders for overdue pneumococcal vaccination (48%) and cancer screening (38% for breast, 37% for colorectal, and 31% for cervical). While most believed the problem list was helpful (70%) and kept an up-to-date list for their patients (68%), half thought they were unreliable and inaccurate (51%). Dissatisfaction with and suboptimal use of key functions of the EHR may mitigate the potential for EHR use to improve preventive health and chronic disease management. Future work should optimize use of key functions and improve providers' time efficiency.
Tsuji, Shintarou; Nishimoto, Naoki; Ogasawara, Katsuhiko
2008-07-20
Although large volumes of medical text are stored in electronic format, they are seldom reused because narrative text is difficult to process by computer. Morphological analysis is a key technology for extracting medical terms correctly and automatically; it parses a sentence into its smallest units, the morphemes. Phrases consisting of two or more technical terms, however, cause morphological analysis software to fail in parsing the sentence and to output unprocessed terms as "unknown words." The purpose of this study was to reduce the number of unknown words in medical narrative text processing. The number of unknown words produced when parsing the text of the national examination for radiologists was compared with and without additional dictionaries. The ratio of unknown words was reduced from 1.0% to 0.36% by adding radiological technology terminology, MeSH, and ICD-10 labels. The radiological technology terminology was the most effective resource, accounting for a 0.62 percentage point reduction on its own. These results clearly show the importance of selecting appropriate additional dictionaries and of tracking trends in unknown words. The potential of this work is to make available a large body of clinical information that would otherwise be inaccessible for applications other than manual health care review by personnel.
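The effect of dictionary coverage on "unknown words" can be illustrated with a toy segmenter. A real morphological analyzer (for example, MeCab with domain dictionaries) is far more sophisticated; the greedy longest-match routine and the tiny dictionaries below are assumptions made purely for illustration.

```python
# Toy illustration of how adding a domain dictionary reduces "unknown words"
# in dictionary-based segmentation of Japanese-style unsegmented text.
def segment(text, dictionary):
    """Greedy longest-match segmentation; unmatched characters become unknowns."""
    tokens, unknowns, i = [], 0, 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            tokens.append(text[i])   # fall back to a single unknown character
            unknowns += 1
            i += 1
    return tokens, unknowns

base_dict = {"肺", "の", "画像", "診断"}            # general-purpose dictionary
extended = base_dict | {"肺癌", "画像診断"}          # plus radiology terminology
text = "肺癌の画像診断"                              # "diagnostic imaging of lung cancer"
print(segment(text, base_dict))   # 癌 is left as an unknown word
print(segment(text, extended))    # no unknown words remain
```

The extended dictionary both removes the unknown character and yields longer, more meaningful technical terms, which is the behaviour the study measures at corpus scale.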
ERIC Educational Resources Information Center
Armbruster, Bonnie B.; Anderson, Thomas H.
Idea-mapping (i-mapping), a way of representing ideas from a text in the form of a diagram, is defined and illustrated in this document as a way to help students "see" how the ideas they read are linked to each other. The first portion of the document discusses the fundamental relationships found in texts (A is a characteristic of B, A…
Language Classification using N-grams Accelerated by FPGA-based Bloom Filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, A; Gokhale, M
N-Gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom Filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x the speed of comparable software and 1.45x that of the competing hardware design.
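A software-only sketch of the underlying technique may help: per-language Bloom filters are trained on character n-grams, and a document is assigned to the language whose filter matches the most of its n-grams. The FPGA parallelism, the XD1000 platform, and the HAIL comparison are out of scope here, and the training texts and parameters are assumptions.

```python
import hashlib

# Software sketch of n-gram language classification with Bloom filters, in the
# spirit of the abstract above; all data and sizes are illustrative assumptions.
class BloomFilter:
    def __init__(self, size=8192, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

training = {"en": "the quick brown fox jumps over the lazy dog",
            "de": "der schnelle braune fuchs springt ueber den faulen hund"}
filters = {}
for lang, text in training.items():
    bf = BloomFilter()
    for g in ngrams(text):
        bf.add(g)
    filters[lang] = bf

sample = "the brown dog jumps"
scores = {lang: sum(g in bf for g in ngrams(sample)) for lang, bf in filters.items()}
print(max(scores, key=scores.get), scores)
```

Because Bloom filter lookups are independent hash probes, this is exactly the kind of membership test that parallelizes well in hardware, which is the point of the FPGA design described above.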
32 CFR 1801.44 - Action by appeals authority.
Code of Federal Regulations, 2010 CFR
2010-07-01
... CENTER PUBLIC RIGHTS UNDER THE PRIVACY ACT OF 1974 Action On Privacy Act Administrative Appeals § 1801.44... request, the document(s) (sanitized and full text) at issue, and the findings of any concerned office...
Elkbuli, Adel; Godelman, Steven; Miller, Ashley; Boneva, Dessy; Bernal, Eileen; Hai, Shaikh; McKenney, Mark
2018-05-01
Clinical documentation can be underappreciated. Trauma Centers (TCs) are now routinely evaluated for quality performance. TCs with poor documentation may not accurately reflect actual injury burden or comorbidities, which can impact the accuracy of mortality measures. Markers exist to adjust crude death rates for injury severity: observed over expected deaths (O/E) adjusts for injury; Case Mix Index (CMI) reflects disease burden; and Severity of Illness (SOI) measures organ dysfunction. We aimed to evaluate the impact of implementing a Clinical Documentation Improvement Program (CDIP) on reported outcomes. We reviewed 2 years of prospectively collected data for trauma patients during the implementation of CDIP. A two-group prospective observational study design was used to evaluate the pre-implementation and post-implementation phases of improved clinical documentation. T-tests and Chi-Squared tests were used, with significance defined as p < 0.05. In the pre-implementation period, there were 49 deaths out of 1419 patients (3.45%), while the post-implementation period had 38 deaths out of 1454 (2.61%) (non-significant). There was, however, a significant difference in O/E ratios: 1.36 in the pre-phase vs 0.70 in the post-phase (p < 0.001). The two groups also differed on CMI, with a pre-group mean of 2.48 and a post-group mean of 2.87 (p < 0.001), indicating higher injury burden in the post-group. SOI started at 2.12 and significantly increased to 2.91, signifying more organ system dysfunction (p < 0.018). Improved clinical documentation results in improved accuracy of measures of mortality, injury severity, and comorbidities, and a more accurate reflection in O/E mortality ratios, CMI, and SOI. Copyright © 2018 IJS Publishing Group Ltd. Published by Elsevier Ltd. All rights reserved.
McLean, Andrew; Lawlor, Jenine; Mitchell, Rob; Kault, David; O'Kane, Carl; Lees, Michelle
2015-02-01
To evaluate the impact of More Learning for Interns in Emergency (MoLIE) on clinical documentation in the ED of a large regional hospital. MoLIE was implemented at The Townsville Hospital (TTH) in 2010, and has since provided ED interns with structured off-floor teaching and a dedicated clinical supervisor. A pre- and post-intervention study was conducted using retrospective medical record review methodology. Charts were selected by identifying all TTH ED patients seen by interns in the period 2008-2011. Two hundred pre-intervention records (2008-2009) and 200 post-intervention records (2010-2011) were reviewed. These were randomly selected following an initial screen by an ED staff specialist. The quality of clinical documentation for five common ED presentations (asthma, chest pain, lacerations, abdominal pain and upper limb fractures) was assessed. For each presentation, documentation quality was scored out of 10 using predefined criteria. An improvement of two or more was thought to be clinically significant. Mean scores for each group were compared using a Student's t-test for independent samples. Mean documentation scores (and 95% confidence intervals) were 5.55 (5.17-5.93) in 2008, 5.42 (4.98-5.86) in 2009, 6.37 (5.99-6.75) in 2010 and 6.08 (5.71-6.45) in 2011. There was a statistically but not clinically significant improvement in scores pre- and post-intervention (P ≤ 0.001). The introduction of MoLIE was associated with a small but statistically significant improvement in documentation, despite an 80% increase in intern placements. These results suggest that structured training programmes have potential to improve intern performance while simultaneously enhancing training capacity. The impact on quality of care requires further evaluation. © 2015 Australasian College for Emergency Medicine and Australasian Society for Emergency Medicine.
Text, photo, and line extraction in scanned documents
NASA Astrophysics Data System (ADS)
Erkilinc, M. Sezer; Jaber, Mustafa; Saber, Eli; Bauer, Peter; Depalov, Dejan
2012-07-01
We propose a page layout analysis algorithm to classify a scanned document into different regions such as text, photo, or strong lines. The proposed scheme consists of five modules. The first module performs several image preprocessing techniques such as image scaling, filtering, color space conversion, and gamma correction to enhance the scanned image quality and reduce the computation time in later stages. Text detection is applied in the second module wherein wavelet transform and run-length encoding are employed to generate and validate text regions, respectively. The third module uses a Markov random field based block-wise segmentation that employs a basis vector projection technique with maximum a posteriori probability optimization to detect photo regions. In the fourth module, methods for edge detection, edge linking, line-segment fitting, and Hough transform are utilized to detect strong edges and lines. In the last module, the resultant text, photo, and edge maps are combined to generate a page layout map using K-Means clustering. The proposed algorithm has been tested on several hundred documents that contain simple and complex page layout structures and contents such as articles, magazines, business cards, dictionaries, and newsletters, and compared against state-of-the-art page-segmentation techniques with benchmark performance. The results indicate that our methodology achieves an average of ~89% classification accuracy in text, photo, and background regions.
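Only the final fusion step lends itself to a compact sketch: per-pixel text, photo, and edge maps are stacked into feature vectors and grouped with K-Means to form a coarse layout map. The random stand-in maps below replace the outputs of the wavelet, MRF, and Hough-transform modules, which are not reproduced here, and the number of clusters is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hedged sketch of the map-fusion step only; the upstream detectors from the
# abstract are replaced with random stand-in maps for illustration.
h, w = 64, 48
rng = np.random.default_rng(0)
text_map = (rng.random((h, w)) > 0.7).astype(float)   # stand-in for text detector output
photo_map = (rng.random((h, w)) > 0.8).astype(float)  # stand-in for photo detector output
edge_map = (rng.random((h, w)) > 0.9).astype(float)   # stand-in for line/edge detector output

features = np.stack([text_map, photo_map, edge_map], axis=-1).reshape(-1, 3)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
layout_map = labels.reshape(h, w)  # coarse regions, e.g. text / photo / lines / background
print(np.bincount(labels))
```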
Goldstein, Ayelet; Shahar, Yuval
2016-06-01
Design and implement an intelligent free-text summarization system: The system's input includes large numbers of longitudinal, multivariate, numeric and symbolic clinical raw data, collected over varying periods of time, and in different complex contexts, and a suitable medical knowledge base. The system then automatically generates a textual summary of the data. We aim to prove the feasibility of implementing such a system, and to demonstrate its potential benefits for clinicians and for enhancement of quality of care. We have designed a new, domain-independent, knowledge-based system, the CliniText system, for automated summarization in free text of longitudinal medical records of any duration, in any context. The system is composed of six components: (1) A temporal abstraction module generates all possible abstractions from the patient's raw data using a temporal-abstraction knowledge base; (2) The abductive reasoning module infers abstractions or events from the data, which were not explicitly included in the database; (3) The pruning module filters out raw or abstract data based on predefined heuristics; (4) The document structuring module organizes the remaining raw or abstract data, according to the desired format; (5) The microplanning module, groups the raw or abstract data and creates referring expressions; (6) The surface realization module, generates the text, and applies the grammar rules of the chosen language. We have performed an initial technical evaluation of the system in the cardiac intensive-care and diabetes domains. We also summarize the results of a more detailed evaluation study that we have performed in the intensive-care domain that assessed the completeness, correctness, and overall quality of the system's generated text, and its potential benefits to clinical decision making. We assessed these measures for 31 letters originally composed by clinicians, and for the same letters when generated by the CliniText system. We have successfully implemented all of the components of the CliniText system in software. We have also been able to create a comprehensive temporal-abstraction knowledge base to support its functionality, mostly in the intensive-care domain. The initial technical evaluation of the system in the cardiac intensive-care and diabetes domains has shown great promise, proving the feasibility of constructing and operating such systems. The detailed results of the evaluation in the intensive-care domain are out of scope of the current paper, and we refer the reader to a more detailed source. In all of the letters composed by clinicians, there were at least two important items per letter missed that were included by the CliniText system. The clinicians' letters got a significantly better grade in three out of four measured quality parameters, as judged by an expert; however, the variance in the quality was much higher in the clinicians' letters. In addition, three clinicians answered questions based on the discharge letter 40% faster, and answered four out of the five questions equally well or significantly better, when using the CliniText-generated letters, than when using the clinician-composed letters. Constructing a working system for automated summarization in free text of large numbers of varying periods of multivariate longitudinal clinical data is feasible. So is the construction of a large knowledge base, designed to support such a system, in a complex clinical domain, such as the intensive-care domain. 
The integration of the quality and functionality results suggests that the optimal discharge letter should exploit both human and machine, possibly by creating a machine-generated draft that will be polished by a human clinician. Copyright © 2016 Elsevier Inc. All rights reserved.
Support Vector Machines: Relevance Feedback and Information Retrieval.
ERIC Educational Resources Information Center
Drucker, Harris; Shahrary, Behzad; Gibbon, David C.
2002-01-01
Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…
17 CFR 4.1 - Requirements as to form.
Code of Federal Regulations, 2013 CFR
2013-04-01
... table of contents is required, the electronic document must either include page numbers in the text or... as to form. (a) Each document distributed pursuant to this part 4 must be: (1) Clear and legible; (2...” disclosed under this part 4 must be displayed in capital letters and in boldface type. (c) Where a document...
EFL Learners' Multiple Documents Literacy: Effects of a Strategy-Directed Intervention Program
ERIC Educational Resources Information Center
Karimi, Mohammad Nabi
2015-01-01
There is a substantial body of L2 research documenting the central role of strategy instruction in reading comprehension. However, this line of research has been conducted mostly within the single text paradigm of reading research. With reading literacy undergoing a marked shift from single source reading to multiple documents literacy, little is…
American Catholic Higher Education. Essential Documents, 1967-1990.
ERIC Educational Resources Information Center
Gallin, Alice, Ed.
This reference volume contains the texts of documents pertinent to the development of Catholic higher education during the years from 1967 to 1990. The documents reveal church officials' and university presidents' collaborative efforts to address the questions of what it means to be a university or college and what it means for such an…
Business Documents Don't Have to Be Boring
ERIC Educational Resources Information Center
Schultz, Benjamin
2006-01-01
With business documents, visuals can serve to enhance the written word in conveying the message. Images can be especially effective when used subtly, on part of the page, on successive pages to provide continuity, or even set as watermarks over the entire page. A main reason given for traditional text-only business documents is that they are…
Validating competence: a new credential for clinical documentation improvement practitioners.
Ryan, Jessica; Patena, Karen; Judd, Wallace; Niederpruem, Mike
2013-01-01
As the health information management (HIM) profession continues to expand and become more specialized, there is an ever-increasing need to identify emerging HIM workforce roles that require a codified level of proficiency and professional standards. The Commission on Certification for Health Informatics and Information Management (CCHIIM) explored one such role, the clinical documentation improvement (CDI) practitioner, to define the tasks and responsibilities of the job as well as the knowledge required to perform them effectively. Subject-matter experts (SMEs) defined the CDI specialty by following best practices for job analysis methodology. A random sample of 4,923 CDI-related professionals was surveyed regarding the tasks and knowledge required for the job. The survey data were used to create a weighted blueprint of the six major domains that make up the CDI practitioner role, which later formed the foundation for the clinical documentation improvement practitioner (CDIP) credential. As a result, healthcare organizations can be assured that their certified documentation improvement practitioners have demonstrated excellence in clinical care, treatment, coding guidelines, and reimbursement methodologies.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-12
... 11-134] Facilitating the Deployment of Text-to-911 and Other Next Generation 911 Applications... 911 Public Safety Answering Points (PSAPs) via text, photos, videos, and data and enhance the... 22, 2011. The full text of this document is available for public inspection during regular business...
Different Words for the Same Concept: Learning Collaboratively from Multiple Documents
ERIC Educational Resources Information Center
Jucks, Regina; Paus, Elisabeth
2013-01-01
This study investigated how varying the lexical encodings of technical terms in multiple texts influences learners' dyadic processing of scientific-related information. Fifty-seven pairs of college students read journalistic texts on depression. Each partner in a dyad received one text; for half of the dyads the partner's text contained different…
[Law courts and clinical documentation].
Jiménez Carnicero, M P; Magallón, A I; Gordillo, A
2006-01-01
Background. Until 2004, requests for clinical documentation proceeding from the Judicial Administration on Specialist Care of Pamplona were received in six different centres and were processed independently, with different procedures, and documents were even sent in duplicate, with the resulting work load. This article describes the procedure for processing requests for documentation proceeding from the Law Courts and analyses the requests received. Methods. A circuit was set up to channel the judicial requests that arrived at the Specialist Health Care Centres of Pamplona and at the Juridical Regime Service of the Health System of Navarra-Osasunbidea, and a Higher Technician in Health Documentation was contracted to centralise these requests. A proceedings protocol was established to unify criteria and speed up the process, and a database was designed to register the proceedings. Results. In the course of 2004, 210 requests for documentation by legal requirement were received. Of these, 24 were claims of patrimonial responsibility and 13 were requested by lawyers with the patient's authorisation. The most frequent jurisdictional order was penal (43.33%). Ninety-three point one five percent (93.15%) of the requests proceeded from law courts in the autonomous community of Navarra. The centre that received the greatest number of requests was the "Príncipe de Viana" Consultation Centre (33.73%). The most frequently requested documentation was a copy of reports (109) and a copy of the complete clinical record (39). On two occasions the original clinical record was required. The average time of response was 6.6 days. Conclusions. The centralisation of administration has brought greater agility to the process and homogeneity in the criteria of processing. Less time is involved in preparing and dispatching the documentation, the dispatch of duplicate documents is avoided, the work load has been reduced and the dispersal of documentation is avoided, a situation that guarantees greater privacy for the patient.
A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
Wu, Y; Denny, J C; Rosenbloom, S T; Miller, R A; Giuse, D A; Song, M; Xu, H
2015-01-01
To save time, healthcare providers frequently use abbreviations while authoring clinical documents. Nevertheless, abbreviations that authors deem unambiguous often confuse other readers, including clinicians, patients, and natural language processing (NLP) systems. Most current clinical NLP systems "post-process" notes long after clinicians enter them into electronic health record systems (EHRs). Such post-processing cannot guarantee 100% accuracy in abbreviation identification and disambiguation, since multiple alternative interpretations exist. The authors describe a prototype system for real-time Clinical Abbreviation Recognition and Disambiguation (rCARD) - i.e., a system that interacts with authors during note generation to verify correct abbreviation senses. The rCARD system design anticipates future integration with web-based clinical documentation systems to improve quality of healthcare records. When clinicians enter documents, rCARD will automatically recognize each abbreviation. For abbreviations with multiple possible senses, rCARD will show a ranked list of possible meanings with the best predicted sense at the top. The prototype application embodies three word sense disambiguation (WSD) methods to predict the correct senses of abbreviations. We then conducted three experiments to evaluate rCARD, including 1) a performance evaluation of different WSD methods; 2) a time evaluation of real-time WSD methods; and 3) a user study of typing clinical sentences with abbreviations using rCARD. Using 4,721 sentences containing 25 commonly observed, highly ambiguous clinical abbreviations, our evaluation showed that the best profile-based method implemented in rCARD achieved a reasonable WSD accuracy of 88.8% (comparable to SVM - 89.5%) and the time cost of the different WSD methods is also acceptable (ranging from 0.630 to 1.649 milliseconds within the same network). The preliminary user study also showed that the extra time cost introduced by rCARD was about 5% of total document entry time and users did not feel a significant delay when using rCARD for clinical document entry. The study indicates that it is feasible to integrate a real-time, NLP-enabled abbreviation recognition and disambiguation module with clinical documentation systems.
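A minimal sketch of a profile-based sense ranker may clarify the general idea of ranking abbreviation senses by context. The sense profiles, context words, and cosine scoring below are assumptions for illustration only; the published rCARD methods, features, and training data are not reproduced here.

```python
from collections import Counter
import math

# Hedged sketch of profile-based abbreviation sense ranking. The "pt" senses
# and their context profiles are invented examples, not rCARD's actual data.
SENSES = {
    "pt": {
        "patient": Counter("patient admitted ward complains history exam".split()),
        "physical therapy": Counter("therapy exercise mobility rehab session".split()),
    }
}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_senses(abbrev, context_words):
    """Return possible senses ordered by similarity to the current context."""
    ctx = Counter(context_words)
    scores = {sense: cosine(ctx, profile) for sense, profile in SENSES[abbrev].items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_senses("pt", "the pt was admitted to the ward with chest pain".split()))
```

In an interactive setting like the one described above, the top-ranked sense would be shown first, and the author could confirm or correct it at writing time rather than leaving the ambiguity for later post-processing.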
Automatic system for computer program documentation
NASA Technical Reports Server (NTRS)
Simmons, D. B.; Elliott, R. W.; Arseven, S.; Colunga, D.
1972-01-01
Work on a project to design an automatic system for computer program documentation aids was undertaken to determine which existing programs could be used effectively to document computer programs. Results of the study are included in the form of an extensive bibliography and working papers on appropriate operating systems, text editors, program editors, data structures, standards, decision tables, flowchart systems, and proprietary documentation aids. The preliminary design for an automated documentation system is also included. An actual program has been documented in detail to demonstrate the types of output that can be produced by the proposed system.
Machine Learning and Decision Support in Critical Care
Johnson, Alistair E. W.; Ghassemi, Mohammad M.; Nemati, Shamim; Niehaus, Katherine E.; Clifton, David A.; Clifford, Gari D.
2016-01-01
Clinical data management systems typically provide caregiver teams with useful information, derived from large, sometimes highly heterogeneous, data sources that are often changing dynamically. Over the last decade there has been a significant surge in interest in using these data sources, from simply re-using the standard clinical databases for event prediction or decision support, to including dynamic and patient-specific information into clinical monitoring and prediction problems. However, in most cases, commercial clinical databases have been designed to document clinical activity for reporting, liability and billing reasons, rather than for developing new algorithms. With increasing excitement surrounding “secondary use of medical records” and “Big Data” analytics, it is important to understand the limitations of current databases and what needs to change in order to enter an era of “precision medicine.” This review article covers many of the issues involved in the collection and preprocessing of critical care data. The three challenges in critical care are considered: compartmentalization, corruption, and complexity. A range of applications addressing these issues are covered, including the modernization of static acuity scoring; on-line patient tracking; personalized prediction and risk assessment; artifact detection; state estimation; and incorporation of multimodal data sources such as genomic and free text data. PMID:27765959
Onboard shuttle on-line software requirements system: Prototype
NASA Technical Reports Server (NTRS)
Kolkhorst, Barbara; Ogletree, Barry
1989-01-01
The prototype discussed here was developed as a proof of concept for a system that could support high volumes of requirements documents with integrated text and graphics; the solution proposed here could be extended to other projects whose goal is to place paper documents in an electronic system for viewing and printing purposes. The technical problems (such as conversion of documentation between word processors, management of a variety of graphics file formats, and difficulties involved in scanning integrated text and graphics) would be very similar for other systems of this type. Indeed, technological advances in areas such as scanning hardware and software and display terminals ensure that some of the problems encountered here will be solved in the near-term (less than five years). Examples of these solvable problems include automated input of integrated text and graphics, errors in the recognition process, and the loss of image information which results from the digitization process. The solution developed for the Online Software Requirements System is modular and allows hardware and software components to be upgraded or replaced as industry solutions mature. The extensive commercial software content allows the NASA customer to apply resources to solving the problem and maintaining documents.
Bridge, Heather; Smolskis, Mary; Bianchine, Peter; Dixon, Dennis O; Kelly, Grace; Herpin, Betsey; Tavel, Jorge
2009-08-01
A clinical research protocol document must reflect both sound scientific rationale as well as local, national and, when applicable, international regulatory and human subject protections requirements. These requirements originate from a variety of sources, undergo frequent revision and are subject to interpretation. Tools to assist clinical investigators in the production of clinical protocols could facilitate navigating these requirements and ultimately increase the efficiency of clinical research. The National Institute of Allergy and Infectious Diseases (NIAID) developed templates for investigators to serve as the foundation for protocol development. These protocol templates are designed as tools to support investigators in developing clinical protocols. NIAID established a series of working groups to determine how to improve its capacity to conduct clinical research more efficiently and effectively. The Protocol Template Working Group was convened to determine what protocol templates currently existed within NIAID and whether standard NIAID protocol templates should be produced. After review and assessment of existing protocol documents and requirements, the group reached consensus about required and optional content, determined the format and identified methods for distribution as well as education of investigators in the use of these templates. The templates were approved by the NIAID Executive Committee in 2006 and posted as part of the NIAID Clinical Research Toolkit [1] website for broad access. These documents require scheduled revisions to stay current with regulatory and policy changes. The structure of any clinical protocol template, whether comprehensive or specific to a particular study phase, setting or design, affects how it is used by investigators. Each structure presents its own set of advantages and disadvantages. While useful, protocol templates are not stand-alone tools for creating an optimal protocol document, but must be complemented by institutional resources and support. Education and guidance of investigators in the appropriate use of templates is necessary to ensure a complete yet concise protocol document. Due to changing regulatory requirements, clinical protocol templates cannot become static, but require frequent revisions.
Comparative Analysis of Document level Text Classification Algorithms using R
NASA Astrophysics Data System (ADS)
Syamala, Maganti; Nalini, N. J., Dr; Maguluri, Lakshamanaphaneendra; Ragupathy, R., Dr.
2017-08-01
Over the past few decades, tremendous volumes of data, both structured and unstructured, have become available on the Internet. With this exponential growth of information comes an emerging need for text classifiers. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics, and a wide range of supervised learning algorithms has been introduced to handle this situation. Among these, K-Nearest Neighbor (KNN) is the simplest and one of the most efficient classifiers in the text classification family, but it suffers from imbalanced class distributions and noisy term features. To cope with this challenge, we use document-based centroid dimensionality reduction (CentroidDR) implemented in R. By combining these two text classification techniques, the KNN and centroid classifiers, we propose a scalable and effective flat classifier, called MCenKNN, which performs substantially better than CenKNN.
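The original work uses R; the Python/scikit-learn sketch below only illustrates the combination as I read the abstract: documents are projected onto class-centroid directions (one dimension per class) and KNN is run in that compact space. The toy corpus, labels, and parameters are assumptions, and this is not the authors' MCenKNN implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hedged sketch: centroid-based dimensionality reduction followed by KNN.
train_texts = ["stock market prices fell", "team wins championship game",
               "market shares rise today", "player scores winning goal"]
train_labels = np.array([0, 1, 0, 1])          # 0 = finance, 1 = sports
test_texts = ["shares and prices in the market", "the team scores a goal"]

vec = TfidfVectorizer()
X = vec.fit_transform(train_texts).toarray()
centroids = np.vstack([X[train_labels == c].mean(axis=0) for c in np.unique(train_labels)])

def project(docs):
    """Map TF-IDF vectors to similarities with each class centroid."""
    D = vec.transform(docs).toarray()
    return D @ centroids.T

knn = KNeighborsClassifier(n_neighbors=3).fit(project(train_texts), train_labels)
print(knn.predict(project(test_texts)))
```

Reducing each document to a handful of centroid similarities is one way to blunt the noisy-term and class-imbalance problems the abstract attributes to plain KNN in the full term space.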
PuReD-MCL: a graph-based PubMed document clustering methodology.
Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A
2008-09-01
Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources, available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
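For readers unfamiliar with MCL, the core loop (expansion by matrix multiplication, inflation by element-wise powering, followed by renormalization) fits in a short numpy sketch. The toy adjacency matrix stands in for a PubMed "related documents" graph, and the parameter values are common defaults rather than the settings used by PuReD-MCL.

```python
import numpy as np

# Minimal numpy sketch of Markov clustering (MCL), the graph algorithm used by
# PuReD-MCL; the graph and parameters below are illustrative assumptions.
def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    M = adjacency + np.eye(len(adjacency))        # add self-loops
    M = M / M.sum(axis=0)                         # column-normalize
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)  # expansion step
        M = M ** inflation                        # inflation step
        M = M / M.sum(axis=0)
    clusters = {tuple(int(j) for j in np.nonzero(row > 1e-6)[0])
                for row in M if row.sum() > 1e-6}  # attractor rows define clusters
    return sorted(list(c) for c in clusters)

# Two obvious groups of "documents": {0, 1, 2} and {3, 4}.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(mcl(A))  # expected: [[0, 1, 2], [3, 4]]
```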
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. House Committee on the Judiciary.
This document contains witnesses' testimonies and prepared statements from the Congressional hearing called to consider enactment of H.R. 2673, a bill to facilitate implementation of the 1980 Hague Convention on the Civil Aspects of International Child Abduction. The text of H.R. 2673 is included in the document as is the text of H.R. 3971, a bill…
Graphics-based intelligent search and abstracting using Data Modeling
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.; Case, Carl T.; Songy, Claude G.
2002-11-01
This paper presents an autonomous text and context-mining algorithm that converts text documents into point clouds for visual search cues. This algorithm is applied to the task of data-mining a scriptural database comprised of the Old and New Testaments from the Bible and the Book of Mormon, Doctrine and Covenants, and the Pearl of Great Price. Results are generated which graphically show the scripture that represents the average concept of the database and the mining of the documents down to the verse level.
Developing a business-practice model for pharmacy services in ambulatory settings.
Harris, Ila M; Baker, Ed; Berry, Tricia M; Halloran, Mary Ann; Lindauer, Kathleen; Ragucci, Kelly R; McGivney, Melissa Somma; Taylor, A Thomas; Haines, Stuart T
2008-02-01
A business-practice model is a guide, or toolkit, to assist managers and clinical pharmacy practitioners in the exploration, proposal, development and implementation of new clinical pharmacy services and/or the enhancement of existing services. This document was developed by the American College of Clinical Pharmacy Task Force on Ambulatory Practice to assist clinical pharmacy practitioners and administrators in the development of business-practice models for new and existing clinical pharmacy services in ambulatory settings. This document provides detailed instructions, examples, and resources on conducting a market assessment and a needs assessment, types of clinical services, operations, legal and regulatory issues, marketing and promotion, service development and exit plan, evaluation of service outcomes, and financial considerations in the development of a clinical pharmacy service in the ambulatory environment. Available literature is summarized, and an appendix provides valuable citations and resources. As ambulatory care practices continue to evolve, there will be increased knowledge of how to initiate and expand the services. This document is intended to serve as an essential resource to assist in the growth and development of clinical pharmacy services in the ambulatory environment.
Reyes, Cynthia; Greenbaum, Alissa; Porto, Catherine; Russell, John C
2017-03-01
Accurate clinical documentation (CD) is necessary for many aspects of modern health care, including excellent communication, quality metrics reporting, and legal documentation. New requirements have mandated adoption of ICD-10-CM coding systems, adding another layer of complexity to CD. A clinical documentation improvement (CDI) and ICD-10 training program was created for health care providers in our academic surgery department. We aimed to assess the impact of our CDI curriculum by comparing quality metrics, coding, and reimbursement before and after implementation of our CDI program. A CDI/ICD-10 training curriculum was instituted in September 2014 for all members of our university surgery department. The curriculum consisted of didactic lectures, 1-on-1 provider training, case reviews, e-learning modules, and CD queries from nurse CDI staff and hospital coders. Outcomes parameters included monthly documentation completion rates, severity of illness (SOI), risk of mortality (ROM), case-mix index (CMI), all-payer refined diagnosis-related groups (APR-DRG), and Surgical Care Improvement Program (SCIP) metrics. Financial gain from responses to CDI queries was determined retrospectively. Surgery department delinquent documentation decreased by 85% after CDI implementation. Compliance with SCIP measures improved from 85% to 97%. Significant increases in surgical SOI, ROM, CMI, and APR-DRG (all p < 0.01) were found after CDI/ICD-10 training implementation. Provider responses to CDI queries resulted in an estimated $4,672,786 increase in charges. Clinical documentation improvement/ICD-10 training in an academic surgery department is an effective method to improve documentation rates, increase the hospital estimated reimbursement based on more accurate CD, and provide better compliance with surgical quality measures. Copyright © 2016 American College of Surgeons. All rights reserved.
Term Familiarity to indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries.
Leroy, Gondy; Endicott, James E
2011-10-01
With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, term familiarity, which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences in perceived and actual difficulty.
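The general shape of a frequency-based familiarity tag can be sketched in a few lines. The frequency table below is an invented stand-in for Google corpus counts, and the averaging and default value are assumptions; the published tag's exact normalization is not specified here.

```python
import math

# Illustrative sketch of a term-familiarity style meta-tag; all numbers are
# stand-ins, not real Google corpus frequencies.
CORPUS_FREQ = {"pain": 9.1e7, "chest": 4.0e7, "angina": 2.1e6,
               "myocardial": 1.5e6, "infarction": 1.2e6, "heart": 1.2e8,
               "attack": 6.5e7}

def familiarity(text, default_freq=1e4):
    """Mean log10 corpus frequency of a text's terms; lower = less familiar."""
    terms = [t.strip(".,").lower() for t in text.split()]
    scores = [math.log10(CORPUS_FREQ.get(t, default_freq)) for t in terms]
    return sum(scores) / len(scores)

print(familiarity("chest pain after heart attack"))            # consumer wording
print(familiarity("angina following myocardial infarction"))   # professional wording
```

The professional phrasing scores lower than the consumer phrasing, which is the kind of signal a digital library could use to route readers to material matching their background.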
Krebs, Erin E; Bair, Matthew J; Carey, Timothy S; Weinberger, Morris
2010-03-01
Researchers and quality improvement advocates sometimes use review of chart-documented pain care processes to assess the quality of pain management. Studies have found that primary care providers frequently fail to document pain assessment and management. To assess documentation of pain care processes in an academic primary care clinic and evaluate the validity of this documentation as a measure of pain care delivered. Prospective observational study. 237 adult patients at a university-affiliated internal medicine clinic who reported any pain in the last week. Immediately after a visit, we asked patients to report the pain treatment they received. Patients completed the Brief Pain Inventory (BPI) to assess pain severity at baseline and 1 month later. We extracted documentation of pain care processes from the medical record and used kappa statistics to assess agreement between documentation and patient report of pain treatment. Using multivariable linear regression, we modeled whether documented or patient-reported pain care predicted change in pain at 1 month. Participants' mean age was 53.7 years, 66% were female, and 74% had chronic pain. Physicians documented pain assessment for 83% of visits. Patients reported receiving pain treatment more often (67%) than was documented by physicians (54%). Agreement between documentation and patient report was moderate for receiving a new pain medication (k = 0.50) and slight for receiving pain management advice (k = 0.13). In multivariable models, documentation of new pain treatment was not associated with change in pain (p = 0.134). In contrast, patient-reported receipt of new pain treatment predicted pain improvement (p = 0.005). Chart documentation underestimated pain care delivered, compared with patient report. Documented pain care processes had no relationship with pain outcomes at 1 month, but patient report of receiving care predicted clinically significant improvement. Chart review measures may not accurately reflect the pain management patients receive in primary care.
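For readers unfamiliar with the agreement statistic used above, a small illustration follows. The two binary vectors are invented: the first stands for "new pain medication documented in the chart" and the second for "new pain medication reported by the patient"; they are not data from the study.

```python
from sklearn.metrics import cohen_kappa_score

# Toy illustration of kappa agreement between chart documentation and patient
# report; the vectors are made-up examples, not study data.
chart_documented = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
patient_reported = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
print(round(cohen_kappa_score(chart_documented, patient_reported), 2))
```

Even a fairly high raw agreement can translate into a modest kappa once chance agreement is accounted for, which is why the study reports kappa rather than simple percent agreement.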
NASA Astrophysics Data System (ADS)
de Andrade Lopes, Alneu; Minghim, Rosane; Melo, Vinícius; Paulovich, Fernando V.
2006-01-01
The current abundance of information often impairs the tasks of searching, browsing, and analyzing information pertinent to a topic of interest. This paper presents a methodology to create a meaningful graphical representation of document corpora targeted at supporting exploration of correlated documents. The purpose of such an approach is to produce a map of a document body on a research topic or field, based on the analysis of the documents' contents and the similarities among articles. The document map is generated, after text pre-processing, by projecting the data in two dimensions using Latent Semantic Indexing. The projection is followed by hierarchical clustering to support sub-area identification. The map can be interactively explored, helping to narrow down the search for relevant articles. Tests were performed using a collection of documents pre-classified into three research subject classes: Case-Based Reasoning, Information Retrieval, and Inductive Logic Programming. The map produced was capable of separating the main areas, grouping documents by their similarity, revealing possible topics, and identifying boundaries between them. The tool supports the exploration of inter-topic and intra-topic relationships and is useful in many contexts that require deciding on relevant articles to read, such as scientific research, education, and training.
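A minimal sketch of the pipeline described above, using scikit-learn in place of whatever implementation the authors used: TF-IDF vectors are projected to two dimensions with truncated SVD (the usual realization of Latent Semantic Indexing) and then grouped by hierarchical clustering. The toy corpus and parameter values are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import AgglomerativeClustering

docs = [
    "case-based reasoning retrieval of prior cases",
    "information retrieval ranking of documents",
    "inductive logic programming learns clauses",
    "query expansion improves document retrieval",
]

# Text pre-processing and vectorization.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Project to two dimensions (LSI) to obtain map coordinates.
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Hierarchical clustering to suggest sub-areas on the map.
labels = AgglomerativeClustering(n_clusters=2).fit_predict(coords)
for doc, (x, y), label in zip(docs, coords, labels):
    print(f"cluster {label}  ({x:.2f}, {y:.2f})  {doc[:40]}")
```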
ERIC Educational Resources Information Center
Larsen, Kent S., Ed.
Materials in this resource document were compiled for use in a Washington seminar directed to the interests of state and local government to develop strategies for privacy protection. Included are the texts of issue papers and supporting documents in the following subject areas: (1) criminal justice information; (2) public employee records; (3)…
AFT-QuEST Consortium Yearbook. Proceedings of the AFT-QuEST Consortium (April 22-26, 1973).
ERIC Educational Resources Information Center
American Federation of Teachers, Washington, DC.
This document is a report on the proceedings of the 1973 American Federation of Teachers-Quality Educational Standards in Teaching (AFT-QuEST) consortium sponsored by the AFT. Included in this document are the texts of speeches and outlines of workshops and discussions. The document is divided into the following sections: goals, major proposals,…
A Comparison of Product Realization Frameworks
1993-10-01
software (integrated FrameMaker). Also included are BOLD for on-line documentation delivery, printer/plotter support, and network licensing support. AMPLE...are built with DSS. Documentation tools include an on-line information system (BOLD), text editing (Notepad), word processing (integrated FrameMaker)...within an application. FrameMaker is fully integrated with the Falcon Framework to provide consistent documentation capabilities within engineering
Conjunctive Cohesion in English Language EU Documents--A Corpus-Based Analysis and Its Implications
ERIC Educational Resources Information Center
Trebits, Anna
2009-01-01
This paper reports the findings of a study which forms part of a larger-scale research project investigating the use of English in the documents of the European Union (EU). The documents of the EU show various features of texts written for legal, business and other specific purposes. Moreover, the translation services of the EU institutions often…
Research notes : information at your fingertips!
DOT National Transportation Integrated Search
2000-03-01
TRIS Online includes full-text reports or links to publishers or suppliers of the original documents. You will find titles, publication dates, authors, abstracts, and document sources. : Each year over 20,000 new records are added to TRIS. The databa...
Supporting the education evidence portal via text mining
Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John
2010-01-01
The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679
NASA Astrophysics Data System (ADS)
Tirupattur, Naveen; Lapish, Christopher C.; Mukhopadhyay, Snehasis
2011-06-01
Text mining, sometimes alternately referred to as text analytics, refers to the process of extracting high-quality knowledge from the analysis of textual data. Text mining has a wide variety of applications in areas such as biomedical science, news analysis, and homeland security. In this paper, we describe an approach and some relatively small-scale experiments which apply text mining to neuroscience research literature to find novel associations among a diverse set of entities. Neuroscience is a discipline which encompasses an exceptionally wide range of experimental approaches and rapidly growing interest. This combination results in an overwhelmingly large and often diffuse literature which makes a comprehensive synthesis difficult. Understanding the relations or associations among the entities appearing in the literature not only improves the researchers' current understanding of recent advances in their field, but also provides an important computational tool to formulate novel hypotheses and thereby assist in scientific discoveries. We describe a methodology to automatically mine the literature and form novel associations through direct analysis of published texts. The method first retrieves a set of documents from databases such as PubMed using a set of relevant domain terms. In the current study, these terms yielded document sets ranging from 160,909 to 367,214 documents. Each document is then represented in a numerical vector form from which an Association Graph is computed which represents relationships between all pairs of domain terms, based on co-occurrence. Association graphs can then be subjected to various graph theoretic algorithms such as transitive closure and cycle (circuit) detection to derive additional information, and can also be visually presented to a human researcher for understanding. In this paper, we present three relatively small-scale problem-specific case studies to demonstrate that such an approach is very successful in replicating a neuroscience expert's mental model of object-object associations entirely by means of text mining. These preliminary results provide confidence that this type of text-mining-based research approach provides an extremely powerful tool to better understand the literature and drive novel discovery for the neuroscience community.
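A compact sketch of the co-occurrence step described above, using networkx to build an association graph whose edge weights count the documents in which two domain terms appear together. The term list and the documents are toy data, not the corpus used in the study.

```python
from itertools import combinations
import networkx as nx

terms = {"dopamine", "prefrontal cortex", "addiction", "reward"}
documents = [
    "dopamine signalling in the prefrontal cortex during reward tasks",
    "addiction alters reward processing and dopamine release",
]

G = nx.Graph()
for doc in documents:
    present = {t for t in terms if t in doc.lower()}
    for a, b in combinations(sorted(present), 2):
        # Edge weight counts how many documents mention both terms.
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

for a, b, data in G.edges(data=True):
    print(a, "--", b, "co-occurrences:", data["weight"])
```

The resulting graph could then be handed to graph-theoretic routines (path analysis, cycle detection) or visualized for a domain expert.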
Development of a Search Strategy for an Evidence Based Retrieval Service
Ho, Gah Juan; Liew, Su May; Ng, Chirk Jenn; Hisham Shunmugam, Ranita; Glasziou, Paul
2016-01-01
Background: Physicians are often encouraged to locate answers for their clinical queries via an evidence-based literature search approach. The methods used are often not clearly specified. Inappropriate search strategies, time constraints and contradictory information complicate evidence retrieval. Aims: Our study aimed to develop a search strategy to answer clinical queries among physicians in a primary care setting. Methods: Six clinical questions on different medical conditions seen in primary care were formulated. A series of experimental searches to answer each question was conducted on 3 commonly advocated medical databases. We compared search results from a PICO (patients, intervention, comparison, outcome) framework for questions using different combinations of PICO elements. We also compared outcomes from doing searches using text words, Medical Subject Headings (MeSH), or a combination of both. All searches were documented using screenshots and saved search strategies. Results: Answers to all 6 questions using the PICO framework were found. A higher number of systematic reviews were obtained using a 2-element PICO search compared to a 4-element search. A more optimal choice of search is a combination of both text words and MeSH terms. Despite searching using the Systematic Review filter, many non-systematic reviews or narrative reviews were found in PubMed. There was poor overlap between outcomes of searches using different databases. The duration of search and screening for the 6 questions ranged from 1 to 4 hours. Conclusion: This strategy has been shown to be feasible and can provide evidence to answer doctors’ clinical questions. It has the potential to be incorporated into an interventional study to determine the impact of an online evidence retrieval system. PMID:27935993
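As an illustration of combining text words and MeSH terms for a single PICO element, a query string along the following lines could be assembled (the field tags follow PubMed conventions; the terms themselves are placeholders, not the study's search strings).

```python
def pico_query(text_words, mesh_terms):
    """Combine free-text and MeSH variants of one PICO element with OR,
    ready to be ANDed with the other elements."""
    parts = [f'"{w}"[tiab]' for w in text_words]
    parts += [f'"{m}"[MeSH Terms]' for m in mesh_terms]
    return "(" + " OR ".join(parts) + ")"

population = pico_query(["type 2 diabetes"], ["Diabetes Mellitus, Type 2"])
intervention = pico_query(["metformin"], ["Metformin"])
print(population + " AND " + intervention)
```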
v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text
Divita, Guy; Carter, Marjorie E.; Tran, Le-Thuy; Redd, Doug; Zeng, Qing T; Duvall, Scott; Samore, Matthew H.; Gundlapalli, Adi V.
2016-01-01
Introduction: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of “best-of-breed” functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. Background: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale these tools up and to provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. Innovation: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While the v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. Discussion: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. Conclusion: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records. PMID:27683667
Illinois Occupational Skill Standards: Clinical Laboratory Science/Biotechnology Cluster.
ERIC Educational Resources Information Center
Illinois Occupational Skill Standards and Credentialing Council, Carbondale.
This document, which is intended to serve as a guide for workforce preparation program providers, details the Illinois Occupational Skill Standards for clinical laboratory occupations programs. The document begins with a brief overview of the Illinois perspective on occupational skill standards and credentialing, the process used to develop the…
Documenting Art Therapy Clinical Knowledge Using Interviews
ERIC Educational Resources Information Center
Regev, Dafna
2017-01-01
Practicing art therapists have vast stores of knowledge and experience, but in most cases, their work is not documented, and their clinical knowledge does not enter the academic discourse. This article proposes a systematic approach to the collection of practice knowledge about art therapy based on conducting interviews with art therapists who…
ERIC Educational Resources Information Center
Lobo, Rosale Constance
2017-01-01
Registered Nurses use clinical documentation to describe care planning processes, measure quality outcomes, support reimbursement, and defend litigation. The Connecticut Department of Health, guided by federal Conditions of Participation, defines state-level healthcare policy to include required care planning processes. Nurses are educated in care…
Improving Warfarin Management Within the Medical Home: A Health-System Approach.
Rose, Anne E; Robinson, Erin N; Premo, Joan A; Hauschild, Lori J; Trapskin, Philip J; McBride, Ann M
2017-03-01
Anticoagulation clinics have been considered the optimal strategy for warfarin management with demonstrated improved patient outcomes through increased time in therapeutic international normalized ratio (INR) range, decreased critical INR values, and decreased anticoagulation-related adverse events. However, not all health systems are able to support a specialized anticoagulation clinic or may see patient volume exceed available anticoagulation clinic resources. The purpose of this study was to utilize an anticoagulation clinic model to standardize warfarin management in a primary care clinic setting. A warfarin management program was developed that included standardized patient assessment, protocolized warfarin-dosing algorithm, and electronic documentation and reporting tools. Primary care clinics were targeted for training and implementation of this program. The warfarin management program was applied to over 2000 patients and implemented at 39 clinic sites. A total of 160 nurses and 15 pharmacists were trained on the program. Documentation of warfarin dose and date of the next INR increased from 70% to 90% (P <.0001), documentation occurring within 24 hours of the INR result increased from 75% to 87% (P <.0001), and monitoring the INR at least every 4 weeks increased from 71% to 83% (P <.0001) per patient encounter. Time in therapeutic INR range improved from 65% to 75%. Incorporating a standardized approach to warfarin management in the primary care setting significantly improves warfarin-related documentation and time in therapeutic INR range. Copyright © 2016 Elsevier Inc. All rights reserved.
The Council of Europe and Sport, 1966-1998. Volume III: Texts of the Anti-Doping Convention.
ERIC Educational Resources Information Center
Council of Europe, Strasbourg (France).
This document presents texts in the field of sports and doping that were adopted by various committees of the Council of Europe. The seven sections present: (1) "Texts Adopted by the Committee of Ministers, 1996-1988"; (2) "Texts Adopted at the Conferences of European Ministers Responsible for Sport Since 1978" and…
A Study of Readability of Texts in Bangla through Machine Learning Approaches
ERIC Educational Resources Information Center
Sinha, Manjira; Basu, Anupam
2016-01-01
In this work, we have investigated text readability in Bangla language. Text readability is an indicator of the suitability of a given document with respect to a target reader group. Therefore, text readability has huge impact on educational content preparation. The advances in the field of natural language processing have enabled the automatic…
A Survey of Text Materials Used in Aviation Maintenance Technician Schools. Final Report.
ERIC Educational Resources Information Center
Allen, David; Bowers, William K.
The report documents the results of a national survey of book publishing firms and aviation maintenance technician schools to (1) identify the text materials used in the training of aviation mechanics; (2) appraise the suitability and availability of identified text materials; and (3) determine the adequacy of the text materials in meeting the…
Automation for System Safety Analysis
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Fleming, Land; Throop, David; Thronesbery, Carroll; Flores, Joshua; Bennett, Ted; Wennberg, Paul
2009-01-01
This presentation describes work to integrate a set of tools to support early model-based analysis of failures and hazards due to system-software interactions. The tools perform and assist analysts in the following tasks: 1) extract model parts from text for architecture and safety/hazard models; 2) combine the parts with library information to develop the models for visualization and analysis; 3) perform graph analysis and simulation to identify and evaluate possible paths from hazard sources to vulnerable entities and functions, in nominal and anomalous system-software configurations and scenarios; and 4) identify resulting candidate scenarios for software integration testing. There has been significant technical progress in model extraction from Orion program text sources, architecture model derivation (components and connections) and documentation of extraction sources. Models have been derived from Internal Interface Requirements Documents (IIRDs) and FMEA documents. Linguistic text processing is used to extract model parts and relationships, and the Aerospace Ontology also aids automated model development from the extracted information. Visualizations of these models assist analysts in requirements overview and in checking consistency and completeness.
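As a rough illustration of the graph-analysis step, finding candidate paths from hazard sources to vulnerable entities, a small networkx sketch is given below. The component names and edges are invented for illustration and do not come from the Orion models.

```python
import networkx as nx

# Hypothetical system-software interaction model: edges point from a component
# to the components or functions it can affect.
model = nx.DiGraph()
model.add_edges_from([
    ("thruster_valve_fault", "propulsion_controller"),
    ("propulsion_controller", "flight_software"),
    ("flight_software", "attitude_control"),
    ("sensor_dropout", "flight_software"),
])

hazard_sources = ["thruster_valve_fault", "sensor_dropout"]
vulnerable = ["attitude_control"]

for src in hazard_sources:
    for tgt in vulnerable:
        for path in nx.all_simple_paths(model, src, tgt):
            print(" -> ".join(path))  # candidate scenario for integration testing
```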
Correcting geometric and photometric distortion of document images on a smartphone
NASA Astrophysics Data System (ADS)
Simon, Christian; Williem; Park, In Kyu
2015-01-01
A set of document image processing algorithms for improving the optical character recognition (OCR) capability of smartphone applications is presented. The scope of the problem covers the geometric and photometric distortion correction of document images. The proposed framework was developed to satisfy industrial requirements. It is implemented on an off-the-shelf smartphone with limited resources in terms of speed and memory. Geometric distortions, i.e., skew and perspective distortion, are corrected by sending horizontal and vertical vanishing points toward infinity in a downsampled image. Photometric distortion includes image degradation from moiré pattern noise and specular highlights. Moiré pattern noise is removed using low-pass filters with different sizes independently applied to the background and text region. The contrast of the text in a specular highlighted area is enhanced by locally enlarging the intensity difference between the background and text while the noise is suppressed. Intensive experiments indicate that the proposed methods show a consistent and robust performance on a smartphone with a runtime of less than 1 s.
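The geometric step amounts to estimating a homography that removes skew and perspective, which with OpenCV might look like the sketch below. The corner coordinates stand in for whatever the upstream vanishing-point detection produced, and the final blur is only a crude stand-in for the paper's separate background/text moiré filtering; all values are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("page.jpg")  # placeholder path

# Document corners as estimated upstream (hypothetical values).
src = np.float32([[40, 60], [600, 30], [620, 820], [20, 850]])
w, h = 595, 842  # target page size in pixels
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Homography that removes skew and perspective distortion.
H = cv2.getPerspectiveTransform(src, dst)
rectified = cv2.warpPerspective(img, H, (w, h))

# Mild low-pass filtering as a very rough photometric clean-up step.
smoothed = cv2.GaussianBlur(rectified, (5, 5), 0)
cv2.imwrite("rectified.jpg", smoothed)
```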
Boost OCR accuracy using iVector based system combination approach
NASA Astrophysics Data System (ADS)
Peng, Xujun; Cao, Huaigu; Natarajan, Prem
2015-01-01
Optical character recognition (OCR) is a challenging task because most existing preprocessing approaches are sensitive to writing style, writing material, noises and image resolution. Thus, a single recognition system cannot address all factors of real document images. In this paper, we describe an approach to combine diverse recognition systems by using iVector based features, which is a newly developed method in the field of speaker verification. Prior to system combination, document images are preprocessed and text line images are extracted with different approaches for each system, where iVector is transformed from a high-dimensional supervector of each text line and is used to predict the accuracy of OCR. We merge hypotheses from multiple recognition systems according to the overlap ratio and the predicted OCR score of text line images. We present evaluation results on an Arabic document database where the proposed method is compared against the single best OCR system using word error rate (WER) metric.
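One way the merging rule described above (choose between overlapping text-line hypotheses by predicted OCR score) might be sketched is shown below; the overlap threshold, data structures, and scores are assumptions for illustration.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two text-line boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def merge(hypotheses, threshold=0.5):
    """Each hypothesis: (box, text, predicted_score). Keep the better-scored
    hypothesis whenever two systems produced overlapping text lines."""
    kept = []
    for h in sorted(hypotheses, key=lambda h: -h[2]):
        if all(overlap_ratio(h[0], k[0]) < threshold for k in kept):
            kept.append(h)
    return kept

result = merge([((0, 0, 100, 20), "system A reading", 0.92),
                ((2, 1, 101, 21), "system B reading", 0.85)])
print(result)
```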
Counting seizures: The primary outcome measure in epileptology from the patients' perspective.
Blachut, Barbara; Hoppe, Christian; Surges, Rainer; Stahl, Jutta; Elger, Christian E; Helmstaedter, Christoph
2015-07-01
Patient-reported seizure counts represent a key outcome measure for individual treatments and clinical studies in epileptology. Video-EEG based research, however, has demonstrated a lack of validity due to underreporting. Here we examined the practice of keeping seizure diaries and the patients' attitudes toward seizure counting. Anticipating a low return rate, a comprehensive survey was mailed to 1100 adult outpatients. Besides methods and reasons to document or not to document seizures, the questionnaire addressed clinical, personality and sociodemographic characteristics as well as the subjective experience of seizures. Questionnaires from 170 patients (15%) could be included in our analysis. Patients estimated being aware of 5.3 out of 10 daytime seizures (nocturnal seizures: 2.6), while they supposed that relatives/colleagues noticed 7.1 (nocturnal: 4.6). Almost two-thirds of the patients reported keeping a seizure diary, with a self-estimated documentation rate of 8.7 out of 10 noticed daytime seizures (nocturnal: 7.7). Documenters and non-documenters showed only marginal group differences with regard to clinical, personality and sociodemographic characteristics. Importantly, patients were more committed to keeping a seizure diary when they judged it to be relevant for clinical treatment decisions. Patients appear to know that they underreport seizures. According to their view, seizure unawareness induced by the seizures themselves seems to be a more important factor than omitting documentation of noticed seizures. Thus, the potential of electronic devices that facilitate documenting noticed seizures to improve the validity of seizure diaries appears limited. Copyright © 2015 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.
High-Reproducibility and High-Accuracy Method for Automated Topic Classification
NASA Astrophysics Data System (ADS)
Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes
2015-01-01
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
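For reference, the baseline LDA fit that this kind of analysis starts from can be reproduced in a few lines with scikit-learn; the toy corpus, topic count, and default hyper-parameters below are placeholders, not the authors' configuration or their proposed algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stellar spectra and galaxy formation",
    "neural networks for image recognition",
    "galaxy clusters and dark matter surveys",
    "training deep networks with dropout",
]

counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```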
Evaluation of clinical pharmacy services in a hematology/oncology outpatient setting.
Shah, Sachin; Dowell, Jonathan; Greene, Shane
2006-09-01
The Veterans Affairs North Texas Health Care System in Dallas, TX, provides a unique opportunity for clinical pharmacists to work as providers. Even though clinical pharmacists are actively involved in patient care, many of their efforts remain undocumented, resulting in an underestimation of the importance of their services and missed opportunities for improvements and new directions. To document and evaluate the services of a hematology/oncology clinical pharmacy in the outpatient setting. Pendragon Forms 3.2 software was used to design the documentation template. The template was designed to collect diagnoses, supportive care issues, drug-specific interventions, and prescriptions written. This template was uploaded to the personal digital assistant (PDA) for documentation. Patient-specific information was documented in a password-protected PDA. Data collected from November 1, 2002, to October 31, 2003, were retrospectively analyzed. Clinical pharmacists were involved in 423 patient visits for chemotherapy follow-up or disease management. Cancer diagnoses included colorectal (n = 99), multiple myeloma (59), non-small cell lung (56), chronic lymphocytic leukemia (44), myelodysplastic syndromes (22), and chronic myelogenous leukemia (19). During the 423 patient visits, 342 supportive care issues were addressed including anemia (34%), pain management (22%), constipation/diarrhea (15%), and nausea/vomiting (8%). Major drug-specific interventions included drug addition (41%), discontinuation (23%), and adjustment (21%). Four hundred forty-five prescriptions were filled, of which 181 were new and 150 were refilled. This is the first study, as of July 25, 2006, to document the considerable contribution of an outpatient clinical pharmacist to direct cancer patient care. Although the disease management and supportive care issues addressed here may differ based on institution and patient population, the results of our study show that clinical pharmacists have ever-growing roles in the management of these patients.
Liu, Yuanchao; Liu, Ming; Wang, Xin
2015-01-01
The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach. PMID:25794172
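One way to realize the "combined similarity" idea is sketched below: cosine similarity in the usual TF-IDF space is blended with similarity in a second, extension-derived space (here just a stand-in matrix), and the combined distances feed a hierarchical clustering. The weighting scheme and the extension similarities are assumptions, not the authors' formulation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["cats and kittens", "felines at home", "stock market crash", "equity prices fall"]

sim_tfidf = cosine_similarity(TfidfVectorizer().fit_transform(docs))

# Hypothetical similarity computed in an extension (semantic) space.
sim_ext = np.array([[1.0, 0.9, 0.1, 0.1],
                    [0.9, 1.0, 0.1, 0.1],
                    [0.1, 0.1, 1.0, 0.8],
                    [0.1, 0.1, 0.8, 1.0]])

alpha = 0.5  # assumed weight between the two spaces
combined = alpha * sim_tfidf + (1 - alpha) * sim_ext
dist = 1.0 - combined
np.fill_diagonal(dist, 0.0)

Z = linkage(squareform(dist, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))
```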
Use of Co-occurrences for Temporal Expressions Annotation
NASA Astrophysics Data System (ADS)
Craveiro, Olga; Macedo, Joaquim; Madeira, Henrique
The annotation or extraction of temporal information from text documents is becoming increasingly important in many natural language processing applications such as text summarization, information retrieval, and question answering. This paper presents an original method for easy recognition of temporal expressions in text documents. The method creates semantically classified temporal patterns, using word co-occurrences obtained from training corpora and a pre-defined set of seed keywords derived from the temporal references of the language in question. Participation in a Portuguese named entity evaluation contest showed promising effectiveness and efficiency results. This approach can be adapted to recognize other types of expressions or languages, within other contexts, by defining suitable word sets and training corpora.
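A toy version of the co-occurrence step: collect words that appear within a small window of seed temporal keywords in a training corpus and rank them by frequency as candidates for temporal patterns. The seeds, window size, and corpus are placeholders, not the resources used in the paper.

```python
from collections import Counter

seeds = {"january", "yesterday", "week"}
corpus = [
    "the meeting was held last week in lisbon",
    "yesterday morning the committee approved the budget",
    "early in january next year the report is due",
]

window = 2
cooc = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        if w in seeds:
            for n in words[max(0, i - window): i + window + 1]:
                if n not in seeds:
                    cooc[n] += 1

# Candidate words for temporal patterns, ranked by co-occurrence count.
print(cooc.most_common(5))
```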
Garcia-Garcia, Hector M; McFadden, Eugène P; Farb, Andrew; Mehran, Roxana; Stone, Gregg W; Spertus, John; Onuma, Yoshinobu; Morel, Marie-Angèle; van Es, Gerrit-Anne; Zuckerman, Bram; Fearon, William F; Taggart, David; Kappetein, Arie-Pieter; Krucoff, Mitchell W; Vranckx, Pascal; Windecker, Stephan; Cutlip, Donald; Serruys, Patrick W
2018-06-14
The Academic Research Consortium (ARC)-2 initiative revisited the clinical and angiographic end point definitions in coronary device trials, proposed in 2007, to make them more suitable for use in clinical trials that include increasingly complex lesion and patient populations and incorporate novel devices such as bioresorbable vascular scaffolds. In addition, recommendations for the incorporation of patient-related outcomes in clinical trials are proposed. Academic Research Consortium-2 is a collaborative effort between academic research organizations in the United States and Europe, device manufacturers, and European, US, and Asian regulatory bodies. Several in-person meetings were held to discuss the changes that have occurred in the device landscape and in clinical trials and regulatory pathways in the last decade. The consensus-based end point definitions in this document are endorsed by the stakeholders of this document and strongly advocated for clinical trial purposes. This Academic Research Consortium-2 document provides further standardization of end point definitions for coronary device trials, incorporating advances in technology and knowledge. Their use will aid interpretation of trial outcomes and comparison among studies, thus facilitating the evaluation of the safety and effectiveness of these devices.
Ensuring Cross-Cultural Equivalence in Translation of Research Consents and Clinical Documents
Lee, Cheng-Chih; Li, Denise; Arai, Shoshana; Puntillo, Kathleen
2010-01-01
The aim of this article is to describe a formal process used to translate research study materials from English into traditional Chinese characters. This process may be useful for translating documents for use by both research participants and clinical patients. A modified Brislin model was used as the systematic translation process. Four bilingual translators were involved, and a Flaherty 3-point scale was used to evaluate the translated documents. The linguistic discrepancies that arise in the process of ensuring cross-cultural congruency or equivalency between the two languages are presented to promote the development of patient-accessible cross-cultural documents. PMID:18948451
ASM Based Synthesis of Handwritten Arabic Text Pages
Dinges, Laslo; Al-Hamadi, Ayoub; Elzobi, Moftah; El-Etriby, Sherif; Ghoneim, Ahmed
2015-01-01
Document analysis tasks such as text recognition, word spotting, or segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, their generation is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for the case of Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods that have individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents and detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-Spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient natural ground-truthed data is not available. PMID:26295059
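The B-spline smoothing applied to synthesized strokes can be illustrated with SciPy; the stroke coordinates and smoothing factor below are invented for the sketch and are not taken from the described system.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical pen-stroke control points produced by an ASM-style synthesis step.
x = np.array([0.0, 1.0, 2.5, 3.0, 4.2, 5.0])
y = np.array([0.0, 1.2, 1.0, 0.3, 0.8, 0.0])

# Fit a smoothing B-spline through the stroke and resample it densely.
tck, u = splprep([x, y], s=0.05)
xs, ys = splev(np.linspace(0, 1, 100), tck)
print(len(xs), "smoothed points")
```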
Document similarity measures and document browsing
NASA Astrophysics Data System (ADS)
Ahmadullin, Ildus; Fan, Jian; Damera-Venkata, Niranjan; Lim, Suk Hwan; Lin, Qian; Liu, Jerry; Liu, Sam; O'Brien-Strain, Eamonn; Allebach, Jan
2011-03-01
Managing large document databases is an important task today. Being able to automatically compare document layouts and classify and search documents with respect to their visual appearance proves to be desirable in many applications. We measure single page documents' similarity with respect to distance functions between three document components: background, text, and saliency. Each document component is represented as a Gaussian mixture distribution; and distances between different documents' components are calculated as probabilistic similarities between corresponding distributions. The similarity measure between documents is represented as a weighted sum of the components' distances. Using this document similarity measure, we propose a browsing mechanism operating on a document dataset. For these purposes, we use a hierarchical browsing environment which we call the document similarity pyramid. It allows the user to browse a large document dataset and to search for documents in the dataset that are similar to the query. The user can browse the dataset on different levels of the pyramid, and zoom into the documents that are of interest.
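A rough sketch of one component-level comparison in this spirit: each component's feature points are fitted with a Gaussian mixture, and a symmetric, sample-based similarity between the two mixtures is computed. The feature data, mixture sizes, and similarity definition are assumptions, not the paper's exact probabilistic distance; the full document similarity would be a weighted sum of such terms over background, text, and saliency.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def fit_gmm(points, k=2):
    return GaussianMixture(n_components=k, random_state=0).fit(points)

def sim(gmm_a, gmm_b, n=500):
    """Symmetric, sample-based similarity between two mixtures
    (average log-likelihood of each model's samples under the other)."""
    xa, _ = gmm_a.sample(n)
    xb, _ = gmm_b.sample(n)
    return 0.5 * (gmm_b.score_samples(xa).mean() + gmm_a.score_samples(xb).mean())

# Hypothetical 2-D feature points for the "text" component of two documents.
text_doc1 = fit_gmm(rng.normal([0.0, 0.0], 1.0, size=(200, 2)))
text_doc2 = fit_gmm(rng.normal([0.2, 0.1], 1.0, size=(200, 2)))

print("text-component similarity:", sim(text_doc1, text_doc2))
```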
Takeda, Toshihiro; Ueda, Kanayo; Manabe, Shiro; Teramoto, Kei; Mihara, Naoki; Matsumura, Yasushi
2013-01-01
Standard Japanese electronic medical record (EMR) systems are associated with major shortcomings. For example, they do not assure lifelong readability of records because each document requires its own viewing software program, a system that is difficult to maintain over long periods of time. It can also be difficult for users to comprehend a patient's clinical history because different classes of documents can only be accessed from their own window. To address these problems, we developed a document-based electronic medical record that aggregates all documents for a patient in a PDF or DocuWorks format. We call this system the Document Archiving and Communication System (DACS). There are two types of viewers in the DACS: the Matrix View, which provides a time line of a patient's history, and the Tree View, which stores the documents in hierarchical document classes. We placed 2,734 document classes into 11 categories. A total of 223,972 documents were entered per month. The frequency of use of the DACS viewer was 268,644 instances per month. The DACS viewer was used to assess a patient's clinical history.
Pankau, Thomas; Wichmann, Gunnar; Neumuth, Thomas; Preim, Bernhard; Dietz, Andreas; Stumpp, Patrick; Boehm, Andreas
2015-10-01
Many treatment approaches are available for head and neck cancer (HNC), leading to challenges for a multidisciplinary medical team in matching each patient with an appropriate regimen. In this effort, primary diagnostics and its reliable documentation are indispensable. A three-dimensional (3D) documentation system was developed and tested to determine its influence on interpretation of these data, especially for TNM classification. A total of 42 HNC patient data sets were available, including primary diagnostics such as panendoscopy, performed and evaluated by an experienced head and neck surgeon. In addition to the conventional panendoscopy form and report, a 3D representation was generated with the "Tumor Therapy Manager" (TTM) software. These cases were randomly re-evaluated by 11 experienced otolaryngologists from five hospitals, half with and half without the TTM data. The accuracy of tumor staging was assessed by pre-post comparison of the TNM classification. TNM staging showed no significant differences in tumor classification (T) with and without 3D from TTM. However, there was a significant decrease in standard deviation from 0.86 to 0.63 via TTM ([Formula: see text]). In nodal staging without TTM, the lymph nodes (N) were significantly underestimated with [Formula: see text] classes compared with [Formula: see text] with TTM ([Formula: see text]). Likewise, the standard deviation was reduced from 0.79 to 0.69 ([Formula: see text]). There was no influence of TTM results on the evaluation of distant metastases (M). TNM staging was more reproducible and nodal staging more accurate when 3D documentation of HNC primary data was available to experienced otolaryngologists. The more precise assessment of the tumor classification with TTM should provide improved decision-making concerning therapy, especially within the interdisciplinary tumor board.
On the Reconstruction of Text Phylogeny Trees: Evaluation and Analysis of Textual Relationships
Marmerola, Guilherme D.; Dias, Zanoni; Goldenstein, Siome; Rocha, Anderson
2016-01-01
Over the history of mankind, textual records change. Sometimes due to mistakes during transcription, sometimes on purpose, as a way to rewrite facts and reinterpret history. There are several classical cases, such as the logarithmic tables, and the transmission of antique and medieval scholarship. Today, text documents are largely edited and redistributed on the Web. Articles on news portals and collaborative platforms (such as Wikipedia), source code, posts on social networks, and even scientific publications or literary works are some examples in which textual content can be subject to changes in an evolutionary process. In this scenario, given a set of near-duplicate documents, it is worthwhile to find which one is the original and the history of changes that created the whole set. Such functionality would have immediate applications on news tracking services, detection of plagiarism, textual criticism, and copyright enforcement, for instance. However, this is not an easy task, as textual features pointing to the documents’ evolutionary direction may not be evident and are often dataset dependent. Moreover, side information, such as time stamps, are neither always available nor reliable. In this paper, we propose a framework for reliably reconstructing text phylogeny trees, and seamlessly exploring new approaches on a wide range of scenarios of text reusage. We employ and evaluate distinct combinations of dissimilarity measures and reconstruction strategies within the proposed framework, and evaluate each approach with extensive experiments, including a set of artificial near-duplicate documents with known phylogeny, and from documents collected from Wikipedia, whose modifications were made by Internet users. We also present results from qualitative experiments in two different applications: text plagiarism and reconstruction of evolutionary trees for manuscripts (stemmatology). PMID:27992446
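A minimal sketch of one reconstruction strategy compatible with the framework described above: compute pairwise dissimilarities between near-duplicate texts and keep a minimum spanning tree as a candidate skeleton for the phylogeny. The texts and the dissimilarity choice (difflib ratio) are illustrative assumptions; orienting the tree (choosing the original) requires additional evidence.

```python
import difflib
import itertools
import networkx as nx

versions = {
    "v0": "the quick brown fox jumps over the lazy dog",
    "v1": "the quick brown fox jumped over the lazy dog",
    "v2": "a quick brown fox jumped over a lazy dog",
}

G = nx.Graph()
for a, b in itertools.combinations(versions, 2):
    d = 1 - difflib.SequenceMatcher(None, versions[a], versions[b]).ratio()
    G.add_edge(a, b, weight=d)

# The minimum spanning tree is one candidate for the phylogeny's undirected skeleton.
tree = nx.minimum_spanning_tree(G)
print(sorted(tree.edges(data="weight")))
```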
NASA Astrophysics Data System (ADS)
Boling, M. E.
1989-09-01
Prototypes were assembled pursuant to recommendations made in report K/DSRD-96, Issues and Approaches for Electronic Document Approval and Transmittal Using Digital Signatures and Text Authentication, and to examine and discover the possibilities for integrating available hardware and software to provide cost effective systems for digital signatures and text authentication. These prototypes show that on a LAN, a multitasking, windowed, mouse/keyboard menu-driven interface can be assembled to provide easy and quick access to bit-mapped images of documents, electronic forms and electronic mail messages with a means to sign, encrypt, deliver, receive or retrieve and authenticate text and signatures. In addition they show that some of this same software may be used in a classified environment using host to terminal transactions to accomplish these same operations. Finally, a prototype was developed demonstrating that binary files may be signed electronically and sent by point to point communication and over ARPANET to remote locations where the authenticity of the code and signature may be verified. Related studies on the subject of electronic signatures and text authentication using public key encryption were done within the Department of Energy. These studies include timing studies of public key encryption software and hardware and testing of experimental user-generated host resident software for public key encryption. This software used commercially available command-line source code. These studies are responsive to an initiative within the Office of the Secretary of Defense (OSD) for the protection of unclassified but sensitive data. It is notable that these related studies are all built around the same commercially available public key encryption products from the private sector and that the software selection was made independently by each study group.
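The core operation these prototypes exercise, signing a document and verifying the signature with the corresponding public key, looks like the following with today's Python cryptography package; this is a generic modern sketch, not the 1989 prototype software or its commercial encryption products.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

document = b"Engineering change notice 42: approved."  # hypothetical content

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(
    document,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

# Verification raises InvalidSignature if the document or signature was altered.
private_key.public_key().verify(
    signature,
    document,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
print("signature verified")
```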
Visualizing the semantic content of large text databases using text maps
NASA Technical Reports Server (NTRS)
Combs, Nathan
1993-01-01
A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
[Microbiological diagnosis of human immunodeficiency virus infection].
Álvarez Estévez, Marta; Reina González, Gabriel; Aguilera Guirao, Antonio; Rodríguez Martín, Carmen; García García, Federico
2015-10-01
This document attempts to update the main tasks and roles of the Clinical Microbiology laboratory in HIV diagnosis and monitoring. The document is divided into three parts. The first deals with HIV diagnosis and how serological testing has changed in the last few years, aiming to improve diagnosis and to minimize missed opportunities for diagnosis. Technological improvements for HIV Viral Load are shown in the second part of the document, which also includes a detailed description of the clinical significance of low-level and very low-level viremia. Finally, the third part of the document deals with resistance to antiretroviral drugs, incorporating clinical indications for integrase and tropism testing, as well as the latest knowledge on minority variants. Copyright © 2014 Elsevier España, S.L.U. y Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
Enriching a document collection by integrating information extraction and PDF annotation
NASA Astrophysics Data System (ADS)
Powley, Brett; Dale, Robert; Anisimoff, Ilya
2009-01-01
Modern digital libraries offer all the hyperlinking possibilities of the World Wide Web: when a reader finds a citation of interest, in many cases she can now click on a link to be taken to the cited work. This paper presents work aimed at providing the same ease of navigation for legacy PDF document collections that were created before the possibility of integrating hyperlinks into documents was ever considered. To achieve our goal, we need to carry out two tasks: first, we need to identify and link citations and references in the text with high reliability; and second, we need the ability to determine physical PDF page locations for these elements. We demonstrate the use of a high-accuracy citation extraction algorithm which significantly improves on earlier reported techniques, and a technique for integrating PDF processing with a conventional text-stream based information extraction pipeline. We demonstrate these techniques in the context of a particular document collection, this being the ACL Anthology; but the same approach can be applied to other document sets.
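A toy version of the first task, spotting in-text citations and linking them to reference entries, might start from a pattern like the one below. The regular expression, sample text, and reference keys are illustrative assumptions; the high-accuracy extraction algorithm reported in the paper is considerably more involved.

```python
import re

text = "Our parser follows (Collins, 2003) and extends the model of (Charniak and Johnson, 2005)."
references = {
    ("Collins", "2003"): "ref-12",
    ("Charniak and Johnson", "2005"): "ref-07",
}

# Very simple author-year citation pattern; brittle by design, for illustration only.
pattern = re.compile(r"\(([A-Z][A-Za-z .&-]+?),\s*(\d{4})\)")
for match in pattern.finditer(text):
    key = (match.group(1), match.group(2))
    print(match.group(0), "->", references.get(key, "unresolved"))
```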
Munkhdalai, Tsendsuren; Li, Meijing; Batsuren, Khuyagbaatar; Park, Hyeon Ah; Choi, Nak Hyeon; Ryu, Keun Ho
2015-01-01
Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon or dictionary, in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
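The token-classification stage described above (per-token features, per-token labels, a conditional random field) can be sketched with sklearn-crfsuite. The two-sentence training set, label scheme, and feature choices are toy assumptions, not the BANNER-CHEMDNER configuration, which additionally uses word-representation features learned from large unlabeled corpora.

```python
import sklearn_crfsuite

def features(tokens, i):
    t = tokens[i]
    return {
        "lower": t.lower(),
        "is_title": t.istitle(),
        "has_digit": any(c.isdigit() for c in t),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }

sentences = [["Aspirin", "inhibits", "COX-1"], ["We", "measured", "dopamine", "levels"]]
labels = [["B-CHEM", "O", "B-CHEM"], ["O", "O", "B-CHEM", "O"]]

X = [[features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)

# Predict labels for an unseen token sequence.
print(crf.predict([[features(["Caffeine", "intake"], i) for i in range(2)]]))
```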
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-01
... restrictions has been revised. The Designated List and the regulatory text in that document contain language which is inadvertently not consistent with the rest of the document as to the historical period that the...
10 CFR 961.11 - Text of the contract.
Code of Federal Regulations, 2014 CFR
2014-01-01
... program including information on cost projections, project plans and progress reports. 5. (a) Beginning on...-type documents or computer software (including computer programs, computer software data bases, and computer software documentation). Examples of technical data include research and engineering data...
10 CFR 961.11 - Text of the contract.
Code of Federal Regulations, 2013 CFR
2013-01-01
... program including information on cost projections, project plans and progress reports. 5. (a) Beginning on...-type documents or computer software (including computer programs, computer software data bases, and computer software documentation). Examples of technical data include research and engineering data...
76 FR 17353 - Aviation Communications
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-29
... publication). The full text of this document is available for public inspection and copying during regular... Printing, Inc., 445 12th Street, SW., Room CY-B402, Washington, DC 20554. The full text may also be...
Implementation of a School-wide Clinical Intervention Documentation System
Stevenson, T. Lynn; Fox, Brent I.; Andrus, Miranda; Carroll, Dana
2011-01-01
Objective. To evaluate the effectiveness and impact of a customized Web-based software program implemented in 2006 for school-wide documentation of clinical interventions by pharmacy practice faculty members, pharmacy residents, and student pharmacists. Methods. The implementation process, directed by a committee of faculty members and school administrators, included preparation and refinement of the software, user training, development of forms and reports, and integration of the documentation process within the curriculum. Results. Use of the documentation tool consistently increased from May 2007 to December 2010. Over 187,000 interventions were documented with over $6.2 million in associated cost avoidance. Conclusions. Successful implementation of a school-wide documentation tool required considerable time from the oversight committee and a comprehensive training program for all users, with ongoing monitoring of data collection practices. Data collected proved to be useful to show the impact of faculty members, residents, and student pharmacists at affiliated training sites. PMID:21829264