Sample records for information resource dictionary

  1. Concept dictionary creation and maintenance under resource constraints: lessons from the AMPATH Medical Record System.

    PubMed

    Were, Martin C; Mamlin, Burke W; Tierney, William M; Wolfe, Ben; Biondich, Paul G

    2007-10-11

    The challenges of creating and maintaining concept dictionaries are compounded in resource-limited settings. Approaches to alleviate this burden need to be based on information derived in these settings. We created a concept dictionary and evaluated new concept proposals for an open source EMR in a resource-limited setting. Overall, 87% of the concepts in the initial dictionary were used. There were 5137 new concepts proposed, with 77% of these proposed only once. Further characterization of new concept proposals revealed that 41% were due to deficiency in the existing dictionary, and 19% were synonyms to existing concepts. 25% of the requests contained misspellings, 41% were complex terms, and 17% were ambiguous. Given the resource-intensive nature of dictionary creation and maintenance, there should be considerations for centralizing the concept dictionary service, using standards, prioritizing concept proposals, and redesigning the user-interface to reduce this burden in settings with limited resources.

  2. Concept Dictionary Creation and Maintenance Under Resource Constraints: Lessons from the AMPATH Medical Record System

    PubMed Central

    Were, Martin C.; Mamlin, Burke W.; Tierney, William M.; Wolfe, Ben; Biondich, Paul G.

    2007-01-01

    The challenges of creating and maintaining concept dictionaries are compounded in resource-limited settings. Approaches to alleviate this burden need to be based on information derived in these settings. We created a concept dictionary and evaluated new concept proposals for an open source EMR in a resource-limited setting. Overall, 87% of the concepts in the initial dictionary were used. There were 5137 new concepts proposed, with 77% of these proposed only once. Further characterization of new concept proposals revealed that 41% were due to deficiency in the existing dictionary, and 19% were synonyms to existing concepts. 25% of the requests contained misspellings, 41% were complex terms, and 17% were ambiguous. Given the resource-intensive nature of dictionary creation and maintenance, there should be considerations for centralizing the concept dictionary service, using standards, prioritizing concept proposals, and redesigning the user-interface to reduce this burden in settings with limited resources. PMID:18693945

  3. The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

    DTIC Science & Technology

    2003-02-01

    ... are extensively used as a resource for dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written ...

  4. Developing a hybrid dictionary-based bio-entity recognition technique.

    PubMed

    Song, Min; Yu, Hwanjo; Han, Wook-Shin

    2015-01-01

    Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall.
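
    A minimal sketch of the fuzzy dictionary lookup idea described above (edit-distance matching of candidate mentions against a combined dictionary). The dictionary entries, threshold, and function names are illustrative assumptions, not the authors' implementation:

      # Illustrative sketch: edit-distance-tolerant dictionary lookup for bio-entity mentions.
      # The dictionary contents and the distance threshold are toy assumptions.
      def edit_distance(a: str, b: str) -> int:
          """Classic Levenshtein distance via dynamic programming."""
          prev = list(range(len(b) + 1))
          for i, ca in enumerate(a, 1):
              curr = [i]
              for j, cb in enumerate(b, 1):
                  curr.append(min(prev[j] + 1,                 # deletion
                                  curr[j - 1] + 1,             # insertion
                                  prev[j - 1] + (ca != cb)))   # substitution
              prev = curr
          return prev[-1]

      def match_entity(mention, dictionary, max_dist=2):
          """Return (canonical term, entity type) of the closest entry within max_dist, else None."""
          mention = mention.lower()
          best = min(dictionary, key=lambda term: edit_distance(mention, term))
          return (best, dictionary[best]) if edit_distance(mention, best) <= max_dist else None

      bio_dict = {"interleukin-2": "protein", "p53": "gene", "t cell": "cell_type"}  # toy dictionary
      print(match_entity("Interluekin-2", bio_dict))   # -> ('interleukin-2', 'protein')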

  5. Developing a hybrid dictionary-based bio-entity recognition technique

    PubMed Central

    2015-01-01

    Background Bio-entity extraction is a pivotal component for information extraction from biomedical literature. The dictionary-based bio-entity extraction is the first generation of Named Entity Recognition (NER) techniques. Methods This paper presents a hybrid dictionary-based bio-entity extraction technique. The approach expands the bio-entity dictionary by combining different data sources and improves the recall rate through the shortest path edit distance algorithm. In addition, the proposed technique adopts text mining techniques in the merging stage of similar entities such as Part of Speech (POS) expansion, stemming, and the exploitation of the contextual cues to further improve the performance. Results The experimental results show that the proposed technique achieves the best or at least equivalent performance among compared techniques, GENIA, MESH, UMLS, and combinations of these three resources in F-measure. Conclusions The results imply that the performance of dictionary-based extraction techniques is largely influenced by information resources used to build the dictionary. In addition, the edit distance algorithm shows steady performance with three different dictionaries in precision whereas the context-only technique achieves a high-end performance with three different dictionaries in recall. PMID:26043907

  6. NCI Dictionary of Genetics Terms

    Cancer.gov

    A dictionary of more than 150 genetics-related terms written for healthcare professionals. This resource was developed to support the comprehensive, evidence-based, peer-reviewed PDQ cancer genetics information summaries.

  7. Parsing and Tagging of Bilingual Dictionary

    DTIC Science & Technology

    2003-09-01

    LAMP-TR-106 CAR-TR-991 CS-TR-4529 UMIACS-TR-2003-97. Huanfeng Ma, Burcu Karagol-Ayan, David ... Bilingual dictionaries hold great potential as a source of lexical resources for training and testing automated systems for optical character recognition, machine translation, and cross-language information retrieval. In this paper, we describe a system for extracting term lexicons from printed bilingual dictionaries ...

  8. A Relational Data Dictionary Compatible with the National Bureau of Standards Information Resource Dictionary System.

    DTIC Science & Technology

    1985-12-01

    Naval Postgraduate School, Monterey, California. Concern over corporate information resources has resulted from the explosive growth in the size, complexity and number of data bases available to ... validity, relevance, and usability of the data that is available. As a result, there has been a growing interest in two tools which ... provide

  9. Translation lexicon acquisition from bilingual dictionaries

    NASA Astrophysics Data System (ADS)

    Doermann, David S.; Ma, Huanfeng; Karagol-Ayan, Burcu; Oard, Douglas W.

    2001-12-01

    Bilingual dictionaries hold great potential as a source of lexical resources for training automated systems for optical character recognition, machine translation and cross-language information retrieval. In this work we describe a system for extracting term lexicons from printed copies of bilingual dictionaries. We describe our approach to page and definition segmentation and entry parsing. We have used the approach to parse a number of dictionaries and demonstrate the results for retrieval using a French-English dictionary to generate a translation lexicon and a corpus of English queries applied to French documents to evaluate cross-language IR.

  10. D1-3: Marshfield Dictionary of Clinical and Translational Science (MD-CTS): An Online Reference for Clinical and Translational Science Terminology

    PubMed Central

    Finamore, Joe; Ray, William; Kadolph, Chris; Rastegar-Mojarad, Majid; Ye, Zhan; Bohne, Jacqueline; Tachinardi, Umberto; Mendonça, Eneida; Finnegan, Brian; Bartkowiak, Barbara; Weichelt, Bryan; Lin, Simon

    2014-01-01

    Background/Aims New terms are rapidly appearing in the literature and practice of clinical medicine and translational research. To catalog real-world usage of medical terms, we report the first construction of an online dictionary of clinical and translational medicinal terms, which are computationally generated in near real-time using a big data approach. This project is NIH CTSA-funded and developed by the Marshfield Clinic Research Foundation in conjunction with University of Wisconsin - Madison. Currently titled Marshfield Dictionary of Clinical and Translational Science (MD-CTS), this application is a Google-like word search tool. By entering a term into the search bar, MD-CTS will display that term’s definition, usage examples, contextual terms, related images, and ontological information. A prototype is available for public viewing at http://spellchecker.mfldclin.edu/. Methods We programmatically derived the lexicon for MD-CTS from scholarly communications by parsing through 15,156,745 MEDLINE abstracts and extracting all of the unique words found therein. We then ran this list through several filters in order to remove words that were not relevant for searching, such as common English words and numeric expressions. We then loaded the resulting 1,795,769 terms into SQL tables. Each term is cross-referenced with every occurrence in all abstracts in which it was found. Additional information is aggregated from Wiktionary, Bioportal, and Wikipedia in real-time and displayed on-screen. From this lexicon we created a supplemental dictionary resource (updated quarterly) to be used in Microsoft Office® products. Results We evaluated the utility of MD-CTS by creating a list of 100 words derived from recent clinical and translational medicine publications in the week of July 22, 2013. We then performed comparative searches for each term with Taber’s Cyclopedic Medical Dictionary, Stedman’s Medical Dictionary, Dorland’s Illustrated Medical Dictionary, Medical Subject Headings (MeSH), and MD-CTS. We compared our supplemental dictionary resource to OpenMedSpell for effectiveness in accuracy of term recognition. Conclusions In summary, we developed an online mobile and desktop reference, which comprehensively integrates Wiktionary (term information), Bioportal (ontological information), Wikipedia (related images), and Medline abstract information (term usage) for scientists and clinicians to browse in real-time. We also created a supplemental dictionary resource to be used in Microsoft Office® products.

  11. Toward better public health reporting using existing off the shelf approaches: A comparison of alternative cancer detection approaches using plaintext medical data and non-dictionary based feature selection.

    PubMed

    Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J

    2016-04-01

    Increased adoption of electronic health records has resulted in increased availability of free text clinical data for secondary use. A variety of approaches to obtain actionable information from unstructured free text data exist. These approaches are resource intensive, inherently complex and rely on structured clinical data and dictionary-based approaches. We sought to evaluate the potential to obtain actionable information from free text pathology reports using routinely available tools and approaches that do not depend on dictionary-based approaches. We obtained pathology reports from a large health information exchange and evaluated the capacity to detect cancer cases from these reports using 3 non-dictionary feature selection approaches, 4 feature subset sizes, and 5 clinical decision models: simple logistic regression, naïve bayes, k-nearest neighbor, random forest, and J48 decision tree. The performance of each decision model was evaluated using sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using automated, informed, and manual feature selection approaches yielded similar results. Furthermore, non-dictionary classification approaches identified cancer cases present in free text reports with evaluation measures approaching and exceeding 80-90% for most metrics. Our methods are feasible and practical approaches for extracting substantial information value from free text medical data, and the results suggest that these methods can perform on par, if not better, than existing dictionary-based approaches. Given that public health agencies are often under-resourced and lack the technical capacity for more complex methodologies, these results represent potentially significant value to the public health field. Copyright © 2016 Elsevier Inc. All rights reserved.
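
    The pipeline described above (bag-of-words features over free-text reports, automated feature selection, and off-the-shelf classifiers) can be illustrated with standard tools. The sketch below is a hedged example using scikit-learn; the report texts, labels, and parameter values are placeholders, not the study's data or exact configuration:

      # Minimal sketch of non-dictionary free-text classification for case detection.
      # Report texts, labels, and parameter values are illustrative placeholders.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      reports = ["infiltrating ductal carcinoma identified in the left breast",
                 "benign fibroadenoma, no evidence of malignancy",
                 "adenocarcinoma of the colon, moderately differentiated",
                 "normal gastric mucosa, no dysplasia seen"]
      labels = [1, 0, 1, 0]   # 1 = cancer case, 0 = non-case

      model = make_pipeline(
          TfidfVectorizer(ngram_range=(1, 2)),   # bag-of-words features, no dictionary needed
          SelectKBest(chi2, k=20),               # automated feature selection
          LogisticRegression(max_iter=1000),     # one of several candidate decision models
      )
      model.fit(reports, labels)
      print(model.predict(["poorly differentiated carcinoma present"]))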

  12. CLIR Experiments at Maryland for TREC-2002: Evidence Combination for Arabic-English Retrieval

    DTIC Science & Technology

    2002-01-01

    translation resources of three types (machine translation lexicons, a printed bilingual dictionary that had been manually rekeyed, and translation...on both the term list and the collection). • The Salmone Arabic-to-English dictionary, which was made available for use in the TREC-CLIR track by...Tufts University. No translation preference information is provided in this dictionary, but it does include rich markup describing morphology and part

  13. A dictionary to identify small molecules and drugs in free text.

    PubMed

    Hettne, Kristina M; Stierum, Rob H; Schuemie, Martijn J; Hendriksen, Peter J M; Schijvenaars, Bob J A; Mulligen, Erik M van; Kleinjans, Jos; Kors, Jan A

    2009-11-15

    From the scientific community, a lot of effort has been spent on the correct identification of gene and protein names in text, while less effort has been spent on the correct identification of chemical names. Dictionary-based term identification has the power to recognize the diverse representation of chemical information in the literature and map the chemicals to their database identifiers. We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, manual check of highly frequent terms and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the different processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40 (0.80 for trivial names)); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary. The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.
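
    A toy sketch of the general recipe (merge term lists from several resources, apply rule-based filtering, then do case-insensitive matching in text). The source excerpts, stop-term list, and length filter below are invented for illustration and are not the paper's actual processing rules:

      # Illustrative sketch: build a combined small-molecule dictionary and scan text.
      # Source term lists and filtering rules are toy assumptions.
      import re

      SOURCES = {  # hypothetical excerpts from different resources
          "ChEBI":      {"aspirin": "CHEBI:15365", "caffeine": "CHEBI:27732"},
          "DrugBank":   {"aspirin": "DB00945", "ibuprofen": "DB01050"},
          "ChemIDplus": {"lead": "7439-92-1", "caffeine": "58-08-2"},
      }
      STOP_TERMS = {"lead", "gold", "iron"}   # rule-based filter: ambiguous everyday words

      def build_dictionary(sources, stop_terms):
          combined = {}
          for source, terms in sources.items():
              for term, identifier in terms.items():
                  if term.lower() in stop_terms or len(term) < 3:
                      continue                      # drop ambiguous or too-short terms
                  combined.setdefault(term.lower(), []).append((source, identifier))
          return combined

      def find_chemicals(text, dictionary):
          hits = []
          for term, ids in dictionary.items():
              for m in re.finditer(r"\b" + re.escape(term) + r"\b", text.lower()):
                  hits.append((m.start(), m.end(), term, ids))
          return sorted(hits)

      chem_dict = build_dictionary(SOURCES, STOP_TERMS)
      print(find_chemicals("Caffeine and aspirin, but not lead, were detected.", chem_dict))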

  14. Label consistent K-SVD: learning a discriminative dictionary for recognition.

    PubMed

    Jiang, Zhuolin; Lin, Zhe; Davis, Larry S

    2013-11-01

    A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. The incremental dictionary learning algorithm is presented for the situation of limited memory resources. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.
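
    For reference, the unified objective combining the reconstruction, discriminative sparse-code, and classification error terms is commonly written as follows (a sketch in the notation usually used for LC-KSVD; symbols may differ slightly from the paper's):

      % Y: training signals, D: dictionary, X: sparse codes, Q: discriminative
      % sparse-code targets, A: linear transform, H: label matrix, W: classifier,
      % T: sparsity level; alpha and beta weight the two added terms.
      \begin{equation}
      \min_{D,\,A,\,W,\,X}\;
        \|Y - DX\|_F^2
        \;+\; \alpha\,\|Q - AX\|_F^2
        \;+\; \beta\,\|H - WX\|_F^2
      \quad \text{s.t.}\;\; \forall i,\ \|x_i\|_0 \le T
      \end{equation}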

  15. Technical Standards for Command and Control Information Systems (CCISs)

    DTIC Science & Technology

    1992-01-01

    Table-of-contents snippet: conformance testing; management, scheduling, resource allocation, logical and physical device access, interrupt handling (IEEE P1003); Remote Data Access (RDA); Information Resource Dictionary ...; POSIX Conformance Testing; Consortia Recommendations

  16. An Investigation into the Effect of English Learners' Dictionaries on International Students' Acquisition of the English Article System

    ERIC Educational Resources Information Center

    Miller, Julia

    2006-01-01

    Learners' dictionaries are a resource which is often overlooked by both students and teachers of English as a Second Language. The wealth of grammatical information contained within them, however, can help students to improve their English language skills and, ipso facto, their academic writing. In this study, four groups of university ESL…

  17. Evaluating Online Bilingual Dictionaries: The Case of Popular Free English-Polish Dictionaries

    ERIC Educational Resources Information Center

    Lew, Robert; Szarowska, Agnieszka

    2017-01-01

    Language learners today exhibit a strong preference for free online resources. One problem with such resources is that their quality can vary dramatically. Building on related work on monolingual resources for English, we propose an evaluation framework for online bilingual dictionaries, designed to assess lexicographic quality in four major…

  18. Graded Lexicons: New Resources for Educational Purposes and Much More

    ERIC Educational Resources Information Center

    Gala, Núria; Billami, Mokhtar B.; François, Thomas; Bernhard, Delphine

    2015-01-01

    Computational tools and resources play an important role for vocabulary acquisition. Although a large variety of dictionaries and learning games are available, few resources provide information about the complexity of a word, either for learning or for comprehension. The idea here is to use frequency counts combined with intralexical variables to…

  19. Library Information Resource Book For Staff.

    ERIC Educational Resources Information Center

    Potts, Ken; And Others

    This guide is the Northern Illinois University (NIU) Libraries' quick reference tool for providing information about its collections, facilities, and services. The articles are arranged in an alphabetic, dictionary format with numerous cross-references, and highlight information on the following: administrative offices; company annual reports;…

  20. A study of actions in operative notes.

    PubMed

    Wang, Yan; Pakhomov, Serguei; Burkart, Nora E; Ryan, James O; Melton, Genevieve B

    2012-01-01

    Operative notes contain rich information about techniques, instruments, and materials used in procedures. To assist development of effective information extraction (IE) techniques for operative notes, we investigated the sublanguage used to describe actions within the operative report 'procedure description' section. Deep parsing results of 362,310 operative notes with an expanded Stanford parser using the SPECIALIST Lexicon resulted in 200 verbs (92% coverage) including 147 action verbs. Nominal action predicates for each action verb were gathered from WordNet, SPECIALIST Lexicon, New Oxford American Dictionary and Stedman's Medical Dictionary. Coverage gaps were seen in existing lexical, domain, and semantic resources (Unified Medical Language System (UMLS) Metathesaurus, SPECIALIST Lexicon, WordNet and FrameNet). Our findings demonstrate the need to construct surgical domain-specific semantic resources for IE from operative notes.

  1. Gene/protein name recognition based on support vector machine using dictionary as features.

    PubMed

    Mitsumori, Tomohiro; Fation, Sevrani; Murata, Masaki; Doi, Kouichi; Doi, Hirohumi

    2005-01-01

    Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition. In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting. During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.

  2. The Power of Math Dictionaries in the Classroom

    ERIC Educational Resources Information Center

    Patterson, Lynn Gannon; Young, Ashlee Futrell

    2013-01-01

    This article investigates the value of a math dictionary in the elementary classroom and if elementary students prefer using a traditional math dictionary or a dictionary on an iPad. In each child's journey to reading with understanding, the dictionary can be a comforting and valuable resource. Would students find a math dictionary to be a…

  3. Information Resource Management in the DCSPLANS (Deputy Chief of Staff for Plans) Branch of the U.S. Army Military Personnel Center

    DTIC Science & Technology

    1986-11-15

    Richard E. Broome [Broome 1985], Major Robert M. DiBona [DiBona 1985], Major Robert A. Kirsch II [Kirsch 1985], and Major Alan F. Noel, Jr. [Noel 1985...Major Kirsch expanded this prototype to be compatible with the emerging Federal standards for dictionary systems. Major DiBona analyzed data validation...described in [Noel 1985, Kirsch 1985]. The implementation of edit validation rules in a dictionary environment is covered in [DiBona 1985] and implementation

  4. Recognition of chemical entities: combining dictionary-based and grammar-based approaches.

    PubMed

    Akhondi, Saber A; Hettne, Kristina M; van der Horst, Eelke; van Mulligen, Erik M; Kors, Jan A

    2015-01-01

    The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance.
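
    The document-level ranking step relies on a corpus-derived term confidence score. A minimal sketch of one way such a score can be computed from term frequencies in chemical versus non-chemical journals is shown below; the frequency tables and exact normalization are illustrative assumptions, not the paper's formula:

      # Sketch of a corpus-based term confidence score: terms that occur relatively
      # more often in chemical journals than in non-chemical journals score higher.
      # The frequency tables below are invented placeholders.
      def confidence(term, chem_freq, nonchem_freq, chem_total, nonchem_total, eps=1e-9):
          """Normalized ratio of relative frequencies in the two corpora, mapped to [0, 1]."""
          p_chem = chem_freq.get(term, 0) / chem_total
          p_non  = nonchem_freq.get(term, 0) / nonchem_total
          return p_chem / (p_chem + p_non + eps)

      chem_freq    = {"ethanol": 900, "patient": 50}      # counts in chemical journals
      nonchem_freq = {"ethanol": 100, "patient": 4000}    # counts in non-chemical journals
      print(confidence("ethanol", chem_freq, nonchem_freq, 10_000, 50_000))  # close to 1
      print(confidence("patient", chem_freq, nonchem_freq, 10_000, 50_000))  # close to 0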

  5. Recognition of chemical entities: combining dictionary-based and grammar-based approaches

    PubMed Central

    2015-01-01

    Background The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized compounds at the document level (CDI task). We investigated an ensemble approach where dictionary-based named entity recognition is used along with grammar-based recognizers to extract compounds from text. We assessed the performance of ten different commercial and publicly available lexical resources using an open source indexing system (Peregrine), in combination with three different chemical compound recognizers and a set of regular expressions to recognize chemical database identifiers. The effect of different stop-word lists, case-sensitivity matching, and use of chunking information was also investigated. We focused on lexical resources that provide chemical structure information. To rank the different compounds found in a text, we used a term confidence score based on the normalized ratio of the term frequencies in chemical and non-chemical journals. Results The use of stop-word lists greatly improved the performance of the dictionary-based recognition, but there was no additional benefit from using chunking information. A combination of ChEBI and HMDB as lexical resources, the LeadMine tool for grammar-based recognition, and the regular expressions, outperformed any of the individual systems. On the test set, the F-scores were 77.8% (recall 71.2%, precision 85.8%) for the CEM task and 77.6% (recall 71.7%, precision 84.6%) for the CDI task. Missed terms were mainly due to tokenization issues, poor recognition of formulas, and term conjunctions. Conclusions We developed an ensemble system that combines dictionary-based and grammar-based approaches for chemical named entity recognition, outperforming any of the individual systems that we considered. The system is able to provide structure information for most of the compounds that are found. Improved tokenization and better recognition of specific entity types is likely to further improve system performance. PMID:25810767

  6. Creating a medical dictionary using word alignment: the influence of sources and resources.

    PubMed

    Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Ahlfeldt, Hans

    2007-11-23

    Automatic word alignment of parallel texts with the same content in different languages is among other things used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH and how the terminology systems and type of resources influence the quality. We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which part of the terminology systems that we used to generate the resources, which parts that we word aligned and which types of resources we used in the alignment process to explore the influence the different terminology systems and resources have on the recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10 and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training and the most general resources were generated from ICD-10.
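
    A hedged sketch of the purely statistical side of this setup: scoring candidate English-Swedish word pairs by co-occurrence across aligned rubrics (here with a Dice coefficient). The aligned rubrics are toy examples and the scoring is a stand-in for the statistically derived dictionaries mentioned above, not the actual alignment tooling:

      # Sketch: score word pairs by co-occurrence across aligned terminology rubrics.
      # The rubric pairs below are toy examples, not ICD-10 content.
      from collections import Counter
      from itertools import product

      aligned_rubrics = [
          ("acute myocardial infarction", "akut hjärtinfarkt"),
          ("old myocardial infarction",   "gammal hjärtinfarkt"),
          ("acute hepatitis",             "akut hepatit"),
      ]

      en_count, sv_count, pair_count = Counter(), Counter(), Counter()
      for en, sv in aligned_rubrics:
          en_words, sv_words = set(en.split()), set(sv.split())
          en_count.update(en_words)
          sv_count.update(sv_words)
          pair_count.update(product(en_words, sv_words))

      def dice(en_word, sv_word):
          return 2 * pair_count[(en_word, sv_word)] / (en_count[en_word] + sv_count[sv_word])

      candidates = sorted(pair_count, key=lambda p: -dice(*p))   # ranked candidate term pairs
      print(dice("acute", "akut"), dice("infarction", "akut"))   # 1.0 vs 0.5: 'akut' pairs with 'acute'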

  7. Creating a medical dictionary using word alignment: The influence of sources and resources

    PubMed Central

    Nyström, Mikael; Merkel, Magnus; Petersson, Håkan; Åhlfeldt, Hans

    2007-01-01

    Background Automatic word alignment of parallel texts with the same content in different languages is among other things used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH and how the terminology systems and type of resources influence the quality. Methods We automatically word aligned the terminology systems using static resources, like dictionaries, statistical resources, like statistically derived dictionaries, and training resources, which were generated from manual word alignment. We varied which part of the terminology systems that we used to generate the resources, which parts that we word aligned and which types of resources we used in the alignment process to explore the influence the different terminology systems and resources have on the recall and precision. After the analysis, we used the best configuration of the automatic word alignment for generation of candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary. Results The results indicate that more resources and resource types give better results but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10 and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms. Conclusion More resources give better results in the automatic word alignment, but some resources only give small improvements. The most important type of resource is training and the most general resources were generated from ICD-10. PMID:18036221

  8. Evaluation of Controlled Vocabulary Resources for Development of a Consumer Entry Vocabulary for Diabetes

    PubMed Central

    Monga, Harpreet K; Sievert, MaryEllen C; Hall, Joan Houston; Longo, Daniel R

    2001-01-01

    Background Digital information technology can facilitate informed decision making by individuals regarding their personal health care. The digital divide separates those who do and those who do not have access to or otherwise make use of digital information. To close the digital divide, health care communications research must address a fundamental issue, the consumer vocabulary problem: consumers of health care, at least those who are laypersons, are not always familiar with the professional vocabulary and concepts used by providers of health care and by providers of health care information, and, conversely, health care and health care information providers are not always familiar with the vocabulary and concepts used by consumers. One way to address this problem is to develop a consumer entry vocabulary for health care communications. Objectives To evaluate the potential of controlled vocabulary resources for supporting the development of consumer entry vocabulary for diabetes. Methods We used folk medical terms from the Dictionary of American Regional English project to create extended versions of 3 controlled vocabulary resources: the Unified Medical Language System Metathesaurus, the Eurodicautom of the European Commission's Translation Service, and the European Commission Glossary of popular and technical medical terms. We extracted consumer terms from consumer-authored materials, and physician terms from physician-authored materials. We used our extended versions of the vocabulary resources to link diabetes-related terms used by health care consumers to synonymous, nearly-synonymous, or closely-related terms used by family physicians. We also examined whether retrieval of diabetes-related World Wide Web information sites maintained by nonprofit health care professional organizations, academic organizations, or governmental organizations can be improved by substituting a physician term for its related consumer term in the query. Results The Dictionary of American Regional English extension of the Metathesaurus provided coverage, either direct or indirect, of approximately 23% of the natural language consumer-term-physician-term pairs. The Dictionary of American Regional English extension of the Eurodicautom provided coverage for 16% of the term pairs. Both the Metathesaurus and the Eurodicautom indirectly related more terms than they directly related. A high percentage of covered term pairs, with more indirectly covered pairs than directly covered pairs, might be one way to make the most out of expensive controlled vocabulary resources. We compared retrieval of diabetes-related Web information sites using the physician terms to retrieval using related consumer terms. We based the comparison on retrieval of sites maintained by non-profit healthcare professional organizations, academic organizations, or governmental organizations. The number of such sites in the first 20 results from a search was increased by substituting a physician term for its related consumer term in the query. This suggests that the Dictionary of American Regional English extensions of the Metathesaurus and Eurodicautom may be used to provide useful links from natural language consumer terms to natural language physician terms. Conclusions The Dictionary of American Regional English extensions of the Metathesaurus and Eurodicautom should be investigated further for support of consumer entry vocabulary for diabetes. PMID:11720966

  9. Data Base Directions: Information Resource Management - Strategies and Tools. Proceedings of the Workshop of the National Bureau of Standards and the Association for Computing Machinery (Ft. Lauderdale, Florida, October 20-22, 1980).

    ERIC Educational Resources Information Center

    Goldfine, Alan H., Ed.

    This workshop investigated how managers can evaluate, select, and effectively use information resource management (IRM) tools, especially data dictionary systems (DDS). An executive summary, which provides a definition of IRM as developed by workshop participants, precedes the keynote address, "Data: The Raw Material of a Paper Factory,"…

  10. Information on Quantifiers and Argument Structure in English Learner's Dictionaries.

    ERIC Educational Resources Information Center

    Lee, Thomas Hun-tak

    1993-01-01

    Lexicographers have been arguing for the inclusion of abstract and complex grammatical information in dictionaries. This paper examines the extent to which information about quantifiers and the argument structure of verbs is encoded in English learner's dictionaries. The Oxford Advanced Learner's Dictionary (1989), the Longman Dictionary of…

  11. The Influence of Electronic Dictionaries on Vocabulary Knowledge Extension

    ERIC Educational Resources Information Center

    Rezaei, Mojtaba; Davoudi, Mohammad

    2016-01-01

    Vocabulary learning needs special strategies in language learning process. The use of dictionaries is a great help in vocabulary learning and nowadays the emergence of electronic dictionaries has added a new and valuable resource for vocabulary learning. The present study aims to explore the influence of Electronic Dictionaries (ED) Vs. Paper…

  12. FRS Download Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  13. RCRAInfo Download Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  14. Resources into Higher Education. Bibliographic Series No. 31.

    ERIC Educational Resources Information Center

    Ahrens, Joan

    Varied information sources on higher education held by the Arkansas University library are listed. The publications, which are organized by type of material include abstracts and indexes, sources for dissertations and theses, directories of educational personnel and institutions, dictionaries, encyclopedias, handbooks, statistical sources, and…

  15. English for Specific Purposes. Information Guide 2.

    ERIC Educational Resources Information Center

    British Council, London (England). English-Teaching Information Centre.

    This bibliography of materials for teachers of English for specific purposes lists textbooks, technical readers, articles, resource books, reports, dictionaries, reference books, bibliographies, word frequency lists, catalogues of teaching aids, games and activities, current research in Britain, documents available in the archives of the English…

  16. Data Management Standards in Computer-aided Acquisition and Logistic Support (CALS)

    NASA Technical Reports Server (NTRS)

    Jefferson, David K.

    1990-01-01

    Viewgraphs and discussion on data management standards in computer-aided acquisition and logistic support (CALS) are presented. CALS is intended to reduce cost, increase quality, and improve timeliness of weapon system acquisition and support by greatly improving the flow of technical information. The phase 2 standards, industrial environment, are discussed. The information resource dictionary system (IRDS) is described.

  17. ICIS-Air Download Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  18. ICIS-FE&C Download Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  19. ICIS-NPDES Limit Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  20. Civil Enforcement Case Report Data Dictionary | ECHO | US ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  1. ICIS-NPDES DMR Summary and Data Element Dictionary ...

    EPA Pesticide Factsheets

    ECHO, Enforcement and Compliance History Online, provides compliance and enforcement information for approximately 800,000 EPA-regulated facilities nationwide. ECHO includes permit, inspection, violation, enforcement action, and penalty information about facilities regulated under the Clean Air Act (CAA) Stationary Source Program, Clean Water Act (CWA) National Pollutant Discharge Elimination System (NPDES), and/or Resource Conservation and Recovery Act (RCRA). Information also is provided on surrounding demographics when available.

  2. Speech and Language and Language Translation (SALT)

    DTIC Science & Technology

    2012-12-01

    Resources are classified as: Parallel Text, Dictionaries, Monolingual Text, Other. Dictionaries are further classified as: Text (can download entire) ... not clear how many are translated (http://www.redsea-online.com/modules.php?name=dictionary) ... Monolingual Text; An Crubadan web ... attached to a following word. A program could be written to detach the character د from unknown words, when the remaining word matches a dictionary

  3. Adaptive structured dictionary learning for image fusion based on group-sparse-representation

    NASA Astrophysics Data System (ADS)

    Yang, Jiajie; Sun, Bin; Luo, Chengwei; Wu, Yuzhong; Xu, Limei

    2018-04-01

    Dictionary learning is the key process of sparse representation which is one of the most widely used image representation theories in image fusion. The existing dictionary learning method does not use the group structure information and the sparse coefficients well. In this paper, we propose a new adaptive structured dictionary learning algorithm and a l1-norm maximum fusion rule that innovatively utilizes grouped sparse coefficients to merge the images. In the dictionary learning algorithm, we do not need prior knowledge about any group structure of the dictionary. By using the characteristics of the dictionary in expressing the signal, our algorithm can automatically find the desired potential structure information that hidden in the dictionary. The fusion rule takes the physical meaning of the group structure dictionary, and makes activity-level judgement on the structure information when the images are being merged. Therefore, the fused image can retain more significant information. Comparisons have been made with several state-of-the-art dictionary learning methods and fusion rules. The experimental results demonstrate that, the dictionary learning algorithm and the fusion rule both outperform others in terms of several objective evaluation metrics.
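
    A minimal numpy sketch of the plain l1-norm "choose-max" fusion rule that this family of methods builds on (the paper's rule additionally exploits the learned group structure; the dictionary and coefficient matrices below are random stand-ins, not learned from images):

      # Sketch of an l1-norm "choose-max" fusion rule over sparse coefficients.
      # alpha_a and alpha_b stand for sparse codes of corresponding patches from two
      # source images over a shared (learned) dictionary; both are toy values here.
      import numpy as np

      def fuse_codes(alpha_a: np.ndarray, alpha_b: np.ndarray) -> np.ndarray:
          """For each patch (column), keep the coefficient vector with the larger l1 norm."""
          keep_a = np.abs(alpha_a).sum(axis=0) >= np.abs(alpha_b).sum(axis=0)
          return np.where(keep_a, alpha_a, alpha_b)

      D = np.random.randn(64, 256)                 # stand-in dictionary: 8x8 patches, 256 atoms
      alpha_a = np.random.randn(256, 10) * (np.random.rand(256, 10) < 0.05)
      alpha_b = np.random.randn(256, 10) * (np.random.rand(256, 10) < 0.05)
      fused_patches = D @ fuse_codes(alpha_a, alpha_b)   # reconstruct the fused patches
      print(fused_patches.shape)                         # (64, 10)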

  4. A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources.

    PubMed

    Moon, Sungrim; Pakhomov, Serguei; Liu, Nathan; Ryan, James O; Melton, Genevieve B

    2014-01-01

    To create a sense inventory of abbreviations and acronyms from clinical texts. The most frequently occurring abbreviations and acronyms from 352,267 dictated clinical notes were used to create a clinical sense inventory. Senses of each abbreviation and acronym were manually annotated from 500 random instances and lexically matched with long forms within the Unified Medical Language System (UMLS V.2011AB), Another Database of Abbreviations in Medline (ADAM), and Stedman's Dictionary, Medical Abbreviations, Acronyms & Symbols, 4th edition (Stedman's). Redundant long forms were merged after they were lexically normalized using Lexical Variant Generation (LVG). The clinical sense inventory was found to have skewed sense distributions, practice-specific senses, and incorrect uses. Of 440 abbreviations and acronyms analyzed in this study, 949 long forms were identified in clinical notes. This set was mapped to 17,359, 5233, and 4879 long forms in UMLS, ADAM, and Stedman's, respectively. After merging long forms, only 2.3% matched across all medical resources. The UMLS, ADAM, and Stedman's covered 5.7%, 8.4%, and 11% of the merged clinical long forms, respectively. The sense inventory of clinical abbreviations and acronyms and anonymized datasets generated from this study are available for public use at http://www.bmhi.umn.edu/ihi/research/nlpie/resources/index.htm ('Sense Inventories', website). Clinical sense inventories of abbreviations and acronyms created using clinical notes and medical dictionary resources demonstrate challenges with term coverage and resource integration. Further work is needed to help with standardizing abbreviations and acronyms in clinical care and biomedicine to facilitate automated processes such as text-mining and information extraction.
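
    The long-form merging step can be pictured as lexical normalization followed by grouping on the normalized key. The sketch below uses a crude, hand-rolled normalizer as a stand-in for LVG, and the candidate long forms are invented examples:

      # Sketch of merging redundant long forms: normalize each candidate and collapse
      # variants that share a normalized key (crude stand-in for LVG normalization).
      import re
      from collections import defaultdict

      def normalize(long_form: str) -> str:
          text = re.sub(r"[^\w\s]", " ", long_form.lower())          # lowercase, drop punctuation
          words = [w[:-1] if w.endswith("s") and len(w) > 3 else w   # naive de-pluralizing
                   for w in text.split()]
          return " ".join(words)

      def merge_long_forms(candidates):
          merged = defaultdict(set)
          for source, long_form in candidates:
              merged[normalize(long_form)].add((source, long_form))
          return merged

      candidates = [("clinical notes", "blood pressures"), ("UMLS", "Blood pressure"),
                    ("ADAM", "blood-pressure"), ("Stedman's", "British Pharmacopoeia")]
      for key, variants in merge_long_forms(candidates).items():
          print(key, "->", sorted(variants))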

  5. Supporting infobuttons with terminological knowledge.

    PubMed Central

    Cimino, J. J.; Elhanan, G.; Zeng, Q.

    1997-01-01

    We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration. PMID:9357682

  6. Supporting infobuttons with terminological knowledge.

    PubMed

    Cimino, J J; Elhanan, G; Zeng, Q

    1997-01-01

    We have developed several prototype applications which integrate clinical systems with on-line information resources by using patient data to drive queries in response to user information needs. We refer to these collectively as infobuttons because they are evoked with a minimum of keyboard entry. We make use of knowledge in our terminology, the Medical Entities Dictionary (MED) to assist with the selection of appropriate queries and resources, as well as the translation of patient data to forms recognized by the resources. This paper describes the kinds of knowledge in the MED, including literal attributes, hierarchical links and other semantic links, and how this knowledge is used in system integration.

  7. Rdesign: A data dictionary with relational database design capabilities in Ada

    NASA Technical Reports Server (NTRS)

    Lekkos, Anthony A.; Kwok, Teresa Ting-Yin

    1986-01-01

    A data dictionary is defined as the set of all data attributes, which describe data objects in terms of their intrinsic attributes, such as name, type, size, format and definition. It is recognized as the data base for Information Resource Management, facilitating understanding and communication about the relationship between systems applications and systems data usage and helping to achieve data independence by permitting systems applications to access data without knowledge of the location or storage characteristics of the data in the system. A research and development effort to use Ada has produced a data dictionary with data base design capabilities. The project supports data specification and analysis and offers a choice of the relational, network, and hierarchical models for logical data base design. It provides a highly integrated set of analysis and design transformation tools, ranging from templates for data element definition and a spreadsheet for defining functional dependencies to normalization and a logical design generator.

  8. Evaluating Online Dictionaries From Faculty Prospective: A Case Study Performed On English Faculty Members At King Saud University--Wadi Aldawaser Branch

    ERIC Educational Resources Information Center

    Abouserie, Hossam Eldin Mohamed Refaat

    2010-01-01

    The purpose of this study was to evaluate online dictionaries from faculty prospective. The study tried to obtain in depth information about various forms of dictionaries the faculty used; degree of awareness and accessing online dictionaries; types of online dictionaries accessed; basic features of information provided; major benefits gained…

  9. Paper, Electronic or Online? Different Dictionaries for Different Activities

    ERIC Educational Resources Information Center

    Pasfield-Neofitou, Sarah

    2009-01-01

    Despite research suggesting that teachers highly influence their students' knowledge and use of language learning resources such as dictionaries (Loucky, 2005; Yamane, 2006), it appears that dictionary selection and use is considered something to be dealt with outside the classroom. As a result, many students receive too little advice to be able…

  10. From Afar to Zulu: A Dictionary of African Cultures.

    ERIC Educational Resources Information Center

    Haskins, Jim; Biondi, Joann

    This resource provides information on over 30 of Africa's most populous and well-known ethnic groups. The text concisely describes the history, traditions, environment, social structure, religion, and daily lifestyles of these diverse cultures. Each entry opens with a map outlining the area populated by the group and a list of key data regarding…

  11. A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources

    PubMed Central

    Moon, Sungrim; Pakhomov, Serguei; Liu, Nathan; Ryan, James O; Melton, Genevieve B

    2014-01-01

    Objective To create a sense inventory of abbreviations and acronyms from clinical texts. Methods The most frequently occurring abbreviations and acronyms from 352 267 dictated clinical notes were used to create a clinical sense inventory. Senses of each abbreviation and acronym were manually annotated from 500 random instances and lexically matched with long forms within the Unified Medical Language System (UMLS V.2011AB), Another Database of Abbreviations in Medline (ADAM), and Stedman's Dictionary, Medical Abbreviations, Acronyms & Symbols, 4th edition (Stedman's). Redundant long forms were merged after they were lexically normalized using Lexical Variant Generation (LVG). Results The clinical sense inventory was found to have skewed sense distributions, practice-specific senses, and incorrect uses. Of 440 abbreviations and acronyms analyzed in this study, 949 long forms were identified in clinical notes. This set was mapped to 17 359, 5233, and 4879 long forms in UMLS, ADAM, and Stedman's, respectively. After merging long forms, only 2.3% matched across all medical resources. The UMLS, ADAM, and Stedman's covered 5.7%, 8.4%, and 11% of the merged clinical long forms, respectively. The sense inventory of clinical abbreviations and acronyms and anonymized datasets generated from this study are available for public use at http://www.bmhi.umn.edu/ihi/research/nlpie/resources/index.htm (‘Sense Inventories’, website). Conclusions Clinical sense inventories of abbreviations and acronyms created using clinical notes and medical dictionary resources demonstrate challenges with term coverage and resource integration. Further work is needed to help with standardizing abbreviations and acronyms in clinical care and biomedicine to facilitate automated processes such as text-mining and information extraction. PMID:23813539

  12. Dictionary Based Machine Translation from Kannada to Telugu

    NASA Astrophysics Data System (ADS)

    Sindhu, D. V.; Sagar, B. M.

    2017-08-01

    Machine translation is the task of translating from one language to another. For languages with limited linguistic resources, such as Kannada and Telugu, a dictionary-based approach is the most practical. This paper focuses on dictionary-based machine translation from Kannada to Telugu. The proposed methodology uses a dictionary to translate word by word, without much correlation of semantics between the words. The dictionary-based machine translation process has the following sub-processes: morph analyzer, dictionary, transliteration, transfer grammar, and morph generator. As part of this work, a bilingual dictionary with 8000 entries was developed and a suffix mapping table at the tag level was built. The system was tested on children's stories. In the near future it can be further improved by defining transfer grammar rules.
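
    A toy sketch of the word-by-word pipeline (naive morph split, stem lookup, suffix mapping, generation). The romanized Kannada/Telugu entries and suffix mappings below are invented illustrations, not entries from the 8000-word dictionary described above, and the sketch ignores reordering and morphophonemic adjustment:

      # Toy sketch of dictionary-based word-by-word translation with suffix mapping.
      # Entries are romanized, illustrative assumptions only.
      STEM_DICT  = {"mane": "illu", "huduga": "abbayi"}      # Kannada stem -> Telugu stem
      SUFFIX_MAP = {"ge": "ki", "galu": "lu", "": ""}        # Kannada suffix -> Telugu suffix

      def split_morph(word, suffixes):
          """Very naive morph analyzer: strip the longest known suffix."""
          for suf in sorted(suffixes, key=len, reverse=True):
              if suf and word.endswith(suf):
                  return word[: -len(suf)], suf
          return word, ""

      def translate_word(word):
          stem, suffix = split_morph(word, SUFFIX_MAP)
          if stem in STEM_DICT:
              return STEM_DICT[stem] + SUFFIX_MAP[suffix]
          return word          # out-of-vocabulary words pass through (or get transliterated)

      def translate_sentence(sentence):
          return " ".join(translate_word(w) for w in sentence.split())

      # Word-by-word output; no reordering or sandhi/orthographic adjustment is attempted.
      print(translate_sentence("huduga manege"))   # -> "abbayi illuki"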

  13. Cancer Information Summaries: Screening/Detection

    MedlinePlus

  14. Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts

    PubMed Central

    2016-01-01

    We introduce a lexical resource for preprocessing social media data. We show that a neural network-based feature representation is enhanced by using this resource. We conducted experiments on the PAN 2015 and PAN 2016 author profiling corpora and obtained better results when performing the data preprocessing using the developed lexical resource. The resource includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media. Each of the dictionaries was built for the English, Spanish, Dutch, and Italian languages. The resource is freely available. PMID:27795703
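    As a rough illustration of how such a lexical resource is applied, the sketch below normalizes a token stream against tiny slang, contraction, and emoticon dictionaries; the entries are invented stand-ins for the published resource.

      # Dictionary-based preprocessing of social media text (toy dictionaries).
      slang = {"gr8": "great", "u": "you"}
      contractions = {"i'm": "i am", "don't": "do not"}
      emoticons = {":)": "happy", ":(": "sad"}

      def normalize(text: str) -> str:
          out = []
          for tok in text.lower().split():
              tok = emoticons.get(tok, tok)     # map emoticons to words
              tok = contractions.get(tok, tok)  # expand contractions
              tok = slang.get(tok, tok)         # expand slang
              out.append(tok)
          return " ".join(out)

      print(normalize("i'm gr8 :)"))  # -> "i am great happy"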

  15. A dictionary server for supplying context sensitive medical knowledge.

    PubMed

    Ruan, W; Bürkle, T; Dudeck, J

    2000-01-01

    The Giessen Data Dictionary Server (GDDS), developed at Giessen University Hospital, integrates clinical systems with on-line, context sensitive medical knowledge to help with making medical decisions. By "context" we mean the clinical information that is being presented at the moment the information need is occurring. The dictionary server makes use of a semantic network supported by a medical data dictionary to link terms from clinical applications to their proper information sources. It has been designed to analyze the network structure itself instead of knowing the layout of the semantic net in advance. This enables us to map appropriate information sources to various clinical applications, such as nursing documentation, drug prescription and cancer follow up systems. This paper describes the function of the dictionary server and shows how the knowledge stored in the semantic network is used in the dictionary service.

  16. LeadMine: a grammar and dictionary driven approach to entity recognition.

    PubMed

    Lowe, Daniel M; Sayle, Roger A

    2015-01-01

    Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, converting them to structures. Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution.

  17. LeadMine: a grammar and dictionary driven approach to entity recognition

    PubMed Central

    2015-01-01

    Background Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, converting them to structures. Results Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better-performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Conclusions Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as "black box" systems, our approach directly links the output entities to the input dictionaries and grammars. Our approach also allows correction of errors in detected entities, which can assist with entity resolution. PMID:25810776
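    One building block of such a grammar- and dictionary-driven system, greedy longest-match lookup against a dictionary, can be sketched as follows; the two-entry chemical dictionary is purely illustrative, and the sketch omits LeadMine's grammars, spelling correction, and boundary post-processing.

      # Greedy longest-match dictionary tagging (illustrative entries only).
      dictionary = {("acetic", "acid"): "CHEMICAL", ("benzene",): "CHEMICAL"}
      MAX_LEN = max(len(key) for key in dictionary)

      def tag(tokens):
          entities, i = [], 0
          while i < len(tokens):
              for span in range(min(MAX_LEN, len(tokens) - i), 0, -1):  # longest first
                  candidate = tuple(t.lower() for t in tokens[i:i + span])
                  if candidate in dictionary:
                      entities.append((" ".join(tokens[i:i + span]), dictionary[candidate]))
                      i += span
                      break
              else:
                  i += 1
          return entities

      print(tag("Dissolve acetic acid in benzene".split()))
      # -> [('acetic acid', 'CHEMICAL'), ('benzene', 'CHEMICAL')]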

  18. The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval

    DTIC Science & Technology

    2006-01-01

    The Effect of Bilingual Term List Size on Dictionary-Based Cross-Language Information Retrieval Dina Demner-Fushman Department of Computer Science... dictionary-based Cross-Language Information Retrieval (CLIR), in which the goal is to find documents written in one natural language based on queries that...in which the documents are written. In dictionary-based CLIR techniques, the principal source of translation knowledge is a translation lexicon

  19. Emo, love and god: making sense of Urban Dictionary, a crowd-sourced online dictionary.

    PubMed

    Nguyen, Dong; McGillivray, Barbara; Yasseri, Taha

    2018-05-01

    The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the 'wisdom of the crowd' has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary's voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.

  20. A Selected Bibliography of Dictionaries. General Information Series, No. 9. Indochinese Refugee Education Guides. Revised.

    ERIC Educational Resources Information Center

    Center for Applied Linguistics, Arlington, VA.

    The purpose of this bulletin is to provide the American teacher or sponsor with information on the use, limitations and availability of dictionaries that can be used by Indochinese refugees. The introductory material contains descriptions of both monolingual and bilingual dictionaries, a discussion of the inadequacies of bilingual dictionaries in…

  1. Specifications for a Federal Information Processing Standard Data Dictionary System

    NASA Technical Reports Server (NTRS)

    Goldfine, A.

    1984-01-01

    The development of a software specification that Federal agencies may use in evaluating and selecting data dictionary systems (DDS) is discussed. To supply the flexibility needed by widely different applications and environments in the Federal Government, the Federal Information Processing Standard (FIPS) specifies a core DDS together with an optional set of modules. The focus and status of the development project are described. Functional specifications for the FIPS DDS are examined for the dictionary, the dictionary schema, and the dictionary processing system. The DDS user interfaces and DDS software interfaces are discussed as well as dictionary administration.

  2. The SMAP Dictionary Management System

    NASA Technical Reports Server (NTRS)

    Smith, Kevin A.; Swan, Christoper A.

    2014-01-01

    The Soil Moisture Active Passive (SMAP) Dictionary Management System is a web-based tool to develop and store a mission dictionary. A mission dictionary defines the interface between a ground system and a spacecraft. In recent years, mission dictionaries have grown in size and scope, making it difficult for engineers across multiple disciplines to coordinate the dictionary development effort. The Dictionary Management System addresses these issues by placing all dictionary information in one place, taking advantage of the efficiencies inherent in co-locating what were once disparate dictionary development efforts.

  3. Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

    NASA Astrophysics Data System (ADS)

    Liu, Wuying; Wang, Lin

    2018-03-01

    The large-scale bilingual parallel resource is significant for statistical learning and deep learning in natural language processing. This paper addresses the automatic construction of a Korean-Chinese domain dictionary and presents a novel unsupervised construction method based on natural annotations in the raw corpus. We first extract all Korean-Chinese word pairs from Korean texts according to their natural annotations, then transform the traditional Chinese characters into simplified ones, and finally distill a bilingual domain dictionary by retrieving the simplified Chinese words in an existing Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.
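    The extraction step described here can be approximated with a parenthesis-pattern pass over Korean text, as in the hedged sketch below; the sample sentence, the traditional-to-simplified table, and the Chinese domain word list are all invented.

      import re

      # Extract Korean-Chinese pairs from "natural annotations", i.e. a Chinese form
      # written in parentheses after a Korean term (toy data throughout).
      t2s = {"經濟": "经济"}                  # traditional -> simplified
      chinese_domain_dict = {"经济"}          # target-domain word list
      pair_pattern = re.compile(r"([\uAC00-\uD7A3]+)\(([\u4E00-\u9FFF]+)\)")

      def extract_pairs(text):
          pairs = []
          for korean, chinese in pair_pattern.findall(text):
              simplified = t2s.get(chinese, chinese)
              if simplified in chinese_domain_dict:   # keep only in-domain entries
                  pairs.append((korean, simplified))
          return pairs

      print(extract_pairs("경제(經濟) 성장률이 상승했다"))  # -> [('경제', '经济')]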

  4. A dictionary server for supplying context sensitive medical knowledge.

    PubMed Central

    Ruan, W.; Bürkle, T.; Dudeck, J.

    2000-01-01

    The Giessen Data Dictionary Server (GDDS), developed at Giessen University Hospital, integrates clinical systems with on-line, context sensitive medical knowledge to help with making medical decisions. By "context" we mean the clinical information that is being presented at the moment the information need is occurring. The dictionary server makes use of a semantic network supported by a medical data dictionary to link terms from clinical applications to their proper information sources. It has been designed to analyze the network structure itself instead of knowing the layout of the semantic net in advance. This enables us to map appropriate information sources to various clinical applications, such as nursing documentation, drug prescription and cancer follow up systems. This paper describes the function of the dictionary server and shows how the knowledge stored in the semantic network is used in the dictionary service. PMID:11079978

  5. The Database Query Support Processor (QSP)

    NASA Technical Reports Server (NTRS)

    1993-01-01

    The number and diversity of databases available to users continue to increase dramatically. Currently, the trend is towards decentralized, client-server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138-1988, 'The Information Resource Dictionary System (IRDS)', to provide seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, the time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search for and retrieval of relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result is simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.

  6. Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis: Distributed dictionary representation.

    PubMed

    Garten, Justin; Hoover, Joe; Johnson, Kate M; Boghrati, Reihane; Iskiwitch, Carol; Dehghani, Morteza

    2018-02-01

    Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
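    The core DDR computation, comparing the centroid embedding of a dictionary with the centroid embedding of a text, can be sketched as below; the three-dimensional vectors are toy values standing in for real pretrained word embeddings, and the scoring rule is a simplified reading of the method.

      import numpy as np

      # Toy word "embeddings"; a real application would load pretrained vectors.
      embeddings = {
          "happy": np.array([0.9, 0.1, 0.0]),
          "joy":   np.array([0.8, 0.2, 0.1]),
          "great": np.array([0.7, 0.3, 0.2]),
          "day":   np.array([0.2, 0.7, 0.3]),
      }

      def centroid(words):
          vectors = [embeddings[w] for w in words if w in embeddings]
          return np.mean(vectors, axis=0)

      def ddr_score(dictionary_words, document_words):
          # Cosine similarity between dictionary centroid and document centroid.
          d, t = centroid(dictionary_words), centroid(document_words)
          return float(d @ t / (np.linalg.norm(d) * np.linalg.norm(t)))

      print(ddr_score({"happy", "joy"}, "great day".split()))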

  7. A Locality-Constrained and Label Embedding Dictionary Learning Algorithm for Image Classification.

    PubMed

    Zhengming Li; Zhihui Lai; Yong Xu; Jian Yang; Zhang, David

    2017-02-01

    Locality and label information of training samples play an important role in image classification. However, previous dictionary learning algorithms do not take the locality and label information of atoms into account together in the learning process, and thus their performance is limited. In this paper, a discriminative dictionary learning algorithm, called the locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm, was proposed for image classification. First, the locality information was preserved using the graph Laplacian matrix of the learned dictionary instead of the conventional one derived from the training samples. Then, the label embedding term was constructed using the label information of atoms instead of the classification error term, which contained discriminating information of the learned dictionary. The optimal coding coefficients derived by the locality-based and label-based reconstruction were effective for image classification. Experimental results demonstrated that the LCLE-DL algorithm can achieve better performance than some state-of-the-art algorithms.

  8. Domain Adaptation of Translation Models for Multilingual Applications

    DTIC Science & Technology

    2009-04-01

    expansion effect that corpus (or dictionary) based translation introduces - however, this effect is maintained even with monolingual query expansion [12...every day; bilingual web pages are harvested as parallel corpora as the quantity of non-English data on the web increases; online dictionaries of...approach is to customize translation models to a domain, by automatically selecting the resources (dictionaries, parallel corpora) that are best for

  9. Emo, love and god: making sense of Urban Dictionary, a crowd-sourced online dictionary

    PubMed Central

    McGillivray, Barbara

    2018-01-01

    The Internet facilitates large-scale collaborative projects and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the ‘wisdom of the crowd’ has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary’s voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation. PMID:29892417

  10. A Standard-Driven Data Dictionary for Data Harmonization of Heterogeneous Datasets in Urban Geological Information Systems

    NASA Astrophysics Data System (ADS)

    Liu, G.; Wu, C.; Li, X.; Song, P.

    2013-12-01

    The 3D urban geological information system has been a major part of the national urban geological survey project of the China Geological Survey in recent years. Large amounts of multi-source and multi-subject data are to be stored in urban geological databases. Various models and vocabularies have been drafted and applied by industrial companies for urban geological data. Issues such as duplicate and ambiguous definitions of terms and differing coding structures increase the difficulty of information sharing and data integration. To solve this problem, we proposed a national-standard-driven information classification and coding method to effectively store and integrate urban geological data, and we applied data dictionary technology to achieve structured and standardized data storage. The overall purpose of this work is to set up a common data platform that provides an information sharing service. Research progress is as follows: (1) A unified classification and coding method for multi-source data based on national standards. Underlying national standards include GB 9649-88 for geology and GB/T 13923-2006 for geography. Current industrial models are compared with the national standards to build a mapping table. The attributes of various urban geological data entity models are reduced to several categories according to their application phases and domains. Then a logical data model is set up as a standard format to design data file structures for a relational database. (2) A multi-level data dictionary for data standardization constraints. Three levels of data dictionary are designed: the model data dictionary is used to manage system database files and to enhance maintenance of the whole database system; the attribute dictionary organizes the fields used in database tables; the term and code dictionary provides a standard for the urban information system by adopting appropriate classification and coding methods; and the comprehensive data dictionary manages system operation and security. (3) An extension of the system data management function based on the data dictionary. The data item input constraint function uses the standard term and code dictionary to produce standardized input. The attribute dictionary organizes all the fields of an urban geological information database to ensure consistent use of terms across fields. The model dictionary is used to automatically generate a database operation interface with standard semantic content via the term and code dictionary. The above method and technology have been applied to the construction of the Fuzhou Urban Geological Information System in South-East China, with satisfactory results.
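    A minimal sketch of how a term-and-code dictionary, an attribute dictionary, and a model dictionary can cooperate to constrain data entry is shown below; the geological codes, field names, and table layout are invented for illustration and do not reproduce the system described in this record.

      # Three cooperating dictionary levels (all codes and fields are invented).
      term_code = {"Q4al": "alluvial deposit", "Q4pl": "lacustrine deposit"}
      attribute_dict = {"lithology_code": {"type": "text", "domain": term_code}}
      model_dict = {"borehole_layer": ["borehole_id", "depth_top_m",
                                       "depth_bottom_m", "lithology_code"]}

      def validate(table, record):
          """Constrain coded fields to values registered in the term/code dictionary."""
          for field in model_dict[table]:
              domain = attribute_dict.get(field, {}).get("domain")
              if domain is not None and record[field] not in domain:
                  raise ValueError(f"{field}={record[field]!r} is not a standard code")
          return True

      print(validate("borehole_layer",
                     {"borehole_id": "BH-01", "depth_top_m": 0.0,
                      "depth_bottom_m": 3.5, "lithology_code": "Q4al"}))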

  11. Students' Understanding of Dictionary Entries: A Study with Respect to Four Learners' Dictionaries.

    ERIC Educational Resources Information Center

    Jana, Abhra; Amritavalli, Vijaya; Amritavalli, R.

    2003-01-01

    Investigates the effects of definitional information in the form of dictionary entries, on second language learners' vocabulary learning in an instructed setting. Indian students (Native Hindi speakers) of English received monolingual English dictionary entries of five previously unknown words from four different learner's dictionaries. Results…

  12. MD-CTS: An integrated terminology reference of clinical and translational medicine.

    PubMed

    Ray, Will; Finamore, Joe; Rastegar-Mojarad, Majid; Kadolph, Chris; Ye, Zhan; Bohne, Jacquie; Xu, Yin; Burish, Dan; Sondelski, Joshua; Easker, Melissa; Finnegan, Brian; Bartkowiak, Barbara; Smith, Catherine Arnott; Tachinardi, Umberto; Mendonca, Eneida A; Weichelt, Bryan; Lin, Simon M

    2016-01-01

    New vocabularies are rapidly evolving in the literature relative to the practice of clinical medicine and translational research. To provide integrated access to new terms, we developed a mobile and desktop online reference, the Marshfield Dictionary of Clinical and Translational Science (MD-CTS). It is the first public resource that comprehensively integrates Wiktionary (word definition), BioPortal (ontology), Wiki (image reference), and Medline abstract (word usage) information. MD-CTS is accessible at http://spellchecker.mfldclin.edu/. The website provides a broadened capacity for the wider clinical and translational science community to keep pace with newly emerging scientific vocabulary. An initial evaluation using 63 randomly selected biomedical words suggests that online references generally provided better coverage (73%-95%) than paper-based dictionaries (57%-71%).

  13. Recent research in data description of the measurement property resource on common data dictionary

    NASA Astrophysics Data System (ADS)

    Lu, Tielin; Fan, Zitian; Wang, Chunxi; Liu, Xiaojing; Wang, Shuo; Zhao, Hua

    2018-03-01

    A method for describing measurement equipment data is proposed based on property resource analysis. The application of the common data dictionary (CDD) to devices and equipment is mainly used in the digital factory to improve management, not only within one enterprise but also across different enterprises in the same data environment. In this paper, we give a brief account of the data flow in the whole manufacturing enterprise and of the automatic triggering of the data exchange process. Furthermore, the application of the data dictionary is available for measurement and control equipment and can also be used in other industries in smart manufacturing.

  14. DES-ncRNA: A knowledgebase for exploring information about human micro and long noncoding RNAs based on literature-mining.

    PubMed

    Salhi, Adil; Essack, Magbubah; Alam, Tanvir; Bajic, Vladan P; Ma, Lina; Radovanovic, Aleksandar; Marchand, Benoit; Schmeier, Sebastian; Zhang, Zhang; Bajic, Vladimir B

    2017-07-03

    Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and emerge as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles that may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA (www.cbrc.kaust.edu.sa/des_ncrna). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries including, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, mutations, diseases, etc. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries of which 36,222 (5,373) are with regard to miRNAs (lncRNAs). We provide several ways for users to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own or as a complement to existing resources, to support research in human ncRNA. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs derived primarily through literature-mining, enabling exploration of a broad spectrum of associated biomedical entities, not paralleled by any other resource.

  15. DES-ncRNA: A knowledgebase for exploring information about human micro and long noncoding RNAs based on literature-mining

    PubMed Central

    Salhi, Adil; Essack, Magbubah; Alam, Tanvir; Bajic, Vladan P.; Ma, Lina; Radovanovic, Aleksandar; Marchand, Benoit; Zhang, Zhang; Bajic, Vladimir B.

    2017-01-01

    ABSTRACT Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and emerge as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles that may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA (www.cbrc.kaust.edu.sa/des_ncrna). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries including, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, mutations, diseases, etc. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries of which 36,222 (5,373) are with regard to miRNAs (lncRNAs). We provide several ways for users to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own or as a complement to existing resources, to support research in human ncRNA. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs derived primarily through literature-mining, enabling exploration of a broad spectrum of associated biomedical entities, not paralleled by any other resource. PMID:28387604
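    The kind of dictionary-driven association mining behind such a knowledgebase can be illustrated, very roughly, by counting co-occurrences of terms from two dictionaries within the same abstract; the dictionaries and abstracts below are invented, and the actual DES-ncRNA pipeline is far more elaborate.

      from itertools import product

      # Toy dictionaries and abstracts for co-occurrence counting.
      mirna_dict = {"mir-21", "mir-155"}
      disease_dict = {"alzheimer disease", "glioma"}
      abstracts = [
          "mir-21 is upregulated in glioma tissue",
          "mir-155 expression correlates with alzheimer disease progression",
      ]

      def count_associations(texts):
          counts = {}
          for text in texts:
              text = text.lower()
              for a, b in product(mirna_dict, disease_dict):
                  if a in text and b in text:
                      counts[(a, b)] = counts.get((a, b), 0) + 1
          return counts

      print(count_associations(abstracts))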

  16. Using Bilingual Dictionaries.

    ERIC Educational Resources Information Center

    Thompson, Geoff

    1987-01-01

    Monolingual dictionaries have serious disadvantages in many language teaching situations; bilingual dictionaries are potentially more efficient and more motivating sources of information for language learners. (Author/CB)

  17. Getting the Most out of the Dictionary

    ERIC Educational Resources Information Center

    Marckwardt, Albert H.

    2012-01-01

    The usefulness of the dictionary as a reliable source of information for word meanings, spelling, and pronunciation is widely recognized. But even in these obvious matters, the information that the dictionary has to offer is not always accurately interpreted. With respect to pronunciation there seem to be two general pitfalls: (1) the…

  18. A Proposal To Develop the Axiological Aspect in Onomasiological Dictionaries.

    ERIC Educational Resources Information Center

    Felices Lago, Angel Miguel

    It is argued that English dictionaries currently provide evaluative information in addition to descriptive information about the words they contain, and that this aspect of dictionaries should be developed and expanded on. First, the historical background and distribution of the axiological parameter in English-language onomasiological…

  19. Consolidated Environmental Resource Database Information Process (CERDIP)

    DTIC Science & Technology

    2015-11-19

    ...Elizabeth J. Keysar...National Defense Center for Energy and Environment (NDCEE)...National Geospatial-Intelligence Agency Feature Data Dictionary (NFDD)

  20. Data dictionaries in information systems - Standards, usage, and application

    NASA Technical Reports Server (NTRS)

    Johnson, Margaret

    1990-01-01

    An overview of data dictionary systems and the role of standardization in the interchange of data dictionaries is presented. The development of the data dictionary for the Planetary Data System is cited as an example. The data element dictionary (DED), which is the repository of the definitions of the vocabulary utilized in an information system, is an important part of this service. A DED provides the definitions of the fields of the data set as well as the data elements of the catalog system. Finally, international efforts such as the Consultative Committee on Space Data Systems and other committees set up to provide standard recommendations on the usage and structure of data dictionaries in the international space science community are discussed.

  1. The ABCs of Data Dictionaries

    ERIC Educational Resources Information Center

    Gould, Tate; Nicholas, Amy; Blandford, William; Ruggiero, Tony; Peters, Mary; Thayer, Sara

    2014-01-01

    This overview of the basic components of a data dictionary is designed to educate and inform IDEA Part C and Part B 619 state staff about the purpose and benefits of having up-to-date data dictionaries for their data systems. This report discusses the following topics: (1) What Is a Data Dictionary?; (2) Why Is a Data Dictionary Needed and How Can…

  2. Weighted Discriminative Dictionary Learning based on Low-rank Representation

    NASA Astrophysics Data System (ADS)

    Chang, Heyou; Zheng, Hao

    2017-01-01

    Low-rank representation has been widely used in the field of pattern classification, especially when both training and testing images are corrupted with large noise. The dictionary plays an important role in low-rank representation. With respect to the semantic dictionary, the optimal representation matrix should be block-diagonal. However, traditional low-rank representation based dictionary learning methods cannot effectively exploit the discriminative information between data and dictionary. To address this problem, this paper proposes weighted discriminative dictionary learning based on low-rank representation, in which a weighted representation regularization term is constructed. The regularization term associates the label information of both training samples and dictionary atoms, and encourages the generation of a discriminative representation with a class-wise block-diagonal structure, which can further improve classification performance when both training and testing images are corrupted with large noise. Experimental results demonstrate the advantages of the proposed method over state-of-the-art methods.

  3. Intertwining thesauri and dictionaries

    NASA Technical Reports Server (NTRS)

    Buchan, R. L.

    1989-01-01

    The use of dictionaries and thesauri in information retrieval is discussed. The structure and functions of thesauri and dictionaries are described. Particular attention is given to the format of the NASA Thesaurus. The relationship between thesauri and dictionaries, the need to regularize terminology, and the capitalization of words are examined.

  4. Multi-level discriminative dictionary learning with application to large scale image classification.

    PubMed

    Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua

    2015-10-01

    The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of the task (such as discrimination for a classification task) into dictionary learning is effective for improving accuracy. However, traditional supervised dictionary learning methods suffer from high computational complexity when dealing with a large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.

  5. A Selected Bibliography of Dictionaries. General Information Series, No. 9. Indochinese Refugee Education Guides.

    ERIC Educational Resources Information Center

    Center for Applied Linguistics, Arlington, VA.

    This is a selected, annotated bibliography of dictionaries useful to Indochinese refugees. The purpose of this guide is to provide the American teacher or sponsor with information on the use, limitations and availability of monolingual and bilingual dictionaries which can be used by refugees. The bibliography is preceded by notes on problems with…

  6. Blind Linguistic Steganalysis against Translation Based Steganography

    NASA Astrophysics Data System (ADS)

    Chen, Zhili; Huang, Liusheng; Meng, Peng; Yang, Wei; Miao, Haibo

    Translation based steganography (TBS) is a kind of relatively new and secure linguistic steganography. It takes advantage of the "noise" created by automatic translation of natural language text to encode the secret information. To date, there has been little research on steganalysis against this kind of linguistic steganography. In this paper, a blind steganalytic method named natural frequency zoned word distribution analysis (NFZ-WDA) is presented. This method improves on a previously proposed linguistic steganalysis method based on word distribution, which targeted the detection of linguistic steganography tools such as nicetext and texto. The new method aims to detect the application of TBS and uses none of the related information about TBS; its only resource is a word frequency dictionary obtained from a large corpus, a so-called natural frequency dictionary, so it is totally blind. To verify the effectiveness of NFZ-WDA, two experiments with two-class and multi-class SVM classifiers, respectively, are carried out. The experimental results show that the steganalytic method is quite promising.
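    The natural-frequency-zone idea can be caricatured as bucketing each word of a text by its frequency in a large reference corpus and using the zone histogram as a classifier feature; the tiny frequency table and the log-scale bucketing rule below are assumptions, not the authors' exact feature design.

      import math
      from collections import Counter

      # Toy natural-frequency dictionary (word -> corpus frequency).
      natural_freq = {"the": 5.0e6, "of": 3.0e6, "language": 4.0e4, "steganography": 900}

      def zone(word, n_zones=4):
          # Coarse log10 bucketing of corpus frequency (an assumed zoning rule).
          freq = natural_freq.get(word.lower(), 1)
          return min(n_zones - 1, int(math.log10(freq) // 2))

      def zone_histogram(text, n_zones=4):
          counts = Counter(zone(w, n_zones) for w in text.split())
          total = sum(counts.values())
          return [counts.get(z, 0) / total for z in range(n_zones)]

      print(zone_histogram("the language of steganography"))  # feature vector for an SVM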

  7. Mobile-Based Dictionary of Information and Communication Technology

    NASA Astrophysics Data System (ADS)

    Liando, O. E. S.; Mewengkang, A.; Kaseger, D.; Sangkop, F. I.; Rantung, V. P.; Rorimpandey, G. C.

    2018-02-01

    This study aims to design and build a mobile-based dictionary of information and communication technology to provide access to information in the form of a glossary of terms in the context of information and communication technologies. The application built in this study uses the Android platform with an SQLite database model. This research uses the prototype development method, which covers the stages of Communication, Quick Plan, Quick Design Modeling, Construction of Prototype, Deployment Delivery & Feedback, and Full System Transformation. The application is designed in such a way as to facilitate the user in learning and understanding new terms or vocabulary encountered in the world of information and communication technology. The mobile-based dictionary of information and communication technology that has been built can be an alternative learning resource. In its simplest form, the application is able to meet the need for a comprehensive and accurate dictionary of information and communication technology.
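    The glossary storage such an app needs maps naturally onto a single SQLite table with prefix search; the schema and sample rows below are assumptions rather than the authors' actual database, and the sketch uses Python's sqlite3 for brevity instead of Android APIs.

      import sqlite3

      # In-memory stand-in for the app's SQLite glossary table.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE glossary (term TEXT PRIMARY KEY, definition TEXT NOT NULL)")
      conn.executemany(
          "INSERT INTO glossary (term, definition) VALUES (?, ?)",
          [("latency", "Delay between a request and its response."),
           ("bandwidth", "Maximum rate of data transfer over a link.")],
      )

      def lookup(prefix):
          # Prefix search, roughly what an incremental search box would issue.
          rows = conn.execute(
              "SELECT term, definition FROM glossary WHERE term LIKE ? ORDER BY term",
              (prefix + "%",),
          )
          return rows.fetchall()

      print(lookup("lat"))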

  8. Classic Classroom Activities: The Oxford Picture Dictionary Program.

    ERIC Educational Resources Information Center

    Weiss, Renee; Adelson-Goldstein, Jayme; Shapiro, Norma

    This teacher resource book offers over 100 reproducible communicative practice activities and 768 picture cards based on the vocabulary of the Oxford Picture Dictionary. Teacher's notes and instructions, including adaptations for multilevel classes, are provided. The activities book has up-to-date art and graphics, explaining over 3700 words. The…

  9. Corpora and Collocations in Chinese-English Dictionaries for Chinese Users

    ERIC Educational Resources Information Center

    Xia, Lixin

    2015-01-01

    The paper identifies the major problems of the Chinese-English dictionary in representing collocational information after an extensive survey of nine dictionaries popular among Chinese users. It is found that the Chinese-English dictionary only provides the collocation types of "v+n" and "v+n," but completely ignores those of…

  10. Marks, Spaces and Boundaries: Punctuation (and Other Effects) in the Typography of Dictionaries

    ERIC Educational Resources Information Center

    Luna, Paul

    2011-01-01

    Dictionary compilers and designers use punctuation to structure and clarify entries and to encode information. Dictionaries with a relatively simple structure can have simple typography and simple punctuation; as dictionaries grew more complex, and encountered the space constraints of the printed page, complex encoding systems were developed,…

  11. [Systems, boundaries and resources: the lexicographer Gerhard Wahrig (1923-1978) and the genesis of his project "dictionary as database"].

    PubMed

    Wahrig-Burfeind, Renate; Wahrig, Bettina

    2014-09-01

    Gerhard Wahrig's private archive has recently been retrieved by the authors and their siblings. We undertake a first survey of the unpublished material and concentrate on those aspects of Wahrig's bio-ergography which stand in relation to his life project "dictionary as database", realised shortly before his death. We argue that this project was conceived in the 1950s, while Wahrig was writing and editing dictionaries and encyclopedias for the Bibliographisches Institut in Leipzig. Wahrig, who had been a wireless operator in WWII, was well informed about the development of computers in West Germany. He was influenced both by Ferdinand de Saussure and by the discussion on language and structure in the Soviet Union. When he crossed the German/German border in 1959, he experienced mechanisms of exclusion before he could establish himself in the West as a lexicographer. We argue that the transfer of symbolic and human capital was problematic due to the cultural differences between the two Germanies. In the 1970s, he became a professor of General and Applied Linguistics. The project of a "dictionary as database" was intended both as a basis for extensive empirical research on the semantic structure of natural languages and as a working tool for the average user of the German language. Due to his untimely death, he could not pursue his idea of exploring semantic networks.

  12. Small molecule annotation for the Protein Data Bank

    PubMed Central

    Sen, Sanchayita; Young, Jasmine; Berrisford, John M.; Chen, Minyu; Conroy, Matthew J.; Dutta, Shuchismita; Di Costanzo, Luigi; Gao, Guanghua; Ghosh, Sutapa; Hudson, Brian P.; Igarashi, Reiko; Kengaku, Yumiko; Liang, Yuhe; Peisach, Ezra; Persikova, Irina; Mukhopadhyay, Abhik; Narayanan, Buvaneswari Coimbatore; Sahni, Gaurav; Sato, Junko; Sekharan, Monica; Shao, Chenghua; Tan, Lihua; Zhuravleva, Marina A.

    2014-01-01

    The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100 000 structures contain more than 20 000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the ftp archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors. PMID:25425036

  13. Small molecule annotation for the Protein Data Bank.

    PubMed

    Sen, Sanchayita; Young, Jasmine; Berrisford, John M; Chen, Minyu; Conroy, Matthew J; Dutta, Shuchismita; Di Costanzo, Luigi; Gao, Guanghua; Ghosh, Sutapa; Hudson, Brian P; Igarashi, Reiko; Kengaku, Yumiko; Liang, Yuhe; Peisach, Ezra; Persikova, Irina; Mukhopadhyay, Abhik; Narayanan, Buvaneswari Coimbatore; Sahni, Gaurav; Sato, Junko; Sekharan, Monica; Shao, Chenghua; Tan, Lihua; Zhuravleva, Marina A

    2014-01-01

    The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the ftp archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors. © The Author(s) 2014. Published by Oxford University Press.

  14. Environmental Health and Toxicology Resources of the United States National Library of Medicine

    PubMed Central

    Hochstein, Colette; Arnesen, Stacey; Goshorn, Jeanne

    2009-01-01

    For over 40 years, the National Library of Medicine’s (NLM) Toxicology and Environmental Health Information Program (TEHIP) has worked to organize and to provide access to an extensive array of environmental health and toxicology resources. During these years, the TEHIP program has evolved from a handful of databases developed primarily for researchers to a broad range of products and services that also serve industry, students, and the general public. TEHIP’s resources include TOXNET®, a collection of databases, including online handbooks, bibliographic references, information on the release of chemicals in the environment, and a chemical dictionary. TEHIP also produces several resources aimed towards the general public, such as the Household Products Database, which helps users explore chemicals often found in common household products, and Tox Town®, an interactive guide to commonly encountered toxic substances, health, and the environment. This paper introduces some of NLM’s environmental health and toxicology resources. PMID:17915629

  15. Concordancers and Dictionaries as Problem-Solving Tools for ESL Academic Writing

    ERIC Educational Resources Information Center

    Yoon, Choongil

    2016-01-01

    The present study investigated how 6 Korean ESL graduate students in Canada used a suite of freely available reference resources, consisting of Web-based corpus tools, Google search engines, and dictionaries, for solving linguistic problems while completing an authentic academic writing assignment in English. Using a mixed methods design, the…

  16. The Talking Dictionary. The Prospectus Series, Paper No. 2.

    ERIC Educational Resources Information Center

    Ward, Ted

    Three talking dictionaries designed to increase independence and resource-use skills of handicapped children have specific advantages and limitations. System I involves a random access tape recorder, a printed or braille dictionary which contains the inquiry numbers for words, a console (similar to an adding machine) on which the number is…

  17. The Primary Computer Dictionary.

    ERIC Educational Resources Information Center

    Girard, Suzanne; Willing, Kathlene

    Suitable for children from kindergarten to grade three, this dictionary is designed to introduce young children to computer terminology at a level that they will understand and find useful. It is also suitable for parents as a home resource, for library use, and as a handbook for teachers. The first sentence of each definition contains the kernel…

  18. Syntactic and Semantic Specifications in Online English Learners' Dictionaries

    ERIC Educational Resources Information Center

    Rizo-Rodriguez, Alfonso

    2009-01-01

    Among the multifarious linguistic resources currently available on the Internet, learners of English as a foreign language, as well as teachers and translators, can effortlessly access a vast variety of electronic dictionaries well suited to a multiplicity of lookup operations. A particular kind of lexicographical work on the Web is the…

  19. Organizing the present, looking to the future: an online knowledge repository to facilitate collaboration.

    PubMed

    Burchill, C; Roos, L L; Fergusson, P; Jebamani, L; Turner, K; Dueck, S

    2000-01-01

    Comprehensive data available in the Canadian province of Manitoba since 1970 have aided study of the interaction between population health, health care utilization, and structural features of the health care system. Given a complex linked database and many ongoing projects, better organization of available epidemiological, institutional, and technical information was needed. The Manitoba Centre for Health Policy and Evaluation wished to develop a knowledge repository to handle data, document research methods, and facilitate both internal communication and collaboration with other sites. This evolving knowledge repository consists of both public and internal (restricted access) pages on the World Wide Web (WWW). Information can be accessed using an indexed logical format or queried to allow entry at user-defined points. The main topics are: Concept Dictionary, Research Definitions, Meta-Index, and Glossary. The Concept Dictionary operationalizes concepts used in health research using administrative data, outlining the creation of complex variables. Research Definitions specify the codes for common surgical procedures, tests, and diagnoses. The Meta-Index organizes concepts and definitions according to the Medical Sub-Heading (MeSH) system developed by the National Library of Medicine. The Glossary facilitates navigation through the research terms and abbreviations in the knowledge repository. An Education Resources heading presents a web-based graduate course using substantial amounts of material in the Concept Dictionary, a lecture in the Epidemiology Supercourse, and material for Manitoba's Regional Health Authorities. Confidential information (including Data Dictionaries) is available on the Centre's internal website. Use of the public pages has increased dramatically since January 1998, with almost 6,000 page hits from 250 different hosts in May 1999. More recently, the number of page hits has averaged around 4,000 per month, while the number of unique hosts has climbed to around 400. This knowledge repository promotes standardization and increases efficiency by placing concepts and associated programming in the Centre's collective memory. Collaboration and project management are facilitated.

  20. Organizing the Present, Looking to the Future: An Online Knowledge Repository to Facilitate Collaboration

    PubMed Central

    Burchill, Charles; Fergusson, Patricia; Jebamani, Laurel; Turner, Ken; Dueck, Stephen

    2000-01-01

    Background Comprehensive data available in the Canadian province of Manitoba since 1970 have aided study of the interaction between population health, health care utilization, and structural features of the health care system. Given a complex linked database and many ongoing projects, better organization of available epidemiological, institutional, and technical information was needed. Objective The Manitoba Centre for Health Policy and Evaluation wished to develop a knowledge repository to handle data, document research methods, and facilitate both internal communication and collaboration with other sites. Methods This evolving knowledge repository consists of both public and internal (restricted access) pages on the World Wide Web (WWW). Information can be accessed using an indexed logical format or queried to allow entry at user-defined points. The main topics are: Concept Dictionary, Research Definitions, Meta-Index, and Glossary. The Concept Dictionary operationalizes concepts used in health research using administrative data, outlining the creation of complex variables. Research Definitions specify the codes for common surgical procedures, tests, and diagnoses. The Meta-Index organizes concepts and definitions according to the Medical Sub-Heading (MeSH) system developed by the National Library of Medicine. The Glossary facilitates navigation through the research terms and abbreviations in the knowledge repository. An Education Resources heading presents a web-based graduate course using substantial amounts of material in the Concept Dictionary, a lecture in the Epidemiology Supercourse, and material for Manitoba's Regional Health Authorities. Confidential information (including Data Dictionaries) is available on the Centre's internal website. Results Use of the public pages has increased dramatically since January 1998, with almost 6,000 page hits from 250 different hosts in May 1999. More recently, the number of page hits has averaged around 4,000 per month, while the number of unique hosts has climbed to around 400. Conclusions This knowledge repository promotes standardization and increases efficiency by placing concepts and associated programming in the Centre's collective memory. Collaboration and project management are facilitated. PMID:11720929

  1. A Study of the Relationship between Type of Dictionary Used and Lexical Proficiency in Writings of Iranian EFL Students

    ERIC Educational Resources Information Center

    Vahdany, Fereidoon; Abdollahzadeh, Milad; Gholami, Shokoufeh; Ghanipoor, Mahmood

    2014-01-01

    This study aimed at investigating the relationship between types of dictionaries used and lexical proficiency in writing. Eighty TOEFL students took part in responding to two questionnaires collecting information about their dictionary type preferences and habits of dictionary use, along with an interview for further in-depth responses. They were…

  2. English-Chinese Cross-Language IR Using Bilingual Dictionaries

    DTIC Science & Technology

    2006-01-01

    ...specialized dictionaries together contain about two million entries [6]. (4. Monolingual Experiment) The Chinese documents and the Chinese translations of... monolingual performance. The main performance-limiting factor is the limited coverage of the dictionary used in query translation. Some of the key con... English-Chinese Cross-Language IR using Bilingual Dictionaries. Aitao Chen, Hailing Jiang, and Fredric Gey, School of Information Management.

  3. Basis Expansion Approaches for Regularized Sequential Dictionary Learning Algorithms With Enforced Sparsity for fMRI Data Analysis.

    PubMed

    Seghouane, Abd-Krim; Iqbal, Asif

    2017-09-01

    Sequential dictionary learning algorithms have been successfully applied to functional magnetic resonance imaging (fMRI) data analysis. fMRI data sets are, however, structured data matrices with a notion of temporal smoothness in the column direction. This prior information, which can be converted into a constraint of smoothness on the learned dictionary atoms, has seldom been included in classical dictionary learning algorithms when applied to fMRI data analysis. In this paper, we tackle this problem by proposing two new sequential dictionary learning algorithms dedicated to fMRI data analysis by accounting for this prior information. These algorithms differ from the existing ones in their dictionary update stage. The steps of this stage are derived as a variant of the power method for computing the SVD. The proposed algorithms generate regularized dictionary atoms via the solution of a left regularized rank-one matrix approximation problem where temporal smoothness is enforced via regularization through basis expansion and sparse basis expansion in the dictionary update stage. Applications on synthetic data experiments and real fMRI data sets illustrating the performance of the proposed algorithms are provided.
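
    The dictionary update sketched above can be illustrated as a ridge-regularized rank-one approximation in which the atom is constrained to a smooth basis expansion. The numpy sketch below is a minimal illustration under assumed notation (residual E, sparse-code row x, basis B, weight lam) with a low-order cosine basis standing in for the smoothness prior; it is not the authors' implementation.

    ```python
    import numpy as np

    def smooth_atom_update(E, x, B, lam=0.1):
        """Ridge-regularized rank-one update: approximate E ~ d @ x.T with d = B @ c,
        where B is an assumed smooth temporal basis and c is penalized by lam."""
        xtx = float(x @ x)
        # closed-form minimizer of ||E - B c x^T||_F^2 + lam ||c||^2
        G = xtx * (B.T @ B) + lam * np.eye(B.shape[1])
        c = np.linalg.solve(G, B.T @ E @ x)
        d = B @ c
        norm = np.linalg.norm(d)
        return d / norm if norm > 0 else d

    # toy usage: a low-order cosine basis as the (assumed) smoothness prior
    T, N, K = 64, 200, 8
    t = np.arange(T)
    B = np.stack([np.cos(np.pi * k * (t + 0.5) / T) for k in range(K)], axis=1)
    E = np.random.default_rng(0).normal(size=(T, N))
    x = np.random.default_rng(1).normal(size=N)
    atom = smooth_atom_update(E, x, B)
    ```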

  4. The PDA as a reference tool: libraries' role in enhancing nursing education.

    PubMed

    Scollin, Patrick; Callahan, John; Mehta, Apurva; Garcia, Elizabeth

    2006-01-01

    "The PDA as a Reference Tool: The Libraries' Role in Enhancing Nursing Education" is a pilot project funded by the University of Massachusetts President's Office Information Technology Council through their Professional Development Grant program in 2004. The project's goal is to offer faculty and students in nursing programs at two University of Massachusetts campuses access to an array of medical reference information, such as handbooks, dictionaries, calculators, and diagnostic tools, on small handheld computers called personal digital assistants. Through exposure to the variety of information resources in this digital format, participants can discover and explore these resources at no personal financial cost. Participants borrow handhelds from the University Library's circulation desks. The libraries provide support in routine resynchronizing of handhelds to update information. This report will discuss how the projects were administered, what we learned about what did and did not work, the problems and solutions, and where we hope to go from here.

  5. Dictionary learning-based CT detection of pulmonary nodules

    NASA Astrophysics Data System (ADS)

    Wu, Panpan; Xia, Kewen; Zhang, Yanbo; Qian, Xiaohua; Wang, Ge; Yu, Hengyong

    2016-10-01

    Segmentation of lung features is one of the most important steps for computer-aided detection (CAD) of pulmonary nodules with computed tomography (CT). However, irregular shapes, complicated anatomical background and poor pulmonary nodule contrast make CAD a very challenging problem. Here, we propose a novel scheme for feature extraction and classification of pulmonary nodules through dictionary learning from training CT images, which does not require accurately segmented pulmonary nodules. Specifically, two classification-oriented dictionaries and one background dictionary are learnt to solve a two-category problem. In terms of the classification-oriented dictionaries, we calculate sparse coefficient matrices to extract intrinsic features for pulmonary nodule classification. The support vector machine (SVM) classifier is then designed to optimize the performance. Our proposed methodology is evaluated with the lung image database consortium and image database resource initiative (LIDC-IDRI) database, and the results demonstrate that the proposed strategy is promising.
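
    The pipeline described above (class-specific dictionaries learned from patches, sparse coefficients used as features, an SVM classifier) can be sketched with off-the-shelf components. The snippet below is a hypothetical illustration using scikit-learn on random stand-in patches; the patch extraction, the separate background dictionary, and all parameter values are placeholders rather than the authors' settings.

    ```python
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.svm import SVC

    def learn_class_dictionaries(patches_by_class, n_atoms=32):
        """Learn one dictionary per class from flattened image patches."""
        dicts = {}
        for label, patches in patches_by_class.items():
            dl = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                             random_state=0)
            dicts[label] = dl.fit(patches)
        return dicts

    def sparse_code_features(dicts, patches):
        """Concatenate sparse coefficients over all class dictionaries as features."""
        return np.hstack([dl.transform(patches) for dl in dicts.values()])

    # toy usage: random 8x8 "patches" stand in for nodule / background regions
    rng = np.random.default_rng(0)
    patches_by_class = {"nodule": rng.normal(size=(100, 64)),
                        "background": rng.normal(size=(100, 64))}
    dicts = learn_class_dictionaries(patches_by_class)
    X = np.vstack(list(patches_by_class.values()))
    y = np.array([0] * 100 + [1] * 100)
    clf = SVC(kernel="linear").fit(sparse_code_features(dicts, X), y)
    ```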

  6. Sentiment analysis of political communication: combining a dictionary approach with crowdcoding.

    PubMed

    Haselmayer, Martin; Jenny, Marcelo

    2017-01-01

    Sentiment is important in studies of news values, public opinion, negative campaigning, and political polarization, and an explosive expansion of digital textual data and fast progress in automated text analysis provide vast opportunities for innovative social science research. Unfortunately, tools currently available for automated sentiment analysis are mostly restricted to English texts and require considerable contextual adaptation to produce valid results. We present a procedure for collecting fine-grained sentiment scores through crowdcoding to build a negative sentiment dictionary in a language and for a domain of choice. The dictionary enables the analysis of large text corpora that resource-intensive hand-coding struggles to cope with. We calculate the tonality of sentences from dictionary words, and we validate these estimates with results from manual coding. The results show that the crowd-based dictionary provides efficient and valid measurement of sentiment. Empirical examples illustrate its use by analyzing the tonality of party statements and media reports.
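
    Computing sentence tonality from dictionary words can be sketched very simply. The toy dictionary and the averaging rule below are assumptions for illustration only; the cited study derives its scores from crowdcoding and validates them against manual coding.

    ```python
    import re

    def sentence_tonality(sentence, sentiment_dict):
        """Average the dictionary scores of all matched tokens; None if no hits."""
        tokens = re.findall(r"[a-zäöüß]+", sentence.lower())
        scores = [sentiment_dict[t] for t in tokens if t in sentiment_dict]
        return sum(scores) / len(scores) if scores else None

    # hypothetical crowd-coded negativity scores (higher = more negative)
    toy_dict = {"scandal": 0.9, "failure": 0.8, "reform": 0.2}
    print(sentence_tonality("The reform ended in failure and scandal.", toy_dict))
    ```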

  7. Implementation and management of a biomedical observation dictionary in a large healthcare information system.

    PubMed

    Vandenbussche, Pierre-Yves; Cormont, Sylvie; André, Christophe; Daniel, Christel; Delahousse, Jean; Charlet, Jean; Lepage, Eric

    2013-01-01

    This study shows the evolution of a biomedical observation dictionary within the Assistance Publique Hôpitaux Paris (AP-HP), the largest European university hospital group. The different steps are detailed as follows: the dictionary creation, the mapping to logical observation identifier names and codes (LOINC), the integration into a multiterminological management platform and, finally, the implementation in the health information system. AP-HP decided to create a biomedical observation dictionary named AnaBio, to map it to LOINC and to maintain the mapping. A management platform based on methods used for knowledge engineering has been put in place. It aims at integrating AnaBio within the health information system and improving both the quality and stability of the dictionary. This new management platform is now active in AP-HP. The AnaBio dictionary is shared by 120 laboratories and currently includes 50 000 codes. The mapping implementation to LOINC reaches 40% of the AnaBio entries and uses 26% of LOINC records. The results of our work validate the choice made to develop a local dictionary aligned with LOINC. This work constitutes a first step towards a wider use of the platform. The next step will support the entire biomedical production chain, from the clinician prescription, through laboratory tests tracking in the laboratory information system to the communication of results and the use for decision support and biomedical research. In addition, the increase in the mapping implementation to LOINC ensures the interoperability allowing communication with other international health institutions.

  8. Creating a Digital Jamaican Sign Language Dictionary: A R2D2 Approach

    ERIC Educational Resources Information Center

    MacKinnon, Gregory; Soutar, Iris

    2015-01-01

    The Jamaican Association for the Deaf, in their responsibilities to oversee education for individuals who are deaf in Jamaica, has demonstrated an urgent need for a dictionary that assists students, educators, and parents with the practical use of "Jamaican Sign Language." While paper versions of a preliminary resource have been explored…

  9. The "Dictionary of Smoky Mountain English" as a Resource for Southern Appalachia.

    ERIC Educational Resources Information Center

    Montgomery, Michael

    This paper argues that one important reflection of a culture's status is the existence of general reference books on it. To this end, it discusses the forthcoming "Dictionary of Smoky Mountain English," a book designed to address the lack of a comprehensive reference work on Appalachian speech and language patterns in this region. The…

  10. Highly undersampled MR image reconstruction using an improved dual-dictionary learning method with self-adaptive dictionaries.

    PubMed

    Li, Jiansen; Song, Ying; Zhu, Zhen; Zhao, Jun

    2017-05-01

    The dual-dictionary learning (Dual-DL) method utilizes both a low-resolution dictionary and a high-resolution dictionary, which are co-trained for sparse coding and image updating, respectively. It can effectively exploit a priori knowledge regarding the typical structures, specific features, and local details of training-set images. The prior knowledge helps to improve the reconstruction quality greatly. This method has been successfully applied in magnetic resonance (MR) image reconstruction. However, it relies heavily on the training sets, and the dictionaries are fixed and nonadaptive. In this research, we improve Dual-DL by using self-adaptive dictionaries. The low- and high-resolution dictionaries are updated correspondingly along with the image updating stage to ensure their self-adaptivity. The updated dictionaries incorporate both the prior information of the training sets and the test image directly. Both dictionaries feature improved adaptability. Experimental results demonstrate that the proposed method can efficiently and significantly improve the quality and robustness of MR image reconstruction.

  11. Using Medical Dictionaries to Teach the Critical Evaluation of Information Sources.

    ERIC Educational Resources Information Center

    Duff, Alistair S.

    1995-01-01

    A course-integrated bibliographic instruction session was designed to develop skills in evaluating biomedical information sources. Students in small groups evaluated and ranked medical and nursing dictionaries and defended ratings to the class. (SK)

  12. Dave Sperling's Guide to the Internet's Best Writing Resources.

    ERIC Educational Resources Information Center

    Sperling, Dave

    2003-01-01

    Provides a guide to writing resources on the Internet, including resources for business writing, dictionaries and thesauruses, e-mail, encyclopedias, free Web space, grammar, fun, online help, online writing labs, punctuation, and spelling. Lists useful Internet tips. (Author/VWL)

  13. Implementation and management of a biomedical observation dictionary in a large healthcare information system

    PubMed Central

    Vandenbussche, Pierre-Yves; Cormont, Sylvie; André, Christophe; Daniel, Christel; Delahousse, Jean; Charlet, Jean; Lepage, Eric

    2013-01-01

    Objective This study shows the evolution of a biomedical observation dictionary within the Assistance Publique Hôpitaux Paris (AP-HP), the largest European university hospital group. The different steps are detailed as follows: the dictionary creation, the mapping to logical observation identifier names and codes (LOINC), the integration into a multiterminological management platform and, finally, the implementation in the health information system. Methods AP-HP decided to create a biomedical observation dictionary named AnaBio, to map it to LOINC and to maintain the mapping. A management platform based on methods used for knowledge engineering has been put in place. It aims at integrating AnaBio within the health information system and improving both the quality and stability of the dictionary. Results This new management platform is now active in AP-HP. The AnaBio dictionary is shared by 120 laboratories and currently includes 50 000 codes. The mapping implementation to LOINC reaches 40% of the AnaBio entries and uses 26% of LOINC records. The results of our work validate the choice made to develop a local dictionary aligned with LOINC. Discussion and Conclusions This work constitutes a first step towards a wider use of the platform. The next step will support the entire biomedical production chain, from the clinician prescription, through laboratory tests tracking in the laboratory information system to the communication of results and the use for decision support and biomedical research. In addition, the increase in the mapping implementation to LOINC ensures the interoperability allowing communication with other international health institutions. PMID:23635601

  14. An object-oriented design for automated navigation of semantic networks inside a medical data dictionary.

    PubMed

    Ruan, W; Bürkle, T; Dudeck, J

    2000-01-01

    In this paper we present a data dictionary server for the automated navigation of information sources. The underlying knowledge is represented within a medical data dictionary. The mapping between medical terms and information sources is based on a semantic network. The key aspect of implementing the dictionary server is how to represent the semantic network in a way that is easier to navigate and to operate, i.e. how to abstract the semantic network and to represent it in memory for various operations. This paper describes an object-oriented design based on Java that represents the semantic network in terms of a group of objects. A node and its relationships to its neighbors are encapsulated in one object. Based on such a representation model, several operations have been implemented. They comprise the extraction of parts of the semantic network which can be reached from a given node as well as finding all paths between a start node and a predefined destination node. This solution is independent of any given layout of the semantic structure. Therefore the module, called the Giessen Data Dictionary Server, can act independently of a specific clinical information system. The dictionary server will be used to present clinical information, e.g. treatment guidelines or drug information sources, to the clinician in an appropriate working context. The server is invoked from clinical documentation applications which contain an infobutton. Automated navigation will guide the user to all the information relevant to her/his topic, which is currently available inside our closed clinical network.
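
    The design described above (one object per node encapsulating its relations, extraction of the part of the network reachable from a node, and enumeration of all paths between a start node and a destination node) can be sketched as follows. The original server is written in Java; this is a simplified Python sketch with hypothetical node and relation names, not the Giessen implementation.

    ```python
    class Node:
        """A dictionary concept with named relations to neighboring concepts."""
        def __init__(self, name):
            self.name = name
            self.neighbors = {}                    # relation label -> list of Node

        def link(self, relation, other):
            self.neighbors.setdefault(relation, []).append(other)

    def reachable(start):
        """Names of all nodes reachable from a given start node (depth-first)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node.name not in seen:
                seen.add(node.name)
                for targets in node.neighbors.values():
                    stack.extend(targets)
        return seen

    def all_paths(start, goal, path=None):
        """Every cycle-free path between a start node and a destination node."""
        path = (path or []) + [start]
        if start is goal:
            return [path]
        paths = []
        for targets in start.neighbors.values():
            for nxt in targets:
                if nxt not in path:
                    paths.extend(all_paths(nxt, goal, path))
        return paths

    # hypothetical fragment: diagnosis --treated_by--> drug --documented_in--> guideline
    diagnosis, drug, guideline = Node("pneumonia"), Node("amoxicillin"), Node("guideline")
    diagnosis.link("treated_by", drug)
    drug.link("documented_in", guideline)
    print([n.name for n in all_paths(diagnosis, guideline)[0]])
    ```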

  15. A Novel Approach to Creating Disambiguated Multilingual Dictionaries

    ERIC Educational Resources Information Center

    Boguslavsky, Igor; Cardenosa, Jesus; Gallardo, Carolina

    2009-01-01

    Multilingual lexicons are needed in various applications, such as cross-lingual information retrieval, machine translation, and some others. Often, these applications suffer from the ambiguity of dictionary items, especially when an intermediate natural language is involved in the process of the dictionary construction, since this language adds…

  16. Binukid Dictionary.

    ERIC Educational Resources Information Center

    Otanes, Fe T., Ed.; Wrigglesworth, Hazel

    1992-01-01

    The dictionary of Binukid, a language spoken in the Bukidnon province of the Philippines, is intended as a tool for students of Binukid and for native Binukid-speakers interested in learning English. A single dialect was chosen for this work. The dictionary is introduced by notes on Binukid grammar, including basic information about phonology and…

  17. What Online Traditional Medicine Dictionaries Bring To English Speakers Now? Concepts or Equivalents?

    PubMed Central

    Fang, Lu

    2018-01-01

    Nowadays, more and more Chinese medicine practices are applied around the world, and popularizing them has become an urgent task. To meet this requirement, an increasing number of Chinese-English traditional medicine dictionaries have been produced at home and abroad in recent decades. Nevertheless, users still struggle to locate the information they need in these dictionaries. What traditional medicine dictionaries do English speakers need now? To identify an entry model for online TCM dictionaries, I compared the entries in five printed traditional medicine dictionaries and two online ones. Based upon this, I tentatively put forward two sample entries, “阳经 (yángjīng)” and “阴经 (yīn jīng)”, focusing on the transmission of concepts, for online Chinese-English TCM dictionaries. PMID:29875861

  18. SaRAD: a Simple and Robust Abbreviation Dictionary.

    PubMed

    Adar, Eytan

    2004-03-01

    Due to recent interest in the use of textual material to augment traditional experiments it has become necessary to automatically cluster, classify and filter natural language information. The Simple and Robust Abbreviation Dictionary (SaRAD) provides an easy to implement, high performance tool for the construction of a biomedical symbol dictionary. The algorithms, applied to the MEDLINE document set, result in a high quality dictionary and toolset to disambiguate abbreviation symbols automatically.
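
    The SaRAD algorithms themselves are not reproduced here, but the core task, pairing a parenthesized symbol with a plausible preceding long form, can be illustrated with a much-simplified heuristic. Everything below (the regular expression, the candidate window, the matching rule) is an assumption for illustration only.

    ```python
    import re

    def extract_abbreviations(text):
        """Pair parenthesized symbols with a preceding long form whose initials
        plausibly yield the symbol (a toy heuristic, not the SaRAD algorithm)."""
        pairs = {}
        for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", text):
            short = match.group(1)
            words = text[:match.start()].split()
            window = words[-len(short) * 2:]        # generous candidate window
            for i in range(len(window)):
                initials = "".join(w[0] for w in window[i:]).lower()
                if short.lower() in initials:
                    pairs[short] = " ".join(window[i:])
                    break
        return pairs

    print(extract_abbreviations(
        "The Simple and Robust Abbreviation Dictionary (SaRAD) builds a biomedical "
        "symbol dictionary from the MEDLINE document set."))
    ```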

  19. Standardized Representation of Clinical Study Data Dictionaries with CIMI Archetypes.

    PubMed

    Sharma, Deepak K; Solbrig, Harold R; Prud'hommeaux, Eric; Pathak, Jyotishman; Jiang, Guoqian

    2016-01-01

    Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionary's metadata elements presents challenges for their harmonization for similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized archetypes can help to overcome this problem. The Archetype Modeling Language (AML) as developed by the Clinical Information Modeling Initiative (CIMI) can serve as a common format for the representation of data dictionary models. We mapped three different data dictionaries (identified from dbGAP, PheKB and TCGA) onto AML archetypes by aligning dictionary variable definitions with the AML archetype elements. The near complete alignment of data dictionaries helped map them into valid AML models that captured all data dictionary model metadata. The outcome of the work would help subject matter experts harmonize data models for quality, semantic interoperability and better downstream data integration.

  20. HST Keyword Dictionary

    NASA Astrophysics Data System (ADS)

    Swade, D. A.; Gardner, L.; Hopkins, E.; Kimball, T.; Lezon, K.; Rose, J.; Shiao, B.

    STScI has undertaken a project to place all HST keyword information in one source, the keyword database, and to provide a mechanism for making this keyword information accessible to all HST users, the keyword dictionary, which is a WWW interface to the keyword database.

  1. Essential Nursing References.

    ERIC Educational Resources Information Center

    Nursing and Health Care Perspectives, 2000

    2000-01-01

    This partially annotated bibliography contains these categories: abstract sources, archives, audiovisuals, bibliographies, databases, dictionaries, directories, drugs/toxicology/environmental health, grant resources, histories, indexes, Internet resources, reviews, statistical sources, and writers' manuals and guides. A supplement lists Canadian…

  2. An Analysis of Data Dictionaries and Their Role in Information Resource Management.

    DTIC Science & Technology

    1984-09-01

    ...management system (DBMS). It manages data by utilizing software routines built into the data dictionary package and thus is not dependent on DBMS software... described as having active or passive interfaces or a combination of the two. An interface is a series of commands which connect the data... carefully conceived examples in the dictionary's reference manuals. A hierarchy of menus can reduce complex operations to a series of smaller...

  3. The Junior Computer Dictionary. 101 Useful Words and Definitions to Introduce Students to Computer Terminology.

    ERIC Educational Resources Information Center

    Willing, Kathlene R.; Girard, Suzanne

    Suitable for children from grades four to seven, this dictionary is designed to introduce children to computer terminology at a level that they will understand and find useful. It is also suitable as a home resource for parents, for library use, and as a handbook for teachers. For each word, the first sentence of the definition contains the kernel…

  4. The T.M.R. Data Dictionary: A Management Tool for Data Base Design

    PubMed Central

    Ostrowski, Maureen; Bernes, Marshall R.

    1984-01-01

    In January 1981, a dictionary-driven ambulatory care information system known as TMR (The Medical Record) was installed at a large private medical group practice in Los Angeles. TMR's data dictionary has enabled the medical group to adapt the software to meet changing user needs largely without programming support. For top management, the dictionary is also a tool for navigating through the system's complexity and assuring the integrity of management goals.

  5. Interchanging lexical information for a multilingual dictionary.

    PubMed

    Baud, R H; Nyström, M; Borin, L; Evans, R; Schulz, S; Zweigenbaum, P

    2005-01-01

    To facilitate the interchange of lexical information for multiple languages in the medical domain. To pave the way for the emergence of a generally available truly multilingual electronic dictionary in the medical domain. An interchange format has to be neutral relative to the target languages. It has to be consistent with current needs of lexicon authors, present and future. An active interaction between six potential authors aimed to determine a common denominator striking the right balance between richness of content and ease of use for lexicon providers. A simple list of relevant attributes has been established and published. The format has the potential for collecting relevant parts of a future multilingual dictionary. An XML version is available. This effort makes feasible the exchange of lexical information between research groups. Interchange files are made available in a public repository. This procedure opens the door to a true multilingual dictionary, in the awareness that the exchange of lexical information is (only) a necessary first step, before structuring the corresponding entries in different languages.

  6. A dictionary based informational genome analysis

    PubMed Central

    2012-01-01

    Background In the post-genomic era, several methods of computational genomics are emerging to understand how the whole information is structured within genomes. The literature of the last five years accounts for several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to identify over-represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported to other contexts of computational genomics, as a basis for the discrimination of genomic pathologies. PMID:22985068
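
    A genomic dictionary in the sense used above is simply the set of k-mers (factors) occurring in a sequence, and informational indexes can be computed over its frequency distribution. The sketch below uses empirical entropy as one example of such an index; the specific indexes, dictionary sizes and genomes analyzed in the paper are not reproduced.

    ```python
    from collections import Counter
    from math import log2

    def kmer_dictionary(sequence, k):
        """All k-mers of a sequence with their empirical frequencies."""
        counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
        total = sum(counts.values())
        return {kmer: n / total for kmer, n in counts.items()}

    def empirical_entropy(freqs):
        """One simple informational index over a genomic dictionary (bits)."""
        return -sum(p * log2(p) for p in freqs.values())

    # toy usage on a short synthetic sequence
    freqs = kmer_dictionary("ACGTACGTTTACG", k=3)
    print(len(freqs), round(empirical_entropy(freqs), 3))
    ```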

  7. Dictionary Culture of University Students Learning English as a Foreign Language in Turkey

    ERIC Educational Resources Information Center

    Baskin, Sami; Mumcu, Muhsin

    2018-01-01

    Dictionaries, one of the oldest tools of language education, have continued to be a part of education although information technologies and the concept of education have changed over time. Until today, with the help of developments in technology, both types of dictionaries have increased and their usage areas have expanded. Therefore, it is possible to…

  8. Barriers to Information Transfer and Approaches Toward Their Reduction, Conference Proceedings of the Technical Information Panel Specialists’ Meeting Held in Washington, DC on 23-24 September 1987.

    DTIC Science & Technology

    1988-03-01

    ...oriented expansion of dictionaries and systems. Portability. Included essential criteria for evaluation are: Quality of the raw (also called... hard to be made without having precise criteria for the decision. Because the amount of data in computerized dictionaries - on the long line of... development of MT and CAT systems - is the decisive component, the update of the (electronic) dictionary plays a substantial part in both alternatives

  9. Multivariate temporal dictionary learning for EEG.

    PubMed

    Barthélemy, Q; Gouy-Pailler, C; Isaac, Y; Souloumiac, A; Larue, A; Mars, J I

    2013-04-30

    This article addresses the issue of representing electroencephalographic (EEG) signals in an efficient way. While classical approaches use a fixed Gabor dictionary to analyze EEG signals, this article proposes a data-driven method to obtain an adapted dictionary. To achieve efficient dictionary learning, appropriate spatial and temporal modeling is required. Inter-channel links are taken into account in the spatial multivariate model, and shift-invariance is used for the temporal model. Multivariate learned kernels are informative (a few atoms code plentiful energy) and interpretable (the atoms can have a physiological meaning). Using real EEG data, the proposed method is shown to outperform the classical multichannel matching pursuit used with a Gabor dictionary, as measured by the representative power of the learned dictionary and its spatial flexibility. Moreover, dictionary learning can capture interpretable patterns: this ability is illustrated on real data, learning a P300 evoked potential. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. Standardized Representation of Clinical Study Data Dictionaries with CIMI Archetypes

    PubMed Central

    Sharma, Deepak K.; Solbrig, Harold R.; Prud’hommeaux, Eric; Pathak, Jyotishman; Jiang, Guoqian

    2016-01-01

    Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionary’s metadata elements presents challenges for their harmonization for similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized archetypes can help to overcome this problem. The Archetype Modeling Language (AML) as developed by the Clinical Information Modeling Initiative (CIMI) can serve as a common format for the representation of data dictionary models. We mapped three different data dictionaries (identified from dbGAP, PheKB and TCGA) onto AML archetypes by aligning dictionary variable definitions with the AML archetype elements. The near complete alignment of data dictionaries helped map them into valid AML models that captured all data dictionary model metadata. The outcome of the work would help subject matter experts harmonize data models for quality, semantic interoperability and better downstream data integration. PMID:28269909

  11. French as a Second Language. Annotated Bibliography of Learning Resources: Beginning Level. Early Childhood Services - Grade 12.

    ERIC Educational Resources Information Center

    Alberta Dept. of Education, Edmonton. Language Services Branch.

    This annotated bibliography of instructional resources for Alberta (Canada) introductory French second language teaching in early childhood, elementary, and secondary education consists of citations in 10 categories: audio/video recordings; communicative activity resources (primarily texts and workbooks); dictionaries and vocabulary handbooks;…

  12. Training Manual: Dictionary of Occupational Titles.

    ERIC Educational Resources Information Center

    Georgia State Dept. of Human Resources, Atlanta.

    The training manual was developed as a tool for understanding the occupational information and descriptive data presented in the Dictionary of Occupational Titles (DOT) (Volumes 1 and 2 and Supplements 1 and 2). Exercises are provided in workbook form to increase an understanding of the occupational information presented. Exercises coordinated…

  13. Learning Words from Context and Dictionaries: An Experimental Comparison.

    ERIC Educational Resources Information Center

    Fischer, Ute

    1994-01-01

    Investigated the independent and interactive effects of contextual and definitional information on vocabulary learning. German students of English received either a text with unfamiliar English words or their monolingual English dictionary entries. A third group received both. Information about word context is crucial to understanding meaning. (44…

  14. Hyperspectral imagery super-resolution by compressive sensing inspired dictionary learning and spatial-spectral regularization.

    PubMed

    Huang, Wei; Xiao, Liang; Liu, Hongyi; Wei, Zhihui

    2015-01-19

    Due to instrumental and imaging optics limitations, it is difficult to acquire high spatial resolution hyperspectral imagery (HSI). Super-resolution (SR) imagery aims at inferring high-quality images of a given scene from degraded versions of the same scene. This paper proposes a novel hyperspectral imagery super-resolution (HSI-SR) method via dictionary learning and spatial-spectral regularization. The main contributions of this paper are twofold. First, inspired by the compressive sensing (CS) framework, for learning the high-resolution dictionary, we encourage stronger sparsity on image patches and promote smaller coherence between the learned dictionary and the sensing matrix. Thus, a sparsity- and incoherence-restricted dictionary learning method is proposed to achieve a more efficient sparse representation. Second, a variational regularization model combining a spatial sparsity regularization term and a new local spectral similarity preserving term is proposed to integrate the spectral and spatial-contextual information of the HSI. Experimental results show that the proposed method can effectively recover spatial information and better preserve spectral information. The high spatial resolution HSI reconstructed by the proposed method outperforms the results reconstructed by other well-known methods in terms of both objective measurements and visual evaluation.

  15. Sequential Dictionary Learning From Correlated Data: Application to fMRI Data Analysis.

    PubMed

    Seghouane, Abd-Krim; Iqbal, Asif

    2017-03-22

    Sequential dictionary learning via the K-SVD algorithm has been revealed as a successful alternative to conventional data-driven methods such as independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data analysis. fMRI datasets are, however, structured data matrices with notions of spatio-temporal correlation and temporal smoothness. This prior information has not been included in the K-SVD algorithm when applied to fMRI data analysis. In this paper, we propose three variants of the K-SVD algorithm dedicated to fMRI data analysis by accounting for this prior information. The proposed algorithms differ from the K-SVD in their sparse coding and dictionary update stages. The first two algorithms account for the known correlation structure in the fMRI data by using the squared Q, R-norm instead of the Frobenius norm for matrix approximation. The third and last algorithm accounts for both the known correlation structure in the fMRI data and the temporal smoothness. The temporal smoothness is incorporated in the dictionary update stage via regularization of the dictionary atoms obtained with penalization. The performance of the proposed dictionary learning algorithms is illustrated through simulations and applications on real fMRI data.
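
    For orientation, the sketch below shows one sweep of the standard K-SVD dictionary update that the proposed variants modify; the (Q, R)-norm weighting and the smoothness regularization described above would replace the plain SVD step. Matrix names and shapes are assumptions for illustration.

    ```python
    import numpy as np

    def ksvd_dictionary_update(Y, D, X):
        """One sweep of the standard K-SVD dictionary update (Frobenius loss).
        Y: (m, n) data, D: (m, K) dictionary, X: (K, n) sparse codes."""
        for k in range(D.shape[1]):
            used = np.flatnonzero(X[k])
            if used.size == 0:
                continue
            # residual without atom k, restricted to the signals that use it
            E_k = Y[:, used] - D @ X[:, used] + np.outer(D[:, k], X[k, used])
            U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, used] = s[0] * Vt[0]
        return D, X

    # toy usage: random data, dictionary, and sparse-ish codes
    rng = np.random.default_rng(0)
    Y, D = rng.normal(size=(30, 100)), rng.normal(size=(30, 12))
    X = rng.normal(size=(12, 100)) * (rng.random((12, 100)) < 0.1)
    D, X = ksvd_dictionary_update(Y, D, X)
    ```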

  16. Terminological reference of a knowledge-based system: the data dictionary.

    PubMed

    Stausberg, J; Wormek, A; Kraut, U

    1995-01-01

    The development of open and integrated knowledge bases makes new demands on the definition of the terminology used. This definition should be realized in a data dictionary separated from the knowledge base. Within the work done on a reference model of medical knowledge, a data dictionary has been developed and used in different applications: a term definition shell, a documentation tool and a knowledge base. The data dictionary includes the part of the terminology that is largely independent of a particular knowledge model. For that reason, the data dictionary can be used as a basis for integrating knowledge bases into information systems, for knowledge sharing and reuse, and for modular development of knowledge-based systems.

  17. Interchanging Lexical Information for a Multilingual Dictionary

    PubMed Central

    Baud, RH; Nyström, M; Borin, L; Evans, R; Schulz, S; Zweigenbaum, P

    2005-01-01

    Objective To facilitate the interchange of lexical information for multiple languages in the medical domain. To pave the way for the emergence of a generally available truly multilingual electronic dictionary in the medical domain. Methods An interchange format has to be neutral relative to the target languages. It has to be consistent with current needs of lexicon authors, present and future. An active interaction between six potential authors aimed to determine a common denominator striking the right balance between richness of content and ease of use for lexicon providers. Results A simple list of relevant attributes has been established and published. The format has the potential for collecting relevant parts of a future multilingual dictionary. An XML version is available. Conclusion This effort makes feasible the exchange of lexical information between research groups. Interchange files are made available in a public repository. This procedure opens the door to a true multilingual dictionary, in the awareness that the exchange of lexical information is (only) a necessary first step, before structuring the corresponding entries in different languages. PMID:16778996

  18. IRDS prototyping with applications to the representation of EA/RA models

    NASA Technical Reports Server (NTRS)

    Lekkos, Anthony A.; Greenwood, Bruce

    1988-01-01

    The requirements and system overview for the Information Resource Dictionary System (IRDS) are described. A formal design specification for a scaled-down IRDS implementation compatible with the proposed FIPS IRDS standard is included. The major design objectives for this IRDS include a menu-driven user interface, implementation of basic IRDS operations, and PC compatibility. The IRDS was implemented using the Smalltalk/5 object-oriented programming system and an AT&T 6300 personal computer running under MS-DOS 3.1. The difficulties encountered in using Smalltalk are discussed.

  19. ESP Students' Views on Online Language Resources for L2 Text Production Purposes

    ERIC Educational Resources Information Center

    Kozlova, Inna; Presas, Marisa

    2013-01-01

    The use of online language resources for L2 text production purposes is a recent phenomenon and has not yet been studied in depth. Increasing availability of new online resources seems to be changing the very nature of L2 text production. The traditional dictionary, hitherto a default resource to help with language doubts, is being left behind…

  20. Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format

    PubMed Central

    Kinjo, Akira R.; Suzuki, Hirofumi; Yamashita, Reiko; Ikegawa, Yasuyo; Kudou, Takahiro; Igarashi, Reiko; Kengaku, Yumiko; Cho, Hasumi; Standley, Daron M.; Nakagawa, Atsushi; Nakamura, Haruki

    2012-01-01

    The Protein Data Bank Japan (PDBj, http://pdbj.org) is a member of the worldwide Protein Data Bank (wwPDB) and accepts and processes the deposited data of experimentally determined macromolecular structures. While maintaining the archive in collaboration with other wwPDB partners, PDBj also provides a wide range of services and tools for analyzing structures and functions of proteins, which are summarized in this article. To enhance the interoperability of the PDB data, we have recently developed PDB/RDF, PDB data in the Resource Description Framework (RDF) format, along with its ontology in the Web Ontology Language (OWL) based on the PDB mmCIF Exchange Dictionary. Being in the standard format for the Semantic Web, the PDB/RDF data provide a means to integrate the PDB with other biological information resources. PMID:21976737

  1. 78 FR 40522 - Agency Information Collection Activities: Renewal of Currently Approved Collection; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-07-05

    ... Reporting Data Dictionary (available electronically at https://www.federalreporting.gov/federalreporting... the Recipient Reporting Data Dictionary. Below are the basic reporting requirements to be reported on...

  2. High-recall protein entity recognition using a dictionary

    PubMed Central

    Kou, Zhenzhen; Cohen, William W.; Murphy, Robert F.

    2010-01-01

    Protein name extraction is an important step in mining biological literature. We describe two new methods for this task: semiCRFs and dictionary HMMs. SemiCRFs are a recently-proposed extension to conditional random fields that enables more effective use of dictionary information as features. Dictionary HMMs are a technique in which a dictionary is converted to a large HMM that recognizes phrases from the dictionary, as well as variations of these phrases. Standard training methods for HMMs can be used to learn which variants should be recognized. We compared the performance of our new approaches to that of Maximum Entropy (Max-Ent) and normal CRFs on three datasets, and improvement was obtained for all four methods over the best published results for two of the datasets. CRFs and semiCRFs achieved the highest overall performance according to the widely-used F-measure, while the dictionary HMMs performed the best at finding entities that actually appear in the dictionary—the measure of most interest in our intended application. PMID:15961466

  3. POLARIS: Helping Managers Get Answers Fast!

    NASA Technical Reports Server (NTRS)

    Corcoran, Patricia M.; Webster, Jeffery

    2007-01-01

    This viewgraph presentation reviews the Project Online Library and Resource Information System (POLARIS). It is a NASA-wide, web-based system providing access to information related to Program and Project Management. It will provide a one-stop shop for access to: a searchable, sortable database of all requirements for all product lines; project life cycle diagrams with reviews; project review definitions with products; review information from NPR 7123.1, NASA Systems Engineering Processes and Requirements; templates and examples of products; project standard WBSs with dictionaries, and requirements for implementation and approval; information from NASA's Metadata Manager (MdM): Attributes of Missions, Themes, Programs & Projects; the NPR 7120.5 waiver form and instructions; and much more. The presentation reviews the plans and timelines for future revisions and modifications.

  4. Pap Smear: MedlinePlus Lab Test Information

    MedlinePlus

    ... U.S. Department of Health and Human Services; NCI Dictionary of Cancer Terms: cervix; [cited 2017 Feb 3]; [ ... screens]. Available from: https://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=46133 National Cancer Institute [Internet]. ...

  5. Evaluation of techniques for increasing recall in a dictionary approach to gene and protein name identification.

    PubMed

    Schuemie, Martijn J; Mons, Barend; Weeber, Marc; Kors, Jan A

    2007-06-01

    Gene and protein name identification in text requires a dictionary approach to relate synonyms to the same gene or protein, and to link names to external databases. However, existing dictionaries are incomplete. We investigate two complementary methods for automatic generation of a comprehensive dictionary: combination of information from existing gene and protein databases and rule-based generation of spelling variations. Both methods have been reported in literature before, but have hitherto not been combined and evaluated systematically. We combined gene and protein names from several existing databases of four different organisms. The combined dictionaries showed a substantial increase in recall on three different test sets, as compared to any single database. Application of 23 spelling variation rules to the combined dictionaries further increased recall. However, many rules appeared to have no effect and some appear to have a detrimental effect on precision.
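
    Rule-based generation of spelling variations can be illustrated with a few simple orthographic rules. The rules below (separator changes, trailing Arabic-to-Roman numerals, spelled-out Greek letters) are representative assumptions only; the exact definitions of the 23 rules evaluated in the study are not reproduced here.

    ```python
    import re

    def spelling_variants(name):
        """Generate simple orthographic variants of a gene/protein name."""
        variants = {name}
        # rule 1: hyphen <-> space <-> no separator
        variants.update({name.replace("-", " "), name.replace("-", ""),
                         name.replace(" ", "-"), name.replace(" ", "")})
        # rule 2: trailing Arabic numeral -> Roman numeral
        for arabic, roman in {"1": "I", "2": "II", "3": "III"}.items():
            if name.endswith(arabic):
                variants.add(name[:-1] + roman)
        # rule 3: spelled-out Greek letter -> single-letter form
        for word, symbol in [("alpha", "a"), ("beta", "b"), ("gamma", "g")]:
            variants.add(re.sub(word, symbol, name, flags=re.IGNORECASE))
        return variants

    print(sorted(spelling_variants("TGF-beta 2")))
    ```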

  6. Plasma Dictionary Website

    NASA Astrophysics Data System (ADS)

    Correll, Don; Heeter, Robert; Alvarez, Mitch

    2000-10-01

    In response to many inquiries for a list of plasma terms, a database driven Plasma Dictionary website (plasmadictionary.llnl.gov) was created that allows users to submit new terms, search for specific terms or browse alphabetic listings. The Plasma Dictionary website contents began with the Fusion & Plasma Glossary terms available at the Fusion Energy Educational website (fusedweb.llnl.gov). Plasma researchers are encouraged to add terms and definitions. By clarifying the meanings of specific plasma terms, it is envisioned that the primary use of the Plasma Dictionary website will be by students, teachers, researchers, and writers for (1) Enhancing literacy in plasma science, (2) Serving as an educational aid, (3) Providing practical information, and (4) Helping clarify plasma writings. The Plasma Dictionary website has already proved useful in responding to a request from the CRC Press (www.crcpress.com) to add plasma terms to its CRC physics dictionary project (members.aol.com/physdict/).

  7. "It's Just Reflex Now": German Language Learners' Use of Online Resources

    ERIC Educational Resources Information Center

    Larson-Guenette, Julie

    2013-01-01

    This study examined how often and to what extent university learners of German use online resources (e.g., online dictionaries and translators) in relation to German coursework, their motivations for use, and their beliefs about online resources and language learning. Data for this study consisted of open-ended surveys ("n" = 71) and face-to-face…

  8. Automatic de-identification of textual documents in the electronic health record: a review of recent research

    PubMed Central

    2010-01-01

    Background In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. Methods This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. Results The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. Conclusions In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication. PMID:20678228

  9. Automatic de-identification of textual documents in the electronic health record: a review of recent research.

    PubMed

    Meystre, Stephane M; Friedlin, F Jeffrey; South, Brett R; Shen, Shuying; Samore, Matthew H

    2010-08-02

    In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.
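
    A minimal sketch of the two methodology families discussed above, pattern matching combined with dictionary lookup, is shown below. The tiny name lists, regular expressions, and tag names are hypothetical placeholders; real de-identification systems rely on far larger dictionaries and, often, trained models.

    ```python
    import re

    # tiny illustrative dictionaries; real systems use much larger name lists
    FIRST_NAMES = {"alice", "robert"}
    LAST_NAMES = {"jones", "smith"}

    PATTERNS = {
        "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
        "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
        "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    }

    def scrub(text):
        """Replace pattern-matched and dictionary-matched PHI with category tags."""
        for tag, pattern in PATTERNS.items():
            text = pattern.sub(f"[{tag}]", text)
        tokens = []
        for tok in text.split():
            bare = tok.strip(".,;:").lower()
            tokens.append("[NAME]" if bare in FIRST_NAMES | LAST_NAMES else tok)
        return " ".join(tokens)

    print(scrub("Robert Jones, MRN: 12345, seen on 3/14/2009, call 555-123-4567."))
    ```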

  10. 78 FR 19333 - Agency Information Collection Activities: Renewal of Currently Approved Collection; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-29

    ... to submit section 1512 data elements as set forth in the Recipient Reporting Data Dictionary... reported by prime recipients and sub-recipients are included in the Recipient Reporting Data Dictionary...

  11. 77 FR 75434 - Proposed Agency Information Collection Activities; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-20

    ... Credit Card Data Collection Data Dictionary The Federal Reserve proposes to add 65 data items to the Domestic Credit Card Data Collection Data Dictionary schedule. 46 data items would be added at the account...

  12. Online Multi-Modal Robust Non-Negative Dictionary Learning for Visual Tracking

    PubMed Central

    Zhang, Xiang; Guan, Naiyang; Tao, Dacheng; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures the intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., template, for each modality from available frames, and then represents new particles over all the learned dictionaries by minimizing the fitting loss of data based on M-estimation. The resultant representation coefficient can be viewed as the common semantic representation of particles across multiple modalities, and can be utilized to track the target. OMRNDL incrementally learns the dictionary and the coefficient of each particle by using multiplicative update rules to respectively guarantee their non-negativity constraints. Experimental results on a popular challenging video benchmark validate the effectiveness of OMRNDL for visual tracking in both quantity and quality. PMID:25961715

  13. Online multi-modal robust non-negative dictionary learning for visual tracking.

    PubMed

    Zhang, Xiang; Guan, Naiyang; Tao, Dacheng; Qiu, Xiaogang; Luo, Zhigang

    2015-01-01

    Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures the intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., template, for each modality from available frames, and then represents new particles over all the learned dictionaries by minimizing the fitting loss of data based on M-estimation. The resultant representation coefficient can be viewed as the common semantic representation of particles across multiple modalities, and can be utilized to track the target. OMRNDL incrementally learns the dictionary and the coefficient of each particle by using multiplicative update rules to respectively guarantee their non-negativity constraints. Experimental results on a popular challenging video benchmark validate the effectiveness of OMRNDL for visual tracking in both quantity and quality.
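
    The non-negativity-preserving mechanism mentioned above, multiplicative update rules, can be illustrated with the classical Frobenius-loss updates for a non-negative factorization. This is only a sketch of that mechanism under assumed notation; OMRNDL's M-estimation loss, online per-frame learning, and per-modality dictionaries are not reproduced.

    ```python
    import numpy as np

    def multiplicative_nn_updates(V, n_atoms=8, n_iter=200, eps=1e-9, seed=0):
        """Factorize V ~ D @ H with non-negative D (dictionary) and H (coefficients).
        Non-negativity is preserved because each update multiplies by a ratio of
        non-negative quantities."""
        rng = np.random.default_rng(seed)
        D = rng.random((V.shape[0], n_atoms))
        H = rng.random((n_atoms, V.shape[1]))
        for _ in range(n_iter):
            H *= (D.T @ V) / (D.T @ D @ H + eps)
            D *= (V @ H.T) / (D @ H @ H.T + eps)
        return D, H

    # toy usage on non-negative, intensity-like data
    V = np.abs(np.random.default_rng(1).normal(size=(64, 40)))
    D, H = multiplicative_nn_updates(V)
    print(round(np.linalg.norm(V - D @ H) / np.linalg.norm(V), 3))
    ```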

  14. Reconstruction of magnetic resonance imaging by three-dimensional dual-dictionary learning.

    PubMed

    Song, Ying; Zhu, Zhen; Lu, Yang; Liu, Qiegen; Zhao, Jun

    2014-03-01

    To improve the magnetic resonance imaging (MRI) data acquisition speed while maintaining the reconstruction quality, a novel method is proposed for multislice MRI reconstruction from undersampled k-space data based on compressed-sensing theory using dictionary learning. There are two aspects to improve the reconstruction quality. One is that spatial correlation among slices is used by extending the atoms in dictionary learning from patches to blocks. The other is that the dictionary-learning scheme is used at two resolution levels; i.e., a low-resolution dictionary is used for sparse coding and a high-resolution dictionary is used for image updating. Numerical experiments are carried out on in vivo 3D MR images of brains and abdomens with a variety of undersampling schemes and ratios. The proposed method (dual-DLMRI) achieves better reconstruction quality than conventional reconstruction methods, with the peak signal-to-noise ratio being 7 dB higher. The advantages of the dual dictionaries are obvious compared with the single dictionary. Parameter variations ranging from 50% to 200% only bias the image quality within 15% in terms of the peak signal-to-noise ratio. Dual-DLMRI effectively uses the a priori information in the dual-dictionary scheme and provides dramatically improved reconstruction quality. Copyright © 2013 Wiley Periodicals, Inc.

  15. Discriminative object tracking via sparse representation and online dictionary learning.

    PubMed

    Xie, Yuan; Zhang, Wensheng; Li, Cuihua; Lin, Shuyang; Qu, Yanyun; Zhang, Yinghua

    2014-04-01

    We propose a robust tracking algorithm based on local sparse coding with discriminative dictionary learning and a new keypoint matching scheme. This algorithm consists of two parts: the local sparse coding with an online updated discriminative dictionary for tracking (SOD part), and the keypoint matching refinement for enhancing the tracking performance (KP part). In the SOD part, the local image patches of the target object and background are represented by their sparse codes using an over-complete discriminative dictionary. Such a discriminative dictionary, which encodes the information of both the foreground and the background, may provide more discriminative power. Furthermore, in order to adapt the dictionary to the variation of the foreground and background during the tracking, an online learning method is employed to update the dictionary. The KP part utilizes a refined keypoint matching scheme to improve the performance of the SOD. With the help of sparse representation and the online updated discriminative dictionary, the KP part is more robust than the traditional method at rejecting incorrect matches and eliminating outliers. The proposed method is embedded into a Bayesian inference framework for visual tracking. Experimental results on several challenging video sequences demonstrate the effectiveness and robustness of our approach.
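
    The SOD idea of representing local patches over a joint foreground/background dictionary can be sketched with scikit-learn's sparse coder. The atom counts, random dictionaries, and the simple foreground-minus-background score below are assumptions for illustration; the online dictionary update and the keypoint matching refinement are not shown.

    ```python
    import numpy as np
    from sklearn.decomposition import SparseCoder

    def discriminative_scores(patches, fg_atoms, bg_atoms, n_nonzero=5):
        """Sparse-code patches over a joint dictionary and score them by how much
        of their representation falls on foreground versus background atoms."""
        dictionary = np.vstack([fg_atoms, bg_atoms])      # (n_atoms, patch_dim)
        coder = SparseCoder(dictionary=dictionary,
                            transform_algorithm="omp",
                            transform_n_nonzero_coefs=n_nonzero)
        codes = coder.transform(patches)
        fg_energy = np.abs(codes[:, :len(fg_atoms)]).sum(axis=1)
        bg_energy = np.abs(codes[:, len(fg_atoms):]).sum(axis=1)
        return fg_energy - bg_energy

    # toy usage with random, row-normalized atoms standing in for learned ones
    rng = np.random.default_rng(0)
    fg_atoms = rng.normal(size=(20, 64))
    fg_atoms /= np.linalg.norm(fg_atoms, axis=1, keepdims=True)
    bg_atoms = rng.normal(size=(20, 64))
    bg_atoms /= np.linalg.norm(bg_atoms, axis=1, keepdims=True)
    print(discriminative_scores(rng.normal(size=(5, 64)), fg_atoms, bg_atoms))
    ```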

  16. Mapping the Future: Optimizing Joint Geospatial Engineering Support

    DTIC Science & Technology

    2006-05-16

    Environment. Maxwell Air Force Base, AL: Air University, 1990. Babbage, Ross and Desmond Ball. Geographic Information Systems: Defence Applications... Joint Pub 4-04. Washington, DC: 27 September 2001. Wertz, Charles J. The Data Dictionary, Concepts and Uses. Wellesley, MA: QED Information... Force Defense Mapping for Future Operations, Washington, DC: September 1995, 1-7. Charles J. Wertz, The Data Dictionary, Concepts and Uses

  17. Resources for Topics in Architecture.

    ERIC Educational Resources Information Center

    Van Noate, Judith, Comp.

    This guide for conducting library research on topics in architecture or on the work of a particular architect presents suggestions for utilizing four categories of resources: books, dictionaries and encyclopedias, indexes, and a periodicals and series list (PASL). Two topics are researched as examples: the contemporary architect Richard Meier, and…

  18. Study of Tools for Command and Telemetry Dictionaries

    NASA Technical Reports Server (NTRS)

    Pires, Craig; Knudson, Matthew D.

    2017-01-01

    The Command and Telemetry Dictionary is at the heart of space missions. The C&T Dictionary represents all of the information that is exchanged between the various systems both in space and on the ground. Large amounts of ever-changing information have to be disseminated to all of the various systems and sub-systems throughout all phases of the mission. The typical approach of having each sub-system manage its own information flow results in a patchwork of methods within a mission, which leads to significant duplication of effort and potential errors. More centralized methods have been developed to manage this data flow. This presentation compares two tools that have been developed for this purpose, CCDD and SCIMI, which were designed to work with the Core Flight System (cFS).
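
    Because the tools are described here only at a high level, the sketch below is purely illustrative: one possible JSON-serializable shape for a centralized C&T dictionary entry. The classes, field names, and mnemonics are hypothetical and are not the CCDD or SCIMI schema.

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class TelemetryItem:
    name: str            # mnemonic, e.g. "EPS_BATT_V" (hypothetical)
    data_type: str       # e.g. "float", "uint16"
    units: str = ""
    description: str = ""

@dataclass
class Command:
    name: str
    opcode: int
    arguments: list = field(default_factory=list)

@dataclass
class CtDictionary:
    """A single authoritative store shared by all subsystems."""
    subsystem: str
    commands: list = field(default_factory=list)
    telemetry: list = field(default_factory=list)

    def export(self) -> str:
        return json.dumps(asdict(self), indent=2)

d = CtDictionary("EPS",
                 commands=[Command("EPS_RESET", 0x01)],
                 telemetry=[TelemetryItem("EPS_BATT_V", "float", "V", "Battery bus voltage")])
print(d.export())
```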

  19. USEEIO Satellite Tables

    EPA Pesticide Factsheets

    These files contain the environmental data, as particular emissions or resources associated with BEA sectors, that are used in the USEEIO model. They are organized by emission or resource type, as described in the manuscript. The main files (without SI) show the final satellite tables in the 'Exchanges' sheet, which give emissions or resource use per USD for 2013. The other sheets in these files provide metadata for the creation of the tables, including general information, sources, etc. The 'export' sheet is used for saving the satellite table for csv export. The data dictionary describes the fields in this sheet. The supporting files provide all the details of the data transformation and organization for the development of the satellite tables. This dataset is associated with the following publication: Yang, Y., W. Ingwersen, T. Hawkins, and D. Meyer. USEEIO: a New and Transparent United States Environmentally Extended Input-Output Model. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA,

  20. AdjScales: Visualizing Differences between Adjectives for Language Learners

    NASA Astrophysics Data System (ADS)

    Sheinman, Vera; Tokunaga, Takenobu

    In this study we introduce AdjScales, a method for scaling similar adjectives by their strength. It combines existing Web-based computational linguistic techniques in order to automatically differentiate between similar adjectives that describe the same property by strength. Though this kind of information is rarely present in most of the lexical resources and dictionaries, it may be useful for language learners that try to distinguish between similar words. Additionally, learners might gain from a simple visualization of these differences using unidimensional scales. The method is evaluated by comparison with annotation on a subset of adjectives from WordNet by four native English speakers. It is also compared against two non-native speakers of English. The collected annotation is an interesting resource in its own right. This work is a first step toward automatic differentiation of meaning between similar words for language learners. AdjScales can be useful for lexical resource enhancement.

  1. Part 6: The Literature of Inorganic Chemistry, Revised.

    ERIC Educational Resources Information Center

    Douville, Judith A.

    2002-01-01

    Presents a list of resources on inorganic chemistry that includes general surveys, nomenclature, dictionaries, handbooks, compilations, and treatises. Selected for use by academic and student chemists. (DDR)

  2. Ambiguity and variability of database and software names in bioinformatics.

    PubMed

    Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran

    2015-01-01

    There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.
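
    The strict versus lenient distinction used in the evaluation can be made concrete with a small scoring sketch; the spans, the overlap convention, and the micro-averaged scoring below are illustrative assumptions rather than the paper's exact protocol.

```python
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def evaluate(gold, pred, lenient=False):
    """gold/pred: lists of (start, end) character spans for one document."""
    def match(g, p):
        if lenient:                       # any overlap counts
            return g[0] < p[1] and p[0] < g[1]
        return g == p                     # strict: exact boundaries required
    tp = sum(any(match(g, p) for g in gold) for p in pred)
    fp = len(pred) - tp
    fn = sum(not any(match(g, p) for p in pred) for g in gold)
    return prf(tp, fp, fn)

gold = [(10, 17), (42, 50)]               # invented mention spans
pred = [(10, 17), (44, 50), (60, 65)]
print("strict :", evaluate(gold, pred))
print("lenient:", evaluate(gold, pred, lenient=True))
```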

  3. Tracking state deployments of commercial vehicle information systems and networks : 1998 Washington State report

    DOT National Transportation Integrated Search

    1999-12-01

    Volume III of the Logical Architecture contract deliverable documents the Data Dictionary. This formatted version of the Teamwork model data dictionary is mechanically produced from the Teamwork CDIF (Case Data Interchange Format) output file. It is ...

  4. Dictionary of Cotton

    USDA-ARS?s Scientific Manuscript database

    The Dictionary of Cotton has over 2,000 terms and definitions that were compiled by 33 researchers. It reflects the ongoing commitment of the International Cotton Advisory Committee, through its Technical Information Section, to the spread of knowledge about cotton to all those who have an interest ...

  5. Basic Aerospace Education Library

    ERIC Educational Resources Information Center

    Journal of Aerospace Education, 1975

    1975-01-01

    Lists the most significant resource items on aerospace education which are presently available. Includes source books, bibliographies, directories, encyclopedias, dictionaries, audiovisuals, curriculum/planning guides, aerospace statistics, aerospace education statistics and newsletters. (BR)

  6. Weakly supervised visual dictionary learning by harnessing image attributes.

    PubMed

    Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang

    2014-12-01

    Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping of the observed states, where the supervision stems from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.

  7. The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank

    PubMed Central

    Westbrook, John D.; Shao, Chenghua; Feng, Zukang; Zhuravleva, Marina; Velankar, Sameer; Young, Jasmine

    2015-01-01

    Summary: The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. The CCD contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors, systematic chemical names and idealized coordinates. The content, preparation, validation and distribution of this CCD chemical reference dataset are described. Availability and implementation: The CCD is updated regularly in conjunction with the scheduled weekly release of new PDB structure data. The CCD and amino acid variant reference datasets are hosted in the public PDB ftp repository at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/components.cif.gz, ftp://ftp.wwpdb.org/pub/pdb/data/monomers/aa-variants-v1.cif.gz, and its mirror sites, and can be accessed from http://wwpdb.org. Contact: jwest@rcsb.rutgers.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25540181

  8. National Hydrocephalus Foundation

    MedlinePlus


  9. Relational Database Design in Information Science Education.

    ERIC Educational Resources Information Center

    Brooks, Terrence A.

    1985-01-01

    Reports on database management system (dbms) applications designed by library school students for university community at University of Iowa. Three dbms design issues are examined: synthesis of relations, analysis of relations (normalization procedure), and data dictionary usage. Database planning prior to automation using data dictionary approach…

  10. Bengali-English Relevant Cross Lingual Information Access Using Finite Automata

    NASA Astrophysics Data System (ADS)

    Banerjee, Avishek; Bhattacharyya, Swapan; Hazra, Simanta; Mondal, Shatabdi

    2010-10-01

    CLIR techniques search unrestricted texts, typically extracting terms and relationships from bilingual electronic dictionaries or bilingual text collections and using them to translate query and/or document representations into a compatible set of representations with a common feature set. In this paper, we focus on a dictionary-based approach that uses a bilingual data dictionary in combination with statistics-based methods to avoid the problem of ambiguity; the development of the human-computer interface aspects of NLP (Natural Language Processing) is also addressed. Intelligent web search with a regional language like Bengali depends upon two major aspects: CLIA (Cross Language Information Access) and NLP. In our previous work with IIT KGP, we developed content-based CLIA, in which content-based searching is trained on Bengali corpora with the help of a Bengali data dictionary. Here we introduce intelligent search, which recognizes the sense of a sentence and offers a better real-life approach to human-computer interaction.

  11. Brain tumor classification and segmentation using sparse coding and dictionary learning.

    PubMed

    Salman Al-Shaikhli, Saif Dawood; Yang, Michael Ying; Rosenhahn, Bodo

    2016-08-01

    This paper presents a novel fully automatic framework for multi-class brain tumor classification and segmentation using a sparse coding and dictionary learning method. The proposed framework consists of two steps: classification and segmentation. The classification of the brain tumors is based on brain topology and texture. The segmentation is based on voxel values of the image data. Using K-SVD, two types of dictionaries are learned from the training data and their associated ground truth segmentation: feature dictionary and voxel-wise coupled dictionaries. The feature dictionary consists of global image features (topological and texture features). The coupled dictionaries consist of coupled information: gray scale voxel values of the training image data and their associated label voxel values of the ground truth segmentation of the training data. For quantitative evaluation, the proposed framework is evaluated using different metrics. The segmentation results of the brain tumor segmentation (MICCAI-BraTS-2013) database are evaluated using five different metric scores, which are computed using the online evaluation tool provided by the BraTS-2013 challenge organizers. Experimental results demonstrate that the proposed approach achieves an accurate brain tumor classification and segmentation and outperforms the state-of-the-art methods.
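
    The paper couples a feature dictionary with voxel-wise dictionaries; the sketch below shows only the generic K-SVD building block it relies on (sparse coding followed by rank-1 atom updates), with made-up data sizes and scikit-learn's orthogonal matching pursuit assumed for the coding step.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms=64, sparsity=5, n_iter=10, seed=0):
    """Minimal K-SVD: Y holds vectorized training patches as columns."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)    # sparse coding step
        for j in range(n_atoms):                             # dictionary update, one atom at a time
            users = np.flatnonzero(X[j])
            if users.size == 0:
                continue
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]                # residual without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                                # rank-1 update of atom j and its codes
            X[j, users] = s[0] * Vt[0]
    return D, X

Y = np.random.default_rng(1).standard_normal((64, 500))     # stand-in for 8x8 training patches
D, X = ksvd(Y)
print(D.shape, X.shape)                                      # (64, 64) (64, 500)
```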

  12. miRPathDB: a new dictionary on microRNAs and target pathways.

    PubMed

    Backes, Christina; Kehl, Tim; Stöckel, Daniel; Fehlmann, Tobias; Schneider, Lara; Meese, Eckart; Lenhof, Hans-Peter; Keller, Andreas

    2017-01-04

    In the last decade, miRNAs and their regulatory mechanisms have been intensively studied and many tools for the analysis of miRNAs and their targets have been developed. We previously presented a dictionary on single miRNAs and their putative target pathways. Since then, the number of miRNAs has tripled and the knowledge on miRNAs and targets has grown substantially. This, along with changes in pathway resources such as KEGG, leads to an improved understanding of miRNAs, their target genes and related pathways. Here, we introduce the miRNA Pathway Dictionary Database (miRPathDB), freely accessible at https://mpd.bioinf.uni-sb.de/ With the database we aim to complement available target pathway web-servers by providing researchers easy access to the information which pathways are regulated by a miRNA, which miRNAs target a pathway and how specific these regulations are. The database contains a large number of miRNAs (2595 human miRNAs), different miRNA target sets (14 773 experimentally validated target genes as well as 19 281 predicted targets genes) and a broad selection of functional biochemical categories (KEGG-, WikiPathways-, BioCarta-, SMPDB-, PID-, Reactome pathways, functional categories from gene ontology (GO), protein families from Pfam and chromosomal locations totaling 12 875 categories). In addition to Homo sapiens, also Mus musculus data are stored and can be compared to human target pathways. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Incoherent dictionary learning for reducing crosstalk noise in least-squares reverse time migration

    NASA Astrophysics Data System (ADS)

    Wu, Juan; Bai, Min

    2018-05-01

    We propose to apply a novel incoherent dictionary learning (IDL) algorithm for regularizing the least-squares inversion in seismic imaging. The IDL is proposed to overcome the drawback of traditional dictionary learning algorithms, which lose partial texture information. Firstly, the noisy image is divided into overlapping image patches, and some random patches are extracted for dictionary learning. Then, we apply the IDL technique to minimize the coherency between atoms during dictionary learning. Finally, the sparse representation problem is solved by a sparse coding algorithm, and the image is restored from those sparse coefficients. By reducing the correlation among atoms, it is possible to preserve most of the small-scale features in the image while removing much of the long-wavelength noise. The application of the IDL method to regularization of seismic images from least-squares reverse time migration shows successful performance.

  14. Automatic Dictionary Expansion Using Non-parallel Corpora

    NASA Astrophysics Data System (ADS)

    Rapp, Reinhard; Zock, Michael

    Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns between different languages are related, it expands a small base lexicon. For improved performance, it also realizes a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
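
    A toy illustration of the underlying assumption, that co-occurrence profiles over a small seed lexicon transfer across languages, is given below; the corpora, seed pairs, and window size are invented, and the actual algorithm (including the interlingua step) is considerably more elaborate.

```python
import numpy as np

# Hypothetical English-German seed lexicon ("base lexicon") over which profiles are built.
seed_pairs = [("house", "haus"), ("water", "wasser"), ("dog", "hund")]

def cooc_vector(word, corpus, vocab, window=3):
    """Count how often `word` co-occurs with each vocabulary item within a token window."""
    vec = np.zeros(len(vocab))
    index = {w: i for i, w in enumerate(vocab)}
    for sent in corpus:
        for i, w in enumerate(sent):
            if w != word:
                continue
            for n in sent[max(0, i - window): i + window + 1]:
                if n in index and n != word:
                    vec[index[n]] += 1
    return vec

def translate(word, src_corpus, tgt_corpus, candidates):
    src_vocab = [s for s, _ in seed_pairs]
    tgt_vocab = [t for _, t in seed_pairs]
    v = cooc_vector(word, src_corpus, src_vocab)        # source profile over the seed words...
    best, best_sim = None, -1.0
    for cand in candidates:
        u = cooc_vector(cand, tgt_corpus, tgt_vocab)    # ...compared to candidate target profiles
        sim = float(v @ u) / ((np.linalg.norm(v) * np.linalg.norm(u)) or 1.0)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best, best_sim

en_corpus = [["the", "dog", "drinks", "water"],
             ["the", "cat", "chases", "the", "dog"],
             ["a", "cat", "in", "the", "house"]]
de_corpus = [["der", "hund", "trinkt", "wasser"],
             ["die", "katze", "jagt", "den", "hund"],
             ["eine", "katze", "im", "haus"]]
print(translate("cat", en_corpus, de_corpus, ["katze", "wasser", "haus"]))
```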

  15. Data Element Dictionary: Finance. A Technical Report Concerning Finance Related Data Elements in the WICHE Management Information Systems Program. First Edition.

    ERIC Educational Resources Information Center

    Thomas, Charles R.

    This document is one of the 5 sections of the Data Element Dictionary developed as part of the WICHE Management Information Systems (MIS) Program. The elements in this section apply to both the current and historical data concerning finance. The purpose of the WICHE MIS Program is to make it possible to derive data which will be truly comparable…

  16. Using UMLS to construct a generalized hierarchical concept-based dictionary of brain functions for information extraction from the fMRI literature.

    PubMed

    Hsiao, Mei-Yu; Chen, Chien-Chung; Chen, Jyh-Horng

    2009-10-01

    With rapid progress in the field, a great many fMRI studies are published every year, to the extent that it is now becoming difficult for researchers to keep up with the literature, since reading papers is extremely time-consuming and labor-intensive. Thus, automatic information extraction has become an important issue. In this study, we used the Unified Medical Language System (UMLS) to construct a hierarchical concept-based dictionary of brain functions. To the best of our knowledge, this is the first generalized dictionary of this kind. We also developed an information extraction system for recognizing, mapping and classifying terms relevant to human brain study. The precision and recall of our system were on a par with those of human experts in term recognition, term mapping and term classification. The approach presented in this paper offers an alternative to the more laborious, manual-entry approach to information extraction.

  17. Image fusion via nonlocal sparse K-SVD dictionary learning.

    PubMed

    Li, Ying; Li, Fangyi; Bai, Bendu; Shen, Qiang

    2016-03-01

    Image fusion aims to merge two or more images captured via various sensors of the same scene to construct a more informative image by integrating their details. Generally, such integration is achieved through the manipulation of the representations of the images concerned. Sparse representation plays an important role in the effective description of images, offering great potential in a variety of image processing tasks, including image fusion. Supported by sparse representation, in this paper an approach for image fusion by the use of a novel dictionary learning scheme is proposed. The nonlocal self-similarity property of the images is exploited, not only at the stage of learning the underlying description dictionary but also during the process of image fusion. In particular, the property of nonlocal self-similarity is combined with the traditional sparse dictionary. This results in an improved learned dictionary, hereafter referred to as the nonlocal sparse K-SVD dictionary (where K-SVD stands for the K-times singular value decomposition that is commonly used in the literature), and abbreviated to NL_SK_SVD. The NL_SK_SVD dictionary is then applied to image fusion using simultaneous orthogonal matching pursuit. The proposed approach is evaluated with different types of images and compared with a number of alternative image fusion techniques. The superior fused images resulting from the present approach demonstrate the efficacy of the NL_SK_SVD dictionary in sparse image representation.

  18. Jobs You Won't Find in the Dictionary of Occupational Titles.

    ERIC Educational Resources Information Center

    Salomone, Paul R.; Helmstetter, Christopher L.

    1992-01-01

    Written in whimsical style, this article points out the value-laden nature of the "Dictionary of Occupational Titles," the premier source of occupational information. Other than legal occupations, classifies jobs that are remunerative into three categories: (1) illegal occupations (drug dealer, hitperson); (2) underground or…

  19. Blackfoot Dictionary of Stems, Roots, and Affixes. Second Edition.

    ERIC Educational Resources Information Center

    Frantz, Donald G.; Russell, Norma Jean

    The dictionary of stems, roots, and affixes for the Blackfoot language provides, for each entry, information on the item's morphological type (e.g., noun stem, verb stem, root), subclassification if relevant, English index, and certain diagnostic inflectional forms (full words or sentences), each with an English translation. In addition, entries…

  20. Panel A report: Standards needed to interconnect ADS pilots for data sharing for catalogues, directories, and dictionaries

    NASA Technical Reports Server (NTRS)

    1981-01-01

    User requirements, guidelines, and standards for interconnecting an Applications Data Service (ADS) program for data sharing are discussed. Methods for effective sharing of information (catalogues, directories, and dictionaries) among member installations are addressed. An ADS Directory/Catalog architectural model is also given.

  1. Assigning categorical information to Japanese medical terms using MeSH and MEDLINE.

    PubMed

    Onogi, Yuzo

    2007-01-01

    This paper reports on the assigning of MeSH (Medical Subject Headings) categories to Japanese terms in an English-Japanese dictionary using the titles and abstracts of articles indexed in MEDLINE. In a previous study, 30,000 of 80,000 terms in the dictionary were mapped to MeSH terms by normalized comparison. It was reasoned that if the remaining dictionary terms appeared in MEDLINE-indexed articles that are indexed using MeSH terms, then relevancies between the dictionary terms and MeSH terms could be calculated, and thus MeSH categories assigned. This study compares two approaches for calculating the weight matrix. One is the TF*IDF method and the other uses the inner product of two weight matrices. About 20,000 additional dictionary terms were identified in MEDLINE-indexed articles published between 2000 and 2004. The precision and recall of these algorithms were evaluated separately for MeSH terms and non-MeSH terms. Unfortunately, the precision and recall of the algorithms were not good, but this method will help with manual assignment of MeSH categories to dictionary terms.
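
    One simple reading of the TF*IDF weighting between dictionary terms and MeSH headings is sketched below; the co-mention counts and terms are invented, and the paper's second (inner-product) approach is not shown.

```python
import math
from collections import defaultdict

# Invented co-mentions: each item is (dictionary_term, MeSH terms of one article).
articles = [
    ("kakumakuen", ["Keratitis", "Eye Diseases"]),
    ("kakumakuen", ["Keratitis"]),
    ("haien",      ["Pneumonia", "Lung Diseases"]),
]

tf = defaultdict(lambda: defaultdict(int))   # tf[term][mesh] = co-mention count
df = defaultdict(int)                        # df[mesh] = number of terms co-mentioned with mesh
for term, mesh_terms in articles:
    for m in mesh_terms:
        tf[term][m] += 1
for m in {m for _, ms in articles for m in ms}:
    df[m] = sum(1 for term in tf if m in tf[term])

n_terms = len(tf)
def tfidf(term, mesh):
    return tf[term][mesh] * math.log((1 + n_terms) / (1 + df[mesh]))

for term in tf:
    ranked = sorted(tf[term], key=lambda m: tfidf(term, m), reverse=True)
    print(term, "->", [(m, round(tfidf(term, m), 3)) for m in ranked])
```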

  2. Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

    PubMed Central

    Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie

    2016-01-01

    The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among them, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare them with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, which outperformed the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract PMID:27504009
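
    A minimal sketch of the dictionary lookup plus query expansion idea follows; the dictionary entries, concept identifiers, abbreviation table, and normalization rules are hypothetical stand-ins, not the enhanced dictionary used in the paper.

```python
import re

def clean(s):
    """Lower-case and collapse hyphens/whitespace so surface variants collide."""
    return re.sub(r"[\s\-]+", " ", s.lower()).strip()

# Toy disease dictionary (surface form -> concept ID), stored in cleaned form.
disease_dict = {clean(k): v for k, v in {
    "Non-small cell lung cancer": "D002289",
    "NSCLC": "D002289",              # abbreviation added while "enhancing" the dictionary
    "Hypertension": "D006973",
}.items()}
abbreviations = {"htn": "hypertension"}   # toy query-expansion table

def normalize(mention):
    queries = [clean(mention)]
    queries += [abbreviations[q] for q in queries if q in abbreviations]  # query expansion
    for q in queries:
        if q in disease_dict:
            return disease_dict[q]
    return None

print(normalize("non small-cell lung cancer"))   # -> D002289 via cleaned lookup
print(normalize("HTN"))                          # -> D006973 via abbreviation expansion
```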

  3. Robust low-dose dynamic cerebral perfusion CT image restoration via coupled dictionary learning scheme.

    PubMed

    Tian, Xiumei; Zeng, Dong; Zhang, Shanli; Huang, Jing; Zhang, Hua; He, Ji; Lu, Lijun; Xi, Weiwen; Ma, Jianhua; Bian, Zhaoying

    2016-11-22

    Dynamic cerebral perfusion x-ray computed tomography (PCT) imaging has been advocated to quantitatively and qualitatively assess hemodynamic parameters in the diagnosis of acute stroke or chronic cerebrovascular diseases. However, the associated radiation dose is a significant concern to patients due to its dynamic scan protocol. To address this issue, in this paper we propose an image restoration method that utilizes a coupled dictionary learning (CDL) scheme to yield clinically acceptable PCT images with low-dose data acquisition. Specifically, in the present CDL scheme, the 2D background information from the average of the baseline time frames of low-dose unenhanced CT images and the 3D enhancement information from normal-dose sequential cerebral PCT images are exploited to train the dictionary atoms, respectively. After obtaining the two trained dictionaries, we couple them to represent the desired PCT images as a spatio-temporal prior in the objective function construction. Finally, the low-dose dynamic cerebral PCT images are restored by a general dictionary-learning-based image processing step. To obtain a robust solution, the objective function is solved using a modified dictionary learning based image restoration algorithm. The experimental results on clinical data show that the present method can yield more accurate kinetic enhanced details and diagnostic hemodynamic parameter maps than the state-of-the-art methods.

  4. Super resolution reconstruction of infrared images based on classified dictionary learning

    NASA Astrophysics Data System (ADS)

    Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

    2018-05-01

    Infrared images always suffer from low-resolution problems resulting from the limitations of imaging devices. An economical approach to combat this problem involves reconstructing high-resolution images by reasonable methods without updating devices. Inspired by compressed-sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies features of the samples into several reasonable clusters and trains a dictionary pair for each cluster. The optimal pair of dictionaries is chosen for each image reconstruction, and therefore more satisfactory results are achieved without an increase in computational complexity and time cost. Experiments and results demonstrate that it is a viable method for infrared image reconstruction, since it improves image resolution and recovers detailed information of targets.
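
    A rough sketch of the classify-then-reconstruct idea, assuming scikit-learn is available: patches are clustered, one dictionary is trained per cluster (the paper trains paired dictionaries; only one side is shown), and new patches are routed to their cluster's dictionary. All sizes and parameters are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.standard_normal((2000, 36))        # stand-in low-resolution patch vectors

# 1) classify training patches into a few feature clusters
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(patches)

# 2) train one dictionary per cluster
dicos = {
    c: MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5
                                   ).fit(patches[km.labels_ == c])
    for c in range(4)
}

# 3) at reconstruction time, route each new patch to its cluster's dictionary
new_patch = rng.standard_normal((1, 36))
cluster = int(km.predict(new_patch)[0])
code = dicos[cluster].transform(new_patch)       # sparse code under the selected dictionary
recon = code @ dicos[cluster].components_        # patch approximation from that dictionary
print(cluster, recon.shape)
```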

  5. Applications of the generalized information processing system (GIPSY)

    USGS Publications Warehouse

    Moody, D.W.; Kays, Olaf

    1972-01-01

    The Generalized Information Processing System (GIPSY) stores and retrieves variable-field, variable-length records consisting of numeric data, textual data, or codes. A particularly noteworthy feature of GIPSY is its ability to search records for words, word stems, prefixes, and suffixes as well as for numeric values. Moreover, retrieved records may be printed on pre-defined formats or formatted as fixed-field, fixed-length records for direct input to other programs, which facilitates the exchange of data with other systems. At present there are some 22 applications of GIPSY falling in the general areas of bibliography, natural resources information, and management science. This report presents a description of each application including a sample input form, dictionary, and a typical formatted record. It is hoped that these examples will stimulate others to experiment with innovative uses of computer technology.

  6. Weakly Supervised Dictionary Learning

    NASA Astrophysics Data System (ADS)

    You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub

    2018-05-01

    We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned with a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.

  7. Foreign Objects in the Rectum

    MedlinePlus


  8. Space platform expendables resupply concept definition study. Volume 3: Work breakdown structure and work breakdown structure dictionary

    NASA Technical Reports Server (NTRS)

    1984-01-01

    The work breakdown structure (WBS) for the Space Platform Expendables Resupply Concept Definition Study is described. The WBS consists of a list of WBS elements, a dictionary of element definitions, and an element logic diagram. The list and logic diagram identify the interrelationships of the elements. The dictionary defines the types of work that may be represented by or be classified under each specific element. The Space Platform Expendable Resupply WBS was selected mainly to support the program planning, scheduling, and costing performed in the programmatics task (task 3). The WBS is neither a statement-of-work nor a work authorization document. Rather, it is a framework around which to define requirements, plan effort, assign responsibilities, allocate and control resources, and report progress, expenditures, technical performance, and schedule performance. The WBS element definitions are independent of make-or-buy decisions, organizational structure, and activity locations unless exceptions are specifically stated.

  9. A Concise Dictionary of Minnesota Ojibwe.

    ERIC Educational Resources Information Center

    Nichols, John D.; Nyholm, Earl

    The dictionary of the Ojibwa or Chippewa language represents the speech of the Mille Lacs Band of Minnesota and contains over 7,000 Ojibwa terms. Each entry gives information on the word stem, grammatical classification, English gloss, form variations, and references to alternate forms. An introductory section describes the entry format and use,…

  10. Kinship in Mongolian Sign Language

    ERIC Educational Resources Information Center

    Geer, Leah

    2011-01-01

    Information and research on Mongolian Sign Language is scant. To date, only one dictionary is available in the United States (Badnaa and Boll 1995), and even that dictionary presents only a subset of the signs employed in Mongolia. The present study describes the kinship system used in Mongolian Sign Language (MSL) based on data elicited from…

  11. Relaxations to Sparse Optimization Problems and Applications

    NASA Astrophysics Data System (ADS)

    Skau, Erik West

    Parsimony is a fundamental property that is applied to many characteristics in a variety of fields. Of particular interest are optimization problems that apply rank, dimensionality, or support in a parsimonious manner. In this thesis we study some optimization problems and their relaxations, and focus on properties and qualities of the solutions of these problems. The Gramian tensor decomposition problem attempts to decompose a symmetric tensor as a sum of rank one tensors. We approach the Gramian tensor decomposition problem with a relaxation to a semidefinite program. We study conditions which ensure that the solution of the relaxed semidefinite problem gives the minimal Gramian rank decomposition. Sparse representations with learned dictionaries are one of the leading image modeling techniques for image restoration. When learning these dictionaries from a set of training images, the sparsity parameter of the dictionary learning algorithm strongly influences the content of the dictionary atoms. We describe geometrically the content of trained dictionaries and how it changes with the sparsity parameter. We use statistical analysis to characterize how the different content is used in sparse representations. Finally, a method to control the structure of the dictionaries is demonstrated, allowing us to learn a dictionary which can later be tailored for specific applications. Variations of dictionary learning can be broadly applied to a variety of applications. We explore a pansharpening problem with a triple factorization variant of coupled dictionary learning. Another application of dictionary learning is computer vision. Computer vision relies heavily on object detection, which we explore with a hierarchical convolutional dictionary learning model. Data fusion of disparate modalities is a growing topic of interest. We do a case study to demonstrate the benefit of using social media data with satellite imagery to estimate hazard extents. In this case study analysis we apply a maximum entropy model, guided by the social media data, to estimate the flooded regions during a 2013 flood in Boulder, CO and show that the results are comparable to those obtained using expert information.

  12. Fast group matching for MR fingerprinting reconstruction.

    PubMed

    Cauley, Stephen F; Setsompop, Kawin; Ma, Dan; Jiang, Yun; Ye, Huihui; Adalsteinsson, Elfar; Griswold, Mark A; Wald, Lawrence L

    2015-08-01

    MR fingerprinting (MRF) is a technique for quantitative tissue mapping using pseudorandom measurements. To estimate tissue properties such as T1, T2, proton density, and B0, the rapidly acquired data are compared against a large dictionary of Bloch simulations. This matching process can be a very computationally demanding portion of MRF reconstruction. We introduce a fast group matching algorithm (GRM) that exploits inherent correlation within MRF dictionaries to create highly clustered groupings of the elements. During matching, a group-specific signature is first used to remove poor matching possibilities. Group principal component analysis (PCA) is used to evaluate all remaining tissue types. In vivo 3 Tesla brain data were used to validate the accuracy of our approach. For a trueFISP sequence with over 196,000 dictionary elements, 1000 MRF samples, and an image matrix of 128 × 128, GRM was able to map MR parameters within 2 s using standard vendor computational resources. This is an order of magnitude faster than global PCA and nearly two orders of magnitude faster than direct matching, with comparable accuracy (1-2% relative error). The proposed GRM method is a highly efficient model reduction technique for MRF matching and should enable clinically relevant reconstruction accuracy and time on standard vendor computational resources. © 2014 Wiley Periodicals, Inc.
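
    The pruning idea behind GRM can be sketched as follows: group the dictionary, keep one signature per group, correlate the query against the signatures first, and only then match exhaustively inside the surviving groups. The random grouping, the sizes, and the omission of the group-PCA step make this a toy stand-in rather than the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entries, n_timepoints = 5000, 500
D = rng.standard_normal((n_entries, n_timepoints))
D /= np.linalg.norm(D, axis=1, keepdims=True)        # simulated, normalized fingerprint dictionary

# offline: assign entries to groups and keep one mean "signature" per group
n_groups = 100
labels = rng.integers(0, n_groups, n_entries)         # stand-in for a real correlation-driven clustering
signatures = np.stack([D[labels == g].mean(axis=0) for g in range(n_groups)])
signatures /= np.linalg.norm(signatures, axis=1, keepdims=True)

def match(query, keep_groups=10):
    query = query / np.linalg.norm(query)
    group_scores = signatures @ query                 # cheap pruning pass over group signatures
    best_groups = np.argsort(group_scores)[-keep_groups:]
    mask = np.isin(labels, best_groups)
    scores = D[mask] @ query                          # exhaustive match only inside the kept groups
    return int(np.flatnonzero(mask)[np.argmax(scores)])

query = D[1234] + 0.05 * rng.standard_normal(n_timepoints)
print(match(query))                                   # should recover index 1234 in this toy setup
```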

  13. Thriving in the Information Age: Why Marine Corps Commanders Must Maximize Their Existing Information Related Capabilities

    DTIC Science & Technology

    2016-04-20

    skills within a business that make it more valuable or competitive, https://dictionary.cambridge.org/us/dictionary/english/skill. Examples of knowledge ...is a lack of standardization – from the tactical to the strategic level – with integrating and employing IRCs and these SMEs. This along with other...organization strategic goals and objectives, and/or a lack of senior leader advocacy. Practical approaches such as the IRCs Integration Continuum acts as a

  14. Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion.

    PubMed

    Jonnagaddala, Jitendra; Jue, Toni Rose; Chang, Nai-Wen; Dai, Hong-Jie

    2016-01-01

    The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among them, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and compare them with other existing dictionary lookup based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, which outperformed the best dictionary lookup based baseline method studied in this work by an F-measure of 0.13. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract. © The Author(s) 2016. Published by Oxford University Press.

  15. [To understand the words of cancer: Lexonco, dictionary of oncology SOR SAVOIR PATIENT for patients and relatives. Methodological aspects].

    PubMed

    Delavigne, V; Carretier, J; Brusco, Sylvie; Leichtnam-Dugarin, L; Déchelette, M; Philip, Thierry

    2007-04-01

    In response to the evolution of the information-seeking behaviour of patients and concerns from health professionals regarding cancer patient information, the French National Federation of Comprehensive Cancer Centres (FNCLCC) introduced, in 1998, an information and education program dedicated to patients and relatives, the SOR SAVOIR PATIENT program. The Lexonco project is a dictionary of oncology terms adapted for patients and relatives and validated by medical experts and cancer patients. This paper describes the methodological aspects, which take into account patients' and experts' perspectives, of producing the definitions.

  16. An Extensible, User- Modifiable Framework for Planning Activities

    NASA Technical Reports Server (NTRS)

    Joshing, Joseph C.; Abramyan, Lucy; Mickelson, Megan C.; Wallick, Michael N.; Kurien, James A.; Crockett, Thomasa M.; Powell, Mark W.; Pyrzak, Guy; Aghevli, Arash

    2013-01-01

    This software provides a development framework that allows planning activities for the Mars Science Laboratory rover to be altered at any time, based on changes of the Activity Dictionary. The Activity Dictionary contains the definition of all activities that can be carried out by a particular asset (robotic or human). These definitions (and combinations of these definitions) are used by mission planners to give a daily plan of what a mission should do. During the development and course of the mission, the Activity Dictionary and actions that are going to be carried out will often be changed. Previously, such changes would require a change to the software and redeployment. Now, the Activity Dictionary authors are able to customize activity definitions, parameters, and resource usage without requiring redeployment. This software provides developers and end users the ability to modify the behavior of automatically generated activities using a script. This allows changes to the software behavior without incurring the burden of redeployment. This software is currently being used for the Mars Science Laboratory, and is in the process of being integrated into the LADEE (Lunar Atmosphere and Dust Environment Explorer) mission, as well as the International Space Station.

  17. A Remote Sensing Image Fusion Method based on adaptive dictionary learning

    NASA Astrophysics Data System (ADS)

    He, Tongdi; Che, Zongxi

    2018-01-01

    This paper discusses a remote sensing fusion method based on 'adaptive sparse representation' (ASP) to provide improved spectral information, reduce data redundancy and decrease system complexity. First, the training sample set is formed by taking random blocks from the images to be fused; the dictionary is then constructed using the training samples, and the remaining terms are clustered to obtain the complete dictionary by iterative processing at each step. Second, the self-adaptive weighted coefficient rule of regional energy is used to select the feature fusion coefficients and complete the reconstruction of the image blocks. Finally, the reconstructed image blocks are rearranged and averaged to obtain the final fused images. Experimental results show that the proposed method is superior to other traditional remote sensing image fusion methods in both spectral information preservation and spatial resolution.

  18. Travel time tomography with local image regularization by sparsity constrained dictionary learning

    NASA Astrophysics Data System (ADS)

    Bianco, M.; Gerstoft, P.

    2017-12-01

    We propose a regularization approach for 2D seismic travel time tomography which models small rectangular groups of slowness pixels, within an overall or `global' slowness image, as sparse linear combinations of atoms from a dictionary. The groups of slowness pixels are referred to as patches and a dictionary corresponds to a collection of functions or `atoms' describing the slowness in each patch. These functions could for example be wavelets. The patch regularization is incorporated into the global slowness image. The global image models the broad features, while the local patch images incorporate prior information from the dictionary. Further, high resolution slowness within patches is permitted if the travel times from the global estimates support it. The proposed approach is formulated as an algorithm, which is repeated until convergence is achieved: 1) From travel times, find the global slowness image with a minimum energy constraint on the pixel variance relative to a reference. 2) Find the patch level solutions to fit the global estimate as a sparse linear combination of dictionary atoms. 3) Update the reference as the weighted average of the patch level solutions. This approach relies on the redundancy of the patches in the seismic image. Redundancy means that the patches are repetitions of a finite number of patterns, which are described by the dictionary atoms. Redundancy in the earth's structure was demonstrated in previous works in seismics where dictionaries of wavelet functions regularized inversion. We further exploit redundancy of the patches by using dictionary learning algorithms, a form of unsupervised machine learning, to estimate optimal dictionaries from the data in parallel with the inversion. We demonstrate our approach on densely, but irregularly sampled synthetic seismic images.
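
    A simplified stand-in for the three-step loop described above, assuming scikit-learn's orthogonal matching pursuit for the patch step: a damped global least-squares step and a plain (unweighted) reference update substitute for the minimum-energy constraint and weighted average of the actual algorithm, and all sizes are synthetic.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def invert(G, t, D, img_shape, patch=4, n_outer=5, damp=1.0, sparsity=3):
    """Alternate a damped global inversion with patch-level sparse coding.

    G: ray-path matrix (n_rays x n_pixels); t: travel times; D: patch dictionary
    (patch*patch x n_atoms). Image sides are assumed divisible by `patch`.
    """
    ny, nx = img_shape
    s_ref = np.zeros(ny * nx)                              # reference slowness image
    for _ in range(n_outer):
        # 1) global step: fit travel times, with a penalty tying pixels to the reference
        A = np.vstack([G, damp * np.eye(ny * nx)])
        b = np.concatenate([t, damp * s_ref])
        s_glob = np.linalg.lstsq(A, b, rcond=None)[0]
        # 2) patch step: re-express each patch of the global image in the dictionary
        img = s_glob.reshape(ny, nx)
        rec = np.zeros_like(img)
        for i in range(0, ny, patch):
            for j in range(0, nx, patch):
                p = img[i:i + patch, j:j + patch].ravel()
                code = orthogonal_mp(D, p, n_nonzero_coefs=sparsity)
                rec[i:i + patch, j:j + patch] = (D @ code).reshape(patch, patch)
        # 3) update the reference with the patch-level reconstruction
        s_ref = rec.ravel()
    return s_ref.reshape(ny, nx)

# tiny synthetic example with random "rays" and a random dictionary
rng = np.random.default_rng(0)
ny = nx = 8
G = rng.standard_normal((40, ny * nx))
s_true = rng.standard_normal(ny * nx)
D = rng.standard_normal((16, 48)); D /= np.linalg.norm(D, axis=0)
print(invert(G, G @ s_true, D, (ny, nx)).shape)
```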

  19. Reading L2 Russian: The Challenges of the Russian-English Dictionary

    ERIC Educational Resources Information Center

    Comer, William J.

    2014-01-01

    This descriptive study examines when and how students use Russian-English dictionaries while reading informational texts in Russian and what success they have with word lookup. The study uses introspective verbal protocols (i.e., think-alouds) to follow how readers construct meaning from two texts while reading them for a limited time first…

  20. Criteria Affecting Pre-Service TESOLTeachers' Attitudes towards Using CD-ROM Dictionaries

    ERIC Educational Resources Information Center

    Issa, Jinan Hatem; Jamil, Hazri

    2012-01-01

    Undoubtedly, CD-ROM dictionaries assist in removing the fear of inadequacy in language usage and ameliorating the panic of being-on-the-spot especially with the rapid development of information technology. This research was conducted to find out whether there is a relationship between pre-service TESOL teachers' attitudes towards using CD-ROM…

  1. A hospital-wide clinical findings dictionary based on an extension of the International Classification of Diseases (ICD).

    PubMed

    Bréant, C; Borst, F; Campi, D; Griesser, V; Momjian, S

    1999-01-01

    The use of a controlled vocabulary set in a hospital-wide clinical information system is of crucial importance for many departmental database systems to communicate and exchange information. In the absence of an internationally recognized clinical controlled vocabulary set, a new extension of the International statistical Classification of Diseases (ICD) is proposed. It expands the scope of the standard ICD beyond diagnosis and procedures to clinical terminology. In addition, the common Clinical Findings Dictionary (CFD) further records the definition of clinical entities. The construction of the vocabulary set and the CFD is incremental and manual. Tools have been implemented to facilitate the tasks of defining/maintaining/publishing dictionary versions. The design of database applications in the integrated clinical information system is driven by the CFD which is part of the Medical Questionnaire Designer tool. Several integrated clinical database applications in the field of diabetes and neuro-surgery have been developed at the HUG.

  2. A hospital-wide clinical findings dictionary based on an extension of the International Classification of Diseases (ICD).

    PubMed Central

    Bréant, C.; Borst, F.; Campi, D.; Griesser, V.; Momjian, S.

    1999-01-01

    The use of a controlled vocabulary set in a hospital-wide clinical information system is of crucial importance for many departmental database systems to communicate and exchange information. In the absence of an internationally recognized clinical controlled vocabulary set, a new extension of the International statistical Classification of Diseases (ICD) is proposed. It expands the scope of the standard ICD beyond diagnosis and procedures to clinical terminology. In addition, the common Clinical Findings Dictionary (CFD) further records the definition of clinical entities. The construction of the vocabulary set and the CFD is incremental and manual. Tools have been implemented to facilitate the tasks of defining/maintaining/publishing dictionary versions. The design of database applications in the integrated clinical information system is driven by the CFD which is part of the Medical Questionnaire Designer tool. Several integrated clinical database applications in the field of diabetes and neuro-surgery have been developed at the HUG. PMID:10566451

  3. PDQ® Cancer Information

    Cancer.gov

    An NCI database that contains the latest information about cancer treatment, screening, prevention, genetics, supportive care, and complementary and alternative medicine, plus drug information and a dictionary of cancer terms.

  4. A practical implementation for a data dictionary in an environment of diverse data sets

    USGS Publications Warehouse

    Sprenger, Karla K.; Larsen, Dana M.

    1993-01-01

    The need for a data dictionary database at the U.S. Geological Survey's EROS Data Center (EDC) was reinforced with the Earth Observing System Data and Information System (EOSDIS) requirement for consistent field definitions of data sets residing at more than one archive center. The EDC requirement addresses the existence of multiple data sets with identical field definitions using various naming conventions. The EDC is developing a data dictionary database to accomplish the following goals: to standardize field names for ease in software development; to facilitate querying and updating of the data; and to generate ad hoc reports. The structure of the EDC electronic data dictionary database supports different metadata systems as well as many different data sets. A series of reports is used to keep consistency among data sets and various metadata systems.
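
    As a purely illustrative sketch (not the EDC's actual design), a data dictionary of this kind can be reduced to two relations: standardized field names with definitions, and dataset-specific names mapped onto them, which already supports the naming-consistency checks and ad hoc reports mentioned above. The table, column, and field names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE standard_field (name TEXT PRIMARY KEY, definition TEXT);
CREATE TABLE dataset_field (
    dataset    TEXT,
    local_name TEXT,
    standard   TEXT REFERENCES standard_field(name)
);
""")
con.execute("INSERT INTO standard_field VALUES ('acquisition_date', "
            "'Date the scene was acquired (YYYY-MM-DD)')")
con.executemany("INSERT INTO dataset_field VALUES (?, ?, ?)", [
    ("LANDSAT_TM", "ACQ_DATE", "acquisition_date"),     # hypothetical dataset/field names
    ("AVHRR",      "SCENE_DT", "acquisition_date"),
])

# ad hoc report: every local spelling of one standardized field
for row in con.execute("SELECT dataset, local_name FROM dataset_field "
                       "WHERE standard = 'acquisition_date'"):
    print(row)
```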

  5. Publications

    Cancer.gov

    Information about NCI publications including PDQ cancer information for patients and health professionals, patient-education publications, fact sheets, dictionaries, NCI blogs and newsletters and major reports.

  6. Joint Dictionary Learning for Multispectral Change Detection.

    PubMed

    Lu, Xiaoqiang; Yuan, Yuan; Zheng, Xiangtao

    2017-04-01

    Change detection is one of the most important applications of remote sensing technology. It is a challenging task due to the obvious variations in the radiometric value of the spectral signature and the limited capability of utilizing spectral information. In this paper, an improved sparse coding method for change detection is proposed. The intuition of the proposed method is that unchanged pixels in different images can be well reconstructed by the joint dictionary, which corresponds to knowledge of unchanged pixels, while changed pixels cannot. First, a query image pair is projected onto the joint dictionary to constitute the knowledge of unchanged pixels. Then the reconstruction error is obtained to discriminate between the changed and unchanged pixels in the different images. To select proper thresholds for determining changed regions, an automatic threshold selection strategy is presented that minimizes the reconstruction errors of the changed pixels. Experiments on multispectral data have been conducted, and the experimental results, compared with the state-of-the-art methods, prove the superiority of the proposed method. The contributions of the proposed method can be summarized as follows: 1) joint dictionary learning is proposed to explore the intrinsic information of different images for change detection, so change detection can be transformed into a sparse representation problem; to the authors' knowledge, few publications utilize joint dictionary learning in change detection; 2) an automatic threshold selection strategy is presented, which minimizes the reconstruction errors of the changed pixels without any prior assumption about the spectral signature; as a result, the threshold value provided by the proposed method can adapt to different data due to the characteristic of joint dictionary learning; and 3) the proposed method makes no prior assumption about the modeling and the handling of the spectral signature, and so can be adapted to different data.
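
    A toy version of the reconstruction-error idea, assuming scikit-learn for the dictionary learning step: stacked spectra of pixels assumed unchanged train a joint dictionary, and pixels that reconstruct poorly under it are flagged as changed. The training mask, the threshold rule (a simple within-class-variance criterion), and all sizes are stand-ins for the paper's choices.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def change_map(img1, img2, train_mask, n_atoms=32, sparsity=4):
    """img1/img2: (H, W, bands) co-registered images; train_mask marks pixels assumed unchanged."""
    stack = np.dstack([img1, img2]).reshape(-1, 2 * img1.shape[2])   # stacked spectra = joint samples
    dl = MiniBatchDictionaryLearning(n_components=n_atoms, random_state=0,
                                     transform_algorithm="omp",
                                     transform_n_nonzero_coefs=sparsity)
    dl.fit(stack[train_mask.ravel()])                  # joint dictionary from unchanged pixels only
    codes = dl.transform(stack)
    err = np.linalg.norm(stack - codes @ dl.components_, axis=1)
    # simple automatic threshold: pick the cut that minimizes within-class variance
    cuts = np.quantile(err, np.linspace(0.05, 0.95, 19))
    t = min(cuts, key=lambda c: err[err <= c].var() + err[err > c].var())
    return (err > t).reshape(img1.shape[:2])

rng = np.random.default_rng(0)
img1 = rng.random((32, 32, 4))
img2 = img1 + 0.01 * rng.standard_normal(img1.shape)
img2[8:16, 8:16] += 0.8                                # inject a synthetic changed block
train = np.ones((32, 32), bool); train[8:16, 8:16] = False
print(change_map(img1, img2, train).sum(), "pixels flagged as changed")   # roughly the 8x8 block
```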

  7. Computer Science and Technology: A Survey of Eleven Government-Developed Data Element Dictionary/Directory Systems.

    ERIC Educational Resources Information Center

    National Bureau of Standards (DOC), Washington, DC. Inst. for Computer Sciences and Technology.

    This report presents the current state of the art of government developed Data Element Dictionary/Directory (DED/D) systems. DED/D's are software tools used for managing and controlling information and data. The introduction of the report includes a list of the government agency systems surveyed and a summary matrix presenting each system's…

  8. DICONALE: A Novel German-Spanish Onomasiological Lexicographical Model Involving Paradigmatic and Syntagmatic Information

    ERIC Educational Resources Information Center

    Sánchez Hernández, Paloma

    2016-01-01

    This contribution, based on the DICONALE ON LINE (FFI2012-32658) and COMBIDIGILEX (FFI2015-64476-P) research projects, aims to create an onomasiological bilingual dictionary with online access for German and Spanish verbal lexemes. The objective of this work is to present the most relevant contributions of the dictionary based on two lexemes from…

  9. The English Monolingual Dictionary: Its Use among Second Year Students of University Technology of Malaysia, International Campus, Kuala Lumpur

    ERIC Educational Resources Information Center

    Manan, Amerrudin Abd.; Al-Zubaidi, Khairi Obaid

    2011-01-01

    This research was conducted to seek information on English Monolingual Dictionary (EMD) use among 2nd year students of Universiti Teknologi Malaysia, International Campus, Kuala Lumpur (UTMKL). Specifically, the researchers wish to discover, firstly, the students' habit and attitude in EMD use; secondly, to discover their knowledge with regard to…

  10. Arabic Information Retrieval at UMass in TREC-10

    DTIC Science & Technology

    2006-01-01

    electronic bilingual dictionaries, and stemmers, and our unfamiliarity with Arabic, we had our hands full carrying out some standard approaches to... monolingual and cross-language Arabic retrieval, and did not submit any runs based on novel approaches. We submitted three monolingual runs and one... dictionary construction, expanded Arabic queries, improved estimation and smoothing in language models, and added combination of evidence, increasing

  11. The HLA dictionary 2008: a summary of HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens.

    PubMed

    Holdsworth, R; Hurley, C K; Marsh, S G E; Lau, M; Noreen, H J; Kempenich, J H; Setterholm, M; Maiers, M

    2009-02-01

    The 2008 report of the human leukocyte antigen (HLA) data dictionary presents serologic equivalents of HLA-A, -B, -C, -DRB1, -DRB3, -DRB4, -DRB5, and -DQB1 alleles. The dictionary is an update of the one published in 2004. The data summarize equivalents obtained by the World Health Organization Nomenclature Committee for Factors of the HLA System, the International Cell Exchange, UCLA, the National Marrow Donor Program, recent publications, and individual laboratories. The 2008 edition includes information on 832 new alleles (685 class I and 147 class II) and updated information on 766 previously listed alleles (577 class I and 189 class II). The tables list the alleles with remarks on the serologic patterns and the equivalents. The serological equivalents are listed as expert assigned types, and the data are useful for identifying potential stem cell donors who were typed by either serology or DNA-based methods. The tables with HLA equivalents are available as a searchable form on the IMGT/HLA database Web site (http://www.ebi.ac.uk/imgt/hla/dictionary.html).

  12. Occupations, U. S. A.

    ERIC Educational Resources Information Center

    Geneva Area City Schools, OH.

    The booklet divides job titles, selected from the Dictionary of Occupational Titles, into 15 career clusters: agribusiness and natural resources, business and office education, communication and media, construction, consumer and home economics, fine arts and humanities, health occupations, hospitality and recreation, manufacturing, marine science,…

  13. Document image database indexing with pictorial dictionary

    NASA Astrophysics Data System (ADS)

    Akbari, Mohammad; Azimi, Reza

    2010-02-01

    In this paper we introduce a new approach for information retrieval from a Persian document image database without using Optical Character Recognition (OCR). First, an attribute called the subword upper contour label is defined; then, a pictorial dictionary is constructed based on this attribute for the subwords. With this approach we address two issues in document image retrieval: keyword spotting and retrieval according to document similarities. The proposed methods have been evaluated on a Persian document image database. The results have proved the ability of this approach in document image information retrieval.

  14. Magnetic resonance brain tissue segmentation based on sparse representations

    NASA Astrophysics Data System (ADS)

    Rueda, Andrea

    2015-12-01

    Segmentation or delineation of specific organs and structures in medical images is an important task in clinical diagnosis and treatment, since it allows pathologies to be characterized through imaging measures (biomarkers). In brain imaging, segmentation of main tissues or specific structures is challenging, due to the anatomic variability and complexity, and the presence of image artifacts (noise, intensity inhomogeneities, partial volume effect). In this paper, an automatic segmentation strategy is proposed, based on sparse representations and coupled dictionaries. Image intensity patterns are related to tissue labels at the level of small patches, gathering this information in coupled intensity/segmentation dictionaries. These dictionaries are used within a sparse representation framework to find the projection of a new intensity image onto the intensity dictionary, and the same projection can be used with the segmentation dictionary to estimate the corresponding segmentation. Preliminary results obtained with two publicly available datasets suggest that the proposal is capable of estimating adequate segmentations for gray matter (GM) and white matter (WM) tissues, with an average overlap of 0.79 for GM and 0.71 for WM (with respect to the original segmentations).
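
    A minimal sketch of the coupled-dictionary idea described above, under the assumption that an intensity patch and its segmentation patch share one sparse code: the code is estimated against the intensity dictionary (here via scikit-learn's orthogonal matching pursuit) and reused with the segmentation dictionary. Dictionary learning, patch extraction and the paper's exact formulation are not reproduced; the dictionaries below are random stand-ins.

    ```python
    import numpy as np
    from sklearn.linear_model import orthogonal_mp

    # Sketch of coupled intensity/segmentation dictionaries (toy sizes).
    # D_int and D_seg are assumed to have been learned jointly on paired
    # intensity/label patches so that column k describes the same pattern in both.
    rng = np.random.default_rng(0)
    n_pixels, n_atoms = 27, 50              # e.g. 3x3x3 patches, 50 atoms
    D_int = rng.standard_normal((n_pixels, n_atoms))
    D_int /= np.linalg.norm(D_int, axis=0)  # unit-norm atoms
    D_seg = rng.random((n_pixels, n_atoms)) # coupled label-probability atoms

    def segment_patch(intensity_patch, n_nonzero=5):
        """Sparse-code the intensity patch, then decode with the segmentation dictionary."""
        w = orthogonal_mp(D_int, intensity_patch, n_nonzero_coefs=n_nonzero)
        return D_seg @ w                    # soft label estimate for the patch

    patch = rng.standard_normal(n_pixels)
    print(segment_patch(patch).shape)       # (27,)
    ```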

  15. Multi-channel feature dictionaries for RGB-D object recognition

    NASA Astrophysics Data System (ADS)

    Lan, Xiaodong; Li, Qiming; Chong, Mina; Song, Jian; Li, Jun

    2018-04-01

    Hierarchical matching pursuit (HMP) is a popular feature learning method for RGB-D object recognition. However, the feature representation with only one dictionary for the RGB channels in HMP does not capture sufficient visual information. In this paper, we propose a multi-channel feature dictionary based feature learning method for RGB-D object recognition. The process of feature extraction in the proposed method consists of two layers. The K-SVD algorithm is used to learn the dictionaries for sparse coding in these two layers. In the first layer, we obtain features by performing max pooling on the sparse codes of pixels in a cell, and the obtained features of the cells in a patch are concatenated to generate joint patch features. Then, the joint patch features of the first layer are used to learn the dictionary and sparse codes of the second layer. Finally, spatial pyramid pooling can be applied to the joint patch features of any layer to generate the final object features. Experimental results show that our method, with first- or second-layer features, can obtain comparable or better performance than some published state-of-the-art methods.
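
    As a rough, single-layer illustration of the pipeline described above (not the authors' implementation), the sketch below uses scikit-learn's MiniBatchDictionaryLearning as a stand-in for K-SVD, sparse-codes toy single-channel patches, max-pools the codes within a cell, and concatenates the cell features of a patch.

    ```python
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # Rough single-layer sketch for ONE channel of the multi-channel scheme.
    # MiniBatchDictionaryLearning stands in for the K-SVD step used in the paper.
    rng = np.random.default_rng(0)
    patches = rng.standard_normal((500, 25))        # toy 5x5 single-channel pixel patches
    dico = MiniBatchDictionaryLearning(n_components=64, transform_algorithm="omp",
                                       transform_n_nonzero_coefs=4, random_state=0)
    dico.fit(patches)

    def cell_feature(cell_patches):
        """Sparse-code the patches of one cell and max-pool their codes."""
        codes = dico.transform(cell_patches)        # (n_patches_in_cell, n_atoms)
        return np.abs(codes).max(axis=0)            # (n_atoms,) pooled cell feature

    cells = [rng.standard_normal((16, 25)) for _ in range(4)]   # 4 cells in a patch region
    joint_patch_feature = np.concatenate([cell_feature(c) for c in cells])
    print(joint_patch_feature.shape)                # (256,) -> input to the second layer
    ```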

  16. Self-expressive Dictionary Learning for Dynamic 3D Reconstruction.

    PubMed

    Zheng, Enliang; Ji, Dinghuang; Dunn, Enrique; Frahm, Jan-Michael

    2017-08-22

    We target the problem of sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap. To this end, we develop a framework to recover the unknown structure without sequencing information across video sequences. Our proposed compressed sensing framework poses the estimation of 3D structure as a dictionary learning problem, where the dictionary is defined as an aggregation of the temporally varying 3D structures. Given the smooth motion of dynamic objects, we observe that any element in the dictionary can be well approximated by a sparse linear combination of other elements in the same dictionary (i.e. self-expression). Our formulation optimizes a biconvex cost function that leverages a compressed sensing formulation and enforces both structural dependency coherence across video streams and motion smoothness across estimates from common video sources. We further analyze the reconstructability of our approach under different capture scenarios, and its comparison and relation to existing methods. Experimental results on large amounts of synthetic data as well as real imagery demonstrate the effectiveness of our approach.
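
    The self-expression property can be illustrated in isolation: each column of a toy dictionary is approximated as a sparse combination of the remaining columns. This is only a sketch of that one ingredient; the biconvex reconstruction objective and the multi-camera setting are not modeled here.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    # Sketch of self-expression: each column of D (a 3D-structure snapshot
    # flattened into a vector) is approximated as a sparse combination of the
    # other columns. Toy data; not the paper's reconstruction pipeline.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((60, 20))           # 20 snapshots of a 20-point 3D structure

    def self_expression(D, alpha=0.1):
        n = D.shape[1]
        C = np.zeros((n, n))
        for j in range(n):
            others = np.delete(D, j, axis=1)    # exclude column j (no trivial self-match)
            lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
            lasso.fit(others, D[:, j])
            C[np.arange(n) != j, j] = lasso.coef_
        return C                                 # sparse coefficients with zero diagonal

    C = self_expression(D)
    print(np.count_nonzero(C), "nonzero self-expression coefficients")
    ```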

  17. 3D Reconstruction of human bones based on dictionary learning.

    PubMed

    Zhang, Binkai; Wang, Xiang; Liang, Xiao; Zheng, Jinjin

    2017-11-01

    An effective method for reconstructing a 3D model of human bones from computed tomography (CT) image data based on dictionary learning is proposed. In this study, the dictionary comprises the vertices of triangular meshes, and the sparse coefficient matrix indicates the connectivity information. For better reconstruction performance, we proposed a balance coefficient between the approximation and regularisation terms and a method for optimisation. Moreover, we applied a local updating strategy and a mesh-optimisation method to update the dictionary and the sparse matrix, respectively. The two updating steps are iterated alternately until the objective function converges. Thus, a reconstructed mesh could be obtained with high accuracy and regularisation. The experimental results show that the proposed method has the potential to obtain high precision and high-quality triangular meshes for rapid prototyping, medical diagnosis, and tissue engineering. Copyright © 2017 IPEM. Published by Elsevier Ltd. All rights reserved.

  18. Normalizing biomedical terms by minimizing ambiguity and variability

    PubMed Central

    Tsuruoka, Yoshimasa; McNaught, John; Ananiadou, Sophia

    2008-01-01

    Background One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach. Results We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. Conclusions The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction especially when good normalization heuristics for the target terminology are not fully known. PMID:18426547
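
    For illustration only, the sketch below applies a few hand-written normalization rules before a constant-time dictionary lookup; the paper's contribution is precisely that such rules are discovered automatically from the dictionary rather than hand-coded, and the toy entries and identifiers here are illustrative.

    ```python
    import re

    # Illustrative normalization rules for constant-time dictionary lookup.
    # These particular rules are examples only; the paper derives its rule
    # list automatically from the dictionary rather than hand-coding it.
    RULES = [
        lambda s: s.lower(),
        lambda s: re.sub(r"[-_/\s]+", "", s),      # drop hyphens, slashes, whitespace
        lambda s: re.sub(r"\(.*?\)", "", s),       # drop parenthesised material
    ]

    def normalize(term):
        for rule in RULES:
            term = rule(term)
        return term

    # Dictionary keyed by normalized form -> concept identifier (toy entries).
    concept_dict = {normalize("NF-kappa B"): "P19838", normalize("Type 2 diabetes"): "C0011860"}

    print(concept_dict.get(normalize("NF kappaB")))        # matches the first entry
    print(concept_dict.get(normalize("type-2 diabetes")))  # matches the second entry
    ```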

  19. Remote Sensing Terminology in a Global and Knowledge-Based World

    NASA Astrophysics Data System (ADS)

    Kancheva, Rumiana

    The paper is devoted to terminology issues related to all aspects of remote sensing research and applications. Terminology is the basis for a better understanding among people. It is crucial to keep up with the latest developments and novelties of the terminology in advanced technology fields such as aerospace science and industry. This is especially true in remote sensing and geoinformatics, which develop rapidly and have ever-extending applications in various domains of science and human activities. Remote sensing terminology issues are directly relevant to the contemporary worldwide policies on information accessibility, dissemination and utilization of research results in support of solutions to global environmental challenges and sustainable development goals. Remote sensing and spatial information technologies are an integral part of the international strategies for cooperation in scientific, research and application areas, with a particular accent on environmental monitoring, ecological problems, natural resources management, climate modeling, weather forecasts, disaster mitigation and many other uses to which remote sensing data can be put. Remote sensing researchers, professionals, students and decision makers of different countries and nationalities should fully understand, interpret and translate into their native language any term, definition or acronym found in papers, books, proceedings, specifications, documentation, etc. The importance of the correct use, precise definition and unification of remote sensing terms refers not only to people working in this field but also to experts in a variety of disciplines who handle remote sensing data and information products. In this paper, we draw attention to the specifics, peculiarities and recent needs of compiling specialized dictionaries in the area of remote sensing, focusing on Earth observations and the integration of remote sensing with other geoinformation technologies such as photogrammetry, geodesy, GIS, etc. Our belief is that the elaboration of bilingual and multilingual dictionaries and glossaries in this spreading, most technically advanced and promising field of human expertise is of great practical importance. The work on an English-Bulgarian Dictionary of Remote Sensing Terms is described, including considerations on its scope, structure, information content, selection of terms, etc. The vision builds upon previous national and international experience and makes use of ongoing activities on the subject. Any interest in cooperation and in initiating such collaborative projects is welcome and highly appreciated.

  20. A Selected Bibliography of Educational Sources.

    ERIC Educational Resources Information Center

    Campbell, Janet; And Others

    Focusing on materials available at the California State University at Long Beach Library, this annotated bibliography lists resources in seven subject categories pertaining to education: (1) guides to the professional educational literature; (2) books about education research methodology; (3) encyclopedias and dictionaries; (4) tests and…

  1. Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources

    PubMed Central

    Xu, Yan; Hua, Ji; Ni, Zhaoheng; Chen, Qinlang; Fan, Yubo; Ananiadou, Sophia; Chang, Eric I-Chao; Tsujii, Junichi

    2014-01-01

    References to anatomical entities in medical records consist not only of explicit references to anatomical locations, but also other diverse types of expressions, such as specific diseases, clinical tests, clinical treatments, which constitute implicit references to anatomical entities. In order to identify these implicit anatomical entities, we propose a hierarchical framework, in which two layers of named entity recognizers (NERs) work in a cooperative manner. Each of the NERs is implemented using the Conditional Random Fields (CRF) model, which use a range of external resources to generate features. We constructed a dictionary of anatomical entity expressions by exploiting four existing resources, i.e., UMLS, MeSH, RadLex and BodyPart3D, and supplemented information from two external knowledge bases, i.e., Wikipedia and WordNet, to improve inference of anatomical entities from implicit expressions. Experiments conducted on 300 discharge summaries showed a micro-averaged performance of 0.8509 Precision, 0.7796 Recall and 0.8137 F1 for explicit anatomical entity recognition, and 0.8695 Precision, 0.6893 Recall and 0.7690 F1 for implicit anatomical entity recognition. The use of the hierarchical framework, which combines the recognition of named entities of various types (diseases, clinical tests, treatments) with information embedded in external knowledge bases, resulted in a 5.08% increment in F1. The resources constructed for this research will be made publicly available. PMID:25343498

  2. A Study of the Use of a Monolingual Pedagogical Dictionary by Learners of English Engaged in Writing.

    ERIC Educational Resources Information Center

    Harvey, Keith; Yuill, Deborah

    1997-01-01

    Presents an account of a study of the role played by a dictionary in the completion of written (encoding) tasks by students of English as a foreign language. The study uses an introspective methodology based on the completion of flowcharts. Results indicate the importance of information on spelling and meanings and the neglect of coded syntactic…

  3. High-Order Local Pooling and Encoding Gaussians Over a Dictionary of Gaussians.

    PubMed

    Li, Peihua; Zeng, Hui; Wang, Qilong; Shiu, Simon C K; Zhang, Lei

    2017-07-01

    Local pooling (LP) in configuration (feature) space proposed by Boureau et al. explicitly restricts similar features to be aggregated, which can preserve as much discriminative information as possible. At the time it appeared, this method combined with sparse coding achieved competitive classification results with only a small dictionary. However, its performance lags far behind the state-of-the-art results as only the zero-order information is exploited. Inspired by the success of high-order statistical information in existing advanced feature coding or pooling methods, we make an attempt to address the limitation of LP. To this end, we present a novel method called high-order LP (HO-LP) to leverage the information higher than the zero-order one. Our idea is intuitively simple: we compute the first- and second-order statistics per configuration bin and model them as a Gaussian. Accordingly, we employ a collection of Gaussians as visual words to represent the universal probability distribution of features from all classes. Our problem is naturally formulated as encoding Gaussians over a dictionary of Gaussians as visual words. This problem, however, is challenging since the space of Gaussians is not a Euclidean space but forms a Riemannian manifold. We address this challenge by mapping Gaussians into the Euclidean space, which enables us to perform coding with common Euclidean operations rather than complex and often expensive Riemannian operations. Our HO-LP preserves the advantages of the original LP: pooling only similar features and using a small dictionary. Meanwhile, it achieves very promising performance on standard benchmarks, with either conventional, hand-engineered features or deep learning-based features.

  4. Using TEI for an Endangered Language Lexical Resource: The Nxa?amxcín Database-Dictionary Project

    ERIC Educational Resources Information Center

    Czaykowska-Higgins, Ewa; Holmes, Martin D.; Kell, Sarah M.

    2014-01-01

    This paper describes the evolution of a lexical resource project for Nxa?amxcín, an endangered Salish language, from the project's inception in the 1990s, based on legacy materials recorded in the 1960s and 1970s, to its current form as an online database that is transformable into various print and web-based formats for varying uses. We…

  5. Dictionary of Microscopy

    NASA Astrophysics Data System (ADS)

    Heath, Julian

    2005-10-01

    The past decade has seen huge advances in the application of microscopy in all areas of science. This welcome development in microscopy has been paralleled by an expansion of the vocabulary of technical terms used in microscopy: terms have been coined for new instruments and techniques and, as microscopes reach even higher resolution, the use of terms that relate to the optical and physical principles underpinning microscopy is now commonplace. The Dictionary of Microscopy was compiled to meet this challenge and provides concise definitions of over 2,500 terms used in the fields of light microscopy, electron microscopy, scanning probe microscopy, x-ray microscopy and related techniques. Written by Dr Julian P. Heath, Editor of Microscopy and Analysis, the dictionary is intended to provide easy navigation through the microscopy terminology and to be a first point of reference for definitions of new and established terms. The Dictionary of Microscopy is an essential, accessible resource for: students who are new to the field and are learning about microscopes; equipment purchasers who want an explanation of the terms used in manufacturers' literature; scientists who are considering using a new microscopical technique; experienced microscopists, as an aide mémoire or quick source of reference; and librarians, the press and marketing personnel who require definitions for technical reports.

  6. Readability of Healthcare Literature for Gastroparesis and Evaluation of Medical Terminology in Reading Difficulty.

    PubMed

    Meillier, Andrew; Patel, Shyam

    2017-02-01

    Gastroparesis is a chronic condition whose management can be further enhanced with patient understanding. Patients' education resources on the Internet have become increasingly important in improving healthcare literacy. We evaluated the readability of online resources for gastroparesis and the influence of medical terminology. Google searches were performed for "gastroparesis", "gastroparesis patient education material" and "gastroparesis patient information". All medical terminology was then identified by checking for inclusion in Taber's Medical Dictionary, 22nd Edition. The medical terminology was replaced independently with "help" and "helping". Web resources were analyzed with the Readability Studio Professional Edition (Oleander Solutions, Vandalia, OH) using 10 different readability scales. The average of the 26 patient education resources was 12.7 ± 1.8 grade levels. The edited "help" group had 6.6 ± 1.0 and the "helping" group had 10.4 ± 2.1 reading levels. In comparing the three groups, the "help" and "helping" groups had significantly lower readability levels (P < 0.001). The "help" group was significantly lower than the "helping" group (P < 0.001). The web resources for gastroparesis were above the reading level recommended by the American Medical Association. Medical terminology was shown to be the cause of this elevated readability level, with all but four resources falling within the recommended grade levels following word replacement.

  7. Rating prediction using textual reviews

    NASA Astrophysics Data System (ADS)

    NithyaKalyani, A.; Ushasukhanya, S.; Nagamalleswari, TYJ; Girija, S.

    2018-04-01

    Information today is often present in the form of opinions. Two and a half quintillion bytes are exchanged on the Internet every day, and a large amount consists of people's speculation and reflection on issues. There is a pressing need to be able to mine the information that is presented to us. Sentiment analysis refers to mining this raw information to make sense of it. The discipline of opinion mining has seen a lot of encouragement in the past few years, augmented by the involvement of social media such as Instagram, Facebook, and Twitter. The hidden message in this web of information is useful in several fields such as marketing, political polls, product reviews, forecasting market movement, and identifying detractors and promoters. In this endeavor, we introduce a sentiment rating system for a particular text or paragraph to determine the opinion's polarity. First, we resolve the searching problem, tokenization, classification, and reliable content identification. Second, we extract the probability of both positive and negative sentiment for a given text or paragraph using a naive Bayes classifier. Finally, we use a sentiment dictionary (SD), a sentiment degree dictionary (SDD) and a negation dictionary (ND) for more accuracy, and blend all the above factors into a formula to find the rating for the review.
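
    A heavily simplified sketch of the pipeline described above: a toy naive Bayes polarity estimate is blended with scores from small sentiment (SD), degree (SDD) and negation (ND) dictionaries. All word lists, probabilities and the blending formula are made up for illustration and are not the authors' resources.

    ```python
    import math

    # Toy sentiment (SD), degree (SDD) and negation (ND) dictionaries.
    SD = {"good": 1, "great": 1, "bad": -1, "awful": -1}
    SDD = {"very": 1.5, "slightly": 0.5}
    ND = {"not", "never"}

    # Toy per-class word likelihoods and priors for the naive Bayes step.
    likelihood = {"pos": {"good": 0.6, "great": 0.3, "bad": 0.05, "awful": 0.05},
                  "neg": {"good": 0.1, "great": 0.05, "bad": 0.45, "awful": 0.4}}
    prior = {"pos": 0.5, "neg": 0.5}

    def nb_positive_probability(tokens):
        logp = {c: math.log(prior[c]) for c in prior}
        for t in tokens:
            for c in prior:
                logp[c] += math.log(likelihood[c].get(t, 1e-3))  # smoothing for unknowns
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return math.exp(logp["pos"] - m) / z

    def dictionary_score(tokens):
        score, degree, negate = 0.0, 1.0, 1
        for t in tokens:
            if t in ND:
                negate = -1
            elif t in SDD:
                degree = SDD[t]
            elif t in SD:
                score += negate * degree * SD[t]
                degree, negate = 1.0, 1
        return score

    def rating(text):
        tokens = text.lower().split()
        p_pos = nb_positive_probability(tokens)
        # Illustrative blend: map probability and lexicon score onto a 1-5 scale.
        return round(1 + 4 * p_pos + 0.5 * dictionary_score(tokens), 2)

    print(rating("not very good but never awful"))
    ```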

  8. CIDR

    Science.gov Websites

    All applications must include a Data Dictionary of phenotypic measures to be…

  9. Requirements and design aspects of a data model for a data dictionary in paediatric oncology.

    PubMed

    Merzweiler, A; Knaup, P; Creutzig, U; Ehlerding, H; Haux, R; Mludek, V; Schilling, F H; Weber, R; Wiedemann, T

    2000-01-01

    German children suffering from cancer are mostly treated within the framework of multicentre clinical trials. An important task of conducting these trials is an extensive information and knowledge exchange, which has to be based on a standardised documentation. To support this effort, it is the aim of a nationwide project to define a standardised terminology that should be used by clinical trials for therapy documentation. In order to support terminology maintenance we are currently developing a data dictionary. In this paper we describe requirements and design aspects of the data model used for the data dictionary as first results of our research. We compare it with other terminology systems.

  10. DoD Net-Centric Services Strategy Implementation in the C2 Domain

    DTIC Science & Technology

    2010-02-01

    those for monolingual thesauri indicated in ANSI/NISO Z39.19-2005 and ISO 2788-1986. Also, the versioning regimen in the KOS must be robust, a... Metadata Registry: Repository of all metadata related to data structures, models, dictionaries, taxonomies, schema, and other engineering artifacts that... access information, schemas, style sheets, controlled vocabularies, dictionaries, and other work products. It would normally be discovered via a

  11. Spanish Language Processing at University of Maryland: Building Infrastructure for Multilingual Applications

    DTIC Science & Technology

    2001-09-01

    translation of the Spanish original sentence. Acquiring bilingual dictionary entries: In addition to building and applying the more sophisticated LCS... porting LCS lexicons to new languages, as described above, and are also useful by themselves in improving dictionary-based cross-language information... hold much of the time. Moreover, lexical dependencies have proven to be instrumental in advances in monolingual syntactic analysis (e.g. I-erg MY

  12. DESM: portal for microbial knowledge exploration systems.

    PubMed

    Salhi, Adil; Essack, Magbubah; Radovanovic, Aleksandar; Marchand, Benoit; Bougouffa, Salim; Antunes, Andre; Simoes, Marta Filipa; Lafi, Feras F; Motwalli, Olaa A; Bokhari, Ameerah; Malas, Tariq; Amoudi, Soha Al; Othum, Ghofran; Allam, Intikhab; Mineta, Katsuhiko; Gao, Xin; Hoehndorf, Robert; C Archer, John A; Gojobori, Takashi; Bajic, Vladimir B

    2016-01-04

    Microorganisms produce an enormous variety of chemical compounds. It is of general interest for microbiology and biotechnology researchers to have means to explore information about the molecular and genetic basis of functioning of different microorganisms and their ability for bioproduction. To enable such exploration, we compiled 45 topic-specific knowledgebases (KBs) accessible through the DESM portal (www.cbrc.kaust.edu.sa/desm). The KBs contain information derived through text-mining of PubMed information and complemented by information data-mined from various other resources (e.g. ChEBI, Entrez Gene, GO, KOBAS, KEGG, UniPathways, BioGrid). All PubMed records were indexed using 4,538,278 concepts from 29 dictionaries, with 1,638,986 records utilized in KBs. Concepts used are normalized whenever possible. Most of the KBs focus on a particular type of microbial activity, such as production of biocatalysts or nutraceuticals. Others are focused on specific categories of microorganisms, e.g. streptomyces or cyanobacteria. KBs are all structured in a uniform manner and have a standardized user interface. Information exploration is enabled through various searches. Users can explore statistically most significant concepts or pairs of concepts, generate hypotheses, create interactive networks of associated concepts and export results. We believe DESM will be a useful complement to the existing resources to benefit microbiology and biotechnology research. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. [Pilot study of domain-specific terminology adaptation for morphological analysis: research on unknown terms in national examination documents of radiological technologists].

    PubMed

    Tsuji, Shintarou; Nishimoto, Naoki; Ogasawara, Katsuhiko

    2008-07-20

    Although large medical texts are stored in electronic format, they are seldom reused because of the difficulty of processing narrative text by computer. Morphological analysis is a key technology for extracting medical terms correctly and automatically. This process parses a sentence into its smallest units, the morphemes. Phrases consisting of two or more technical terms, however, cause morphological analysis software to fail in parsing the sentence and to output unprocessed terms as "unknown words." The purpose of this study was to reduce the number of unknown words in medical narrative text processing. The results of parsing the text with additional dictionaries were compared with the analysis of the number of unknown words in the national examination for radiological technologists. The ratio of unknown words was reduced from 1.0% to 0.36% by adding terminologies of radiological technology, MeSH, and ICD-10 labels. The terminology of radiological technology was the most effective resource, reducing the ratio by 0.62%. This result clearly showed the necessity of additional dictionary selection and revealed trends in unknown words. The potential of this investigation is to make available a large body of clinical information that would otherwise be inaccessible for applications other than manual health care review by personnel.
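
    The dictionary-augmentation experiment can be pictured with the simplified sketch below: count the tokens not covered by a base lexicon, then count again after merging domain dictionaries. A real setup would use a Japanese morphological analyzer with compiled user dictionaries; the whitespace tokenization and the tiny word lists here are placeholders.

    ```python
    # Simplified illustration of the dictionary-augmentation idea: measure how many
    # tokens fall outside the analyzer's lexicon before and after adding domain
    # dictionaries. Plain whitespace tokens are used here only for brevity.

    base_lexicon = {"the", "image", "was", "acquired", "with", "a", "detector"}
    radiology_terms = {"fluoroscopy", "dosimeter"}
    mesh_terms = {"radiography"}

    def unknown_ratio(tokens, lexicon):
        unknown = [t for t in tokens if t not in lexicon]
        return len(unknown) / len(tokens), unknown

    tokens = "the image was acquired with a fluoroscopy detector and a dosimeter".split()

    print(unknown_ratio(tokens, base_lexicon))                                  # before
    print(unknown_ratio(tokens, base_lexicon | radiology_terms | mesh_terms))   # after
    ```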

  14. Learning Essential Terms and Concepts in Statistics and Accounting

    ERIC Educational Resources Information Center

    Peters, Pam; Smith, Adam; Middledorp, Jenny; Karpin, Anne; Sin, Samantha; Kilgore, Alan

    2014-01-01

    This paper describes a terminological approach to the teaching and learning of fundamental concepts in foundation tertiary units in Statistics and Accounting, using an online dictionary-style resource (TermFinder) with customised "termbanks" for each discipline. Designed for independent learning, the termbanks support inquiring students…

  15. Web technology for emergency medicine and secure transmission of electronic patient records.

    PubMed

    Halamka, J D

    1998-01-01

    The American Heritage dictionary defines the word "web" as "something intricately contrived, especially something that ensnares or entangles." The wealth of medical resources on the World Wide Web is now so extensive, yet disorganized and unmonitored, that such a definition seems fitting. In emergency medicine, for example, a field in which accurate and complete information, including patients' records, is urgently needed, more than 5000 Web pages are available today, whereas fewer than 50 were available in December 1994. Most sites are static Web pages using the Internet to publish textbook material, but new technology is extending the scope of the Internet to include online medical education and secure exchange of clinical information. This article lists some of the best Web sites for use in emergency medicine and then describes a project in which the Web is used for transmission and protection of electronic medical records.

  16. dREL: a relational expression language for dictionary methods.

    PubMed

    Spadaccini, Nick; Castleden, Ian R; du Boulay, Doug; Hall, Sydney R

    2012-08-27

    The provision of precise metadata is an important but largely underrated challenge for modern science [Nature 2009, 461, 145]. We describe here a dictionary methods language, dREL, that has been designed to enable complex data relationships to be expressed as formulaic scripts in data dictionaries written in DDLm [Spadaccini and Hall, J. Chem. Inf. Model. 2012, doi:10.1021/ci300075z]. dREL describes data relationships in a simple but powerful canonical form that is easy to read and understand and can be executed computationally to evaluate or validate data. The execution of dREL expressions is not a substitute for traditional scientific computation; it is to provide precise data dependency information to domain-specific definitions and a means for cross-validating data. Some scientific fields apply conventional programming languages to methods scripts, but these tend to inhibit both dictionary development and accessibility. dREL removes the programming barrier and encourages the production of the metadata needed for seamless data archiving and exchange in science.

  17. Evaluation of a data dictionary system. [information dissemination and computer systems programs

    NASA Technical Reports Server (NTRS)

    Driggers, W. G.

    1975-01-01

    The usefulness of a data dictionary/directory system for achieving optimum benefits from existing and planned investments in computer data files in the Data Systems Development Branch and the Institutional Data Systems Division was investigated. Potential applications of the data catalogue system are discussed along with an evaluation of the system. Other topics discussed include data description, data structure, programming aids, programming languages, program networks, and test data.

  18. Implementation of dictionary pair learning algorithm for image quality improvement

    NASA Astrophysics Data System (ADS)

    Vimala, C.; Aruna Priya, P.

    2018-04-01

    This paper proposes an image denoising method based on a dictionary pair learning algorithm. Visual information transmitted in the form of digital images has become a major method of communication in the modern age, but the image obtained after transmission is often corrupted with noise. The received image needs processing before it can be used in applications. Image denoising involves the manipulation of the image data to produce a visually high-quality image.
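
    Since the abstract gives no algorithmic detail, the sketch below shows generic patch-based sparse-coding denoising with a single learned dictionary (the standard scikit-learn recipe), not the paper's dictionary pair learning algorithm; the toy image and parameters are arbitrary.

    ```python
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

    # Generic patch-based sparse-coding denoising on a toy image.
    rng = np.random.default_rng(0)
    clean = np.kron(rng.random((8, 8)), np.ones((8, 8)))     # 64x64 blocky test image
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)

    patches = extract_patches_2d(noisy, (7, 7))
    X = patches.reshape(len(patches), -1)
    mean = X.mean(axis=1, keepdims=True)
    X = X - mean                                              # code the texture, keep the DC offset

    dico = MiniBatchDictionaryLearning(n_components=64, transform_algorithm="omp",
                                       transform_n_nonzero_coefs=2, random_state=0)
    codes = dico.fit(X).transform(X)
    denoised_patches = (codes @ dico.components_ + mean).reshape(patches.shape)
    denoised = reconstruct_from_patches_2d(denoised_patches, noisy.shape)

    # Compare mean squared error before and after denoising.
    print(float(np.mean((noisy - clean) ** 2)), float(np.mean((denoised - clean) ** 2)))
    ```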

  19. Use of Modality and Negation in Semantically-Informed Syntactic MT

    DTIC Science & Technology

    2012-06-01

    Longman Dictionary of Contemporary English (LDOCE). We produced the full English MN lexicon semi... English sentence pairs, and a bilingual dictionary with 113,911 entries. For our development and test sets, we split the NIST MT-08 test set into two... for combining MT and semantics (termed distillation) to answer the information needs of monolingual speakers using multilingual sources. Proper

  20. The Language Grid: supporting intercultural collaboration

    NASA Astrophysics Data System (ADS)

    Ishida, T.

    2018-03-01

    A variety of language resources already exist online. Unfortunately, since many language resources have usage restrictions, it is virtually impossible for each user to negotiate with every language resource provider when combining several resources to achieve the intended purpose. To increase the accessibility and usability of language resources (dictionaries, parallel texts, part-of-speech taggers, machine translators, etc.), we proposed the Language Grid [1]; it wraps existing language resources as atomic services and enables users to create new services by combining the atomic services, and reduces the negotiation costs related to intellectual property rights [4]. Our slogan is “language services from language resources.” We believe that modularization with recombination is the key to creating a full range of customized language environments for various user communities.

  1. Kinase Pathway Database: An Integrated Protein-Kinase and NLP-Based Protein-Interaction Resource

    PubMed Central

    Koike, Asako; Kobayashi, Yoshiyuki; Takagi, Toshihisa

    2003-01-01

    Protein kinases play a crucial role in the regulation of cellular functions. Various kinds of information about these molecules are important for understanding signaling pathways and organism characteristics. We have developed the Kinase Pathway Database, an integrated database involving major completely sequenced eukaryotes. It contains the classification of protein kinases and their functional conservation, ortholog tables among species, protein–protein, protein–gene, and protein–compound interaction data, domain information, and structural information. It also provides an automatic pathway graphic image interface. The protein, gene, and compound interactions are automatically extracted from abstracts for all genes and proteins by natural-language processing (NLP). The method of automatic extraction uses phrase patterns and the GENA protein, gene, and compound name dictionary, which was developed by our group. With this database, pathways are easily compared among species using data with more than 47,000 protein interactions and protein kinase ortholog tables. The database is available for querying and browsing at http://kinasedb.ontology.ims.u-tokyo.ac.jp/. PMID:12799355

  2. Expanding Academic Vocabulary with an Interactive On-Line Database

    ERIC Educational Resources Information Center

    Horst, Marlise; Cobb, Tom; Nicolae, Ioana

    2005-01-01

    University students used a set of existing and purpose-built on-line tools for vocabulary learning in an experimental ESL course. The resources included concordance, dictionary, cloze-builder, hypertext, and a database with interactive self-quizzing feature (all freely available at www.lextutor.ca). The vocabulary targeted for learning consisted…

  3. Thumbs Up: High-Quality, Low-Cost Teaching Aids.

    ERIC Educational Resources Information Center

    Paine, Carolyn

    1982-01-01

    Exemplary teaching aids--games, workbooks, student and teacher resource books, reading materials, and records--are recommended by subject area and grade level. Materials include an ice cream cone game for mathematics, a "Life Skills Reading" book on telephone usage, a "Dictionary of Recent American History," and many other items. (PP)

  4. ArtMARC Sourcebook: Cataloging Art, Architecture, and Their Visual Images.

    ERIC Educational Resources Information Center

    McRae, Linda, Ed.; White, Lynda S., Ed.

    Profiling the proven cataloging methods of experts from libraries, art galleries, museums, and other institutions, this sourcebook outlines cataloging techniques for a wide variety of resources from ancient artifacts to architectural drawings. A data dictionary of relevant MARC fields is also included, along with data conversion comments. A…

  5. Student Outcomes 2009: Data Dictionary. Support Document

    ERIC Educational Resources Information Center

    National Centre for Vocational Education Research (NCVER), 2009

    2009-01-01

    This document was produced as an added resource for the report "Outcomes from the Productivity Places Program, 2009." The study reported the outcomes for students who completed their vocational education and training (VET) under the Productivity Places Program (PPP) during 2008. This document presents an alphabetical arrangement of the…

  6. ARBA Guide to Biographical Resources 1986-1997.

    ERIC Educational Resources Information Center

    Wick, Robert L., Ed.; Mood, Terry Ann, Ed.

    This guide provides a representative selection of biographical dictionaries and related works useful to the reference and collection development processes in all types of libraries. Three criteria were used in selection: (1) each item included was published within the past 12 years; (2) each item has been included in American Reference Books…

  7. Hupa Natural Resources Dictionary.

    ERIC Educational Resources Information Center

    Bennett, Ruth, Ed.; And Others

    Created by children in grades 5-8 who were enrolled in a year-long Hupa language class, this computer-generated, bilingual book contains descriptions and illustrations of local animals, birds, and fish. The introduction explains that students worked on a Macintosh computer able to print the Unifon alphabet used in writing the Hupa language.…

  8. Resources for Teaching Word Identification.

    ERIC Educational Resources Information Center

    Schell, Leo M., Ed.; And Others

    Only materials specifically designed to teach one or more of the following word identification skills were included in this booklet: sight words, context clues, phonic analysis, structural analysis, and dictionary skills. Materials for grades one through six are stressed, although a few materials suitable for secondary school students are listed.…

  9. Linguistic Resource Creation for Research and Technology Development: A Recent Experiment

    DTIC Science & Technology

    2003-01-01

    inflectional morphology. For instance, Tzeltal (a Mayan language of Mexico), Swahili (East Africa), and Shuar (a Jivaroan language of Ecuador) all... or they can come from dictionaries, as was the case for Tzeltal and Shuar. In the case of inflectionally rich languages, when the word forms are

  10. Data dictionary services in XNAT and the Human Connectome Project.

    PubMed

    Herrick, Rick; McKay, Michael; Olsen, Timothy; Horton, William; Florida, Mark; Moore, Charles J; Marcus, Daniel S

    2014-01-01

    The XNAT informatics platform is an open source data management tool used by biomedical imaging researchers around the world. An important feature of XNAT is its highly extensible architecture: users of XNAT can add new data types to the system to capture the imaging and phenotypic data generated in their studies. Until recently, XNAT has had limited capacity to broadcast the meaning of these data extensions to users, other XNAT installations, and other software. We have implemented a data dictionary service for XNAT, which is currently being used on ConnectomeDB, the Human Connectome Project (HCP) public data sharing website. The data dictionary service provides a framework to define key relationships between data elements and structures across the XNAT installation. This includes not just core data representing medical imaging data or subject or patient evaluations, but also taxonomical structures, security relationships, subject groups, and research protocols. The data dictionary allows users to define metadata for data structures and their properties, such as value types (e.g., textual, integers, floats) and valid value templates, ranges, or field lists. The service provides compatibility and integration with other research data management services by enabling easy migration of XNAT data to standards-based formats such as the Resource Description Framework (RDF), JavaScript Object Notation (JSON), and Extensible Markup Language (XML). It also facilitates the conversion of XNAT's native data schema into standard neuroimaging vocabularies and structures.
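
    As a purely illustrative example of the kind of metadata such a service manages, the snippet below serializes one hypothetical field definition, with a value type and valid range, to JSON; the key names are made up and are not XNAT's actual schema.

    ```python
    import json

    # Illustrative data-dictionary entry: a field definition with a value type,
    # a valid range, and a description. Keys are hypothetical, not XNAT's schema.
    entry = {
        "dataType": "hcp:subjectMetrics",
        "field": "flanker_score",
        "valueType": "float",
        "validRange": {"min": 0.0, "max": 150.0},
        "description": "Example behavioral score captured for each subject",
    }

    print(json.dumps(entry, indent=2))   # easily migrated to JSON/XML/RDF representations
    ```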

  11. Data dictionary services in XNAT and the Human Connectome Project

    PubMed Central

    Herrick, Rick; McKay, Michael; Olsen, Timothy; Horton, William; Florida, Mark; Moore, Charles J.; Marcus, Daniel S.

    2014-01-01

    The XNAT informatics platform is an open source data management tool used by biomedical imaging researchers around the world. An important feature of XNAT is its highly extensible architecture: users of XNAT can add new data types to the system to capture the imaging and phenotypic data generated in their studies. Until recently, XNAT has had limited capacity to broadcast the meaning of these data extensions to users, other XNAT installations, and other software. We have implemented a data dictionary service for XNAT, which is currently being used on ConnectomeDB, the Human Connectome Project (HCP) public data sharing website. The data dictionary service provides a framework to define key relationships between data elements and structures across the XNAT installation. This includes not just core data representing medical imaging data or subject or patient evaluations, but also taxonomical structures, security relationships, subject groups, and research protocols. The data dictionary allows users to define metadata for data structures and their properties, such as value types (e.g., textual, integers, floats) and valid value templates, ranges, or field lists. The service provides compatibility and integration with other research data management services by enabling easy migration of XNAT data to standards-based formats such as the Resource Description Framework (RDF), JavaScript Object Notation (JSON), and Extensible Markup Language (XML). It also facilitates the conversion of XNAT's native data schema into standard neuroimaging vocabularies and structures. PMID:25071542

  12. Signal Sampling for Efficient Sparse Representation of Resting State FMRI Data

    PubMed Central

    Ge, Bao; Makkie, Milad; Wang, Jin; Zhao, Shijie; Jiang, Xi; Li, Xiang; Lv, Jinglei; Zhang, Shu; Zhang, Wei; Han, Junwei; Guo, Lei; Liu, Tianming

    2015-01-01

    As the size of brain imaging data such as fMRI grows explosively, it provides us with unprecedented and abundant information about the brain. How to reduce the size of fMRI data without losing much information is becoming a more and more pressing issue. Recent studies have tried to deal with it by dictionary learning and sparse representation methods; however, their computational complexity is still high, which hampers the wider application of sparse representation methods to large-scale fMRI datasets. To effectively address this problem, this work proposes to represent the resting state fMRI (rs-fMRI) signals of a whole brain via a statistical-sampling-based sparse representation. First, we sampled the whole brain's signals via different sampling methods; then the sampled signals were aggregated into an input data matrix to learn a dictionary; finally, this dictionary was used to sparsely represent the whole brain's signals and identify the resting state networks. Comparative experiments demonstrate that the proposed signal sampling framework can speed up the reconstruction of concurrent brain networks by ten times without losing much information. Experiments on the 1000 Functional Connectomes Project data further demonstrate its effectiveness and superiority. PMID:26646924
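
    A compact sketch of the sampling idea, using synthetic data: the dictionary is learned from a random subset of voxel time series and then every voxel is sparse-coded against it. The specific sampling schemes, preprocessing and network identification steps of the paper are omitted.

    ```python
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    # Learn the dictionary from a sampled subset of voxel time series, then
    # sparse-code all voxels. Synthetic data; not the paper's full pipeline.
    rng = np.random.default_rng(0)
    n_timepoints, n_voxels = 120, 5000
    signals = rng.standard_normal((n_voxels, n_timepoints))   # rows = voxel time series

    sample = signals[rng.choice(n_voxels, size=500, replace=False)]   # ~10% sampling
    dico = MiniBatchDictionaryLearning(n_components=30, transform_algorithm="lasso_lars",
                                       transform_alpha=0.5, random_state=0)
    dico.fit(sample)                                # dictionary atoms = temporal patterns
    codes = dico.transform(signals)                 # sparse loadings for every voxel
    print(codes.shape)                              # (5000, 30): one loading map per atom
    ```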

  13. Dictionary-based fiber orientation estimation with improved spatial consistency.

    PubMed

    Ye, Chuyang; Prince, Jerry L

    2018-02-01

    Diffusion magnetic resonance imaging (dMRI) has enabled in vivo investigation of white matter tracts. Fiber orientation (FO) estimation is a key step in tract reconstruction and has been a popular research topic in dMRI analysis. In particular, the sparsity assumption has been used in conjunction with a dictionary-based framework to achieve reliable FO estimation with a reduced number of gradient directions. Because image noise can have a deleterious effect on the accuracy of FO estimation, previous works have incorporated spatial consistency of FOs in the dictionary-based framework to improve the estimation. However, because FOs are only indirectly determined from the mixture fractions of dictionary atoms and not modeled as variables in the objective function, these methods do not incorporate FO smoothness directly, and their ability to produce smooth FOs could be limited. In this work, we propose an improvement to Fiber Orientation Reconstruction using Neighborhood Information (FORNI), which we call FORNI+; this method estimates FOs in a dictionary-based framework where FO smoothness is better enforced than in FORNI alone. We describe an objective function that explicitly models the actual FOs and the mixture fractions of dictionary atoms. Specifically, it consists of data fidelity between the observed signals and the signals represented by the dictionary, pairwise FO dissimilarity that encourages FO smoothness, and weighted ℓ1-norm terms that ensure the consistency between the actual FOs and the FO configuration suggested by the dictionary representation. The FOs and mixture fractions are then jointly estimated by minimizing the objective function using an iterative alternating optimization strategy. FORNI+ was evaluated on a simulation phantom, a physical phantom, and real brain dMRI data. In particular, in the real brain dMRI experiment, we have qualitatively and quantitatively evaluated the reproducibility of the proposed method. Results demonstrate that FORNI+ produces FOs with better quality compared with competing methods. Copyright © 2017 Elsevier B.V. All rights reserved.
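
    Schematically, and only as a paraphrase of the description above (not the exact published formulation), the objective combines the three terms as

    ```latex
    \min_{\{F_v\},\,\{w_v\}} \;
    \sum_{v} \bigl\| s_v - D\, w_v \bigr\|_2^2
    \;+\; \lambda \sum_{(v,u)\in\mathcal{N}} d\bigl(F_v, F_u\bigr)
    \;+\; \sum_{v} \bigl\| \gamma(F_v) \odot w_v \bigr\|_1
    ```

    where s_v is the diffusion signal at voxel v, D the dictionary, w_v the mixture fractions, F_v the fiber orientations, N the set of neighboring voxel pairs, and d(·,·) a pairwise FO dissimilarity; λ and the weights γ(F_v) are placeholders for the method's actual weighting, and the whole expression is minimized by alternating between the FOs and the mixture fractions.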

  14. Maximally Expressive Modeling of Operations Tasks

    NASA Technical Reports Server (NTRS)

    Jaap, John; Richardson, Lea; Davis, Elizabeth

    2002-01-01

    Planning and scheduling systems organize "tasks" into a timeline or schedule. The tasks are defined within the scheduling system in logical containers called models. The dictionary might define a model of this type as "a system of things and relations satisfying a set of rules that, when applied to the things and relations, produce certainty about the tasks that are being modeled." One challenging domain for a planning and scheduling system is the operation of on-board experiments for the International Space Station. In these experiments, the equipment used is among the most complex hardware ever developed, the information sought is at the cutting edge of scientific endeavor, and the procedures are intricate and exacting. Scheduling is made more difficult by a scarcity of station resources. The models to be fed into the scheduler must describe both the complexity of the experiments and procedures (to ensure a valid schedule) and the flexibilities of the procedures and the equipment (to effectively utilize available resources). Clearly, scheduling International Space Station experiment operations calls for a "maximally expressive" modeling schema.

  15. Booksearch: What Dictionary (General or Specialized) Do You Find Useful or Interesting for Students?

    ERIC Educational Resources Information Center

    English Journal, 1988

    1988-01-01

    Presents classroom teachers' recommendations for a variety of dictionaries that may heighten students' interest in language: a reverse dictionary, a visual dictionary, WEIGHTY WORD BOOK, a collegiate desk dictionary, OXFORD ENGLISH DICTIONARY, DICTIONARY OF AMERICAN REGIONAL ENGLISH, and a dictionary of idioms. (ARH)

  16. The Adult Mouse Anatomical Dictionary: a tool for annotating and integrating data

    PubMed Central

    Hayamizu, Terry F; Mangan, Mary; Corradi, John P; Kadin, James A; Ringwald, Martin

    2005-01-01

    We have developed an ontology to provide standardized nomenclature for anatomical terms in the postnatal mouse. The Adult Mouse Anatomical Dictionary is structured as a directed acyclic graph, and is organized hierarchically both spatially and functionally. The ontology will be used to annotate and integrate different types of data pertinent to anatomy, such as gene expression patterns and phenotype information, which will contribute to an integrated description of biological phenomena in the mouse. PMID:15774030

  17. The architecture of a distributed medical dictionary.

    PubMed

    Fowler, J; Buffone, G; Moreau, D

    1995-01-01

    Exploiting high-speed computer networks to provide a national medical information infrastructure is a goal for medical informatics. The Distributed Medical Dictionary under development at Baylor College of Medicine is a model for an architecture that supports collaborative development of a distributed online medical terminology knowledge-base. A prototype is described that illustrates the concept. Issues that must be addressed by such a system include high availability, acceptable response time, support for local idiom, and control of vocabulary.

  18. Building a protein name dictionary from full text: a machine learning term extraction approach.

    PubMed

    Shi, Lei; Campagne, Fabien

    2005-04-07

    The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.
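
    A toy sketch of the core classification step: candidate high-frequency terms are represented with character n-gram counts and labeled as protein names or not by a linear SVM. The feature choice and the miniature training list are illustrative assumptions, not the paper's feature set or corpus.

    ```python
    from sklearn.svm import LinearSVC
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline

    # Toy illustration: classify candidate high-frequency terms as protein names
    # (1) or other frequent terms (0) with character n-gram features and an SVM.
    train_terms = ["p53", "NF-kappaB", "cyclin D1", "interleukin-6",
                   "cell culture", "western blot", "figure 2", "significant difference"]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]

    clf = make_pipeline(CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                        LinearSVC())
    clf.fit(train_terms, labels)

    candidates = ["caspase-3", "room temperature"]   # high-frequency terms from an article
    print(dict(zip(candidates, clf.predict(candidates))))
    ```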

  19. Generation of a consensus protein domain dictionary

    PubMed Central

    Schaeffer, R. Dustin; Jonsson, Amanda L.; Simms, Andrew M.; Daggett, Valerie

    2011-01-01

    Motivation: The discovery of new protein folds is a relatively rare occurrence even as the rate of protein structure determination increases. This rarity reinforces the concept of folds as reusable units of structure and function shared by diverse proteins. If the folding mechanism of proteins is largely determined by their topology, then the folding pathways of members of existing folds could encompass the full set used by globular protein domains. Results: We have used recent versions of three common protein domain dictionaries (SCOP, CATH and Dali) to generate a consensus domain dictionary (CDD). Surprisingly, 40% of the metafolds in the CDD are not composed of autonomous structural domains, i.e. they are not plausible independent folding units. This finding has serious ramifications for bioinformatics studies mining these domain dictionaries for globular protein properties. However, our main purpose in deriving this CDD was to generate an updated CDD to choose targets for MD simulation as part of our dynameomics effort, which aims to simulate the native and unfolding pathways of representatives of all globular protein consensus folds (metafolds). Consequently, we also compiled a list of representative protein targets of each metafold in the CDD. Availability and implementation: This domain dictionary is available at www.dynameomics.org. Contact: daggett@u.washington.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21068000

  20. Building a protein name dictionary from full text: a machine learning term extraction approach

    PubMed Central

    Shi, Lei; Campagne, Fabien

    2005-01-01

    Background The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM) to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt. PMID:15817129

  1. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    PubMed

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.
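
    As a hedged sketch of how such an RDF rendering might be queried, the snippet below loads a local copy with rdflib and runs a SPARQL query; the file name and the property IRIs are placeholders, not the actual caDSR or CIMI vocabulary.

    ```python
    from rdflib import Graph

    # Load a (hypothetical) local copy of CDE metadata rendered as RDF and list
    # data elements for one domain. The IRIs below are placeholders only.
    g = Graph()
    g.parse("tcga_cde_metadata.rdf")       # assumed local RDF file, not a real distribution

    query = """
    PREFIX ex: <http://example.org/cde#>
    SELECT ?cde ?label WHERE {
        ?cde ex:domain "clinical pharmaceutical" ;
             ex:preferredLabel ?label .
    }
    """
    for cde, label in g.query(query):
        print(cde, label)
    ```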

  2. The role of organizational context and individual nurse characteristics in explaining variation in use of information technologies in evidence based practice

    PubMed Central

    2012-01-01

    Background There is growing awareness of the role of information technology in evidence-based practice. The purpose of this study was to investigate the role of organizational context and nurse characteristics in explaining variation in nurses’ use of personal digital assistants (PDAs) and mobile Tablet PCs for accessing evidence-based information. The Promoting Action on Research Implementation in Health Services (PARIHS) model provided the framework for studying the impact of providing nurses with PDA-supported, evidence-based practice resources, and for studying the organizational, technological, and human resource variables that impact nurses’ use patterns. Methods A survey design was used, involving baseline and follow-up questionnaires. The setting included 24 organizations representing three sectors: hospitals, long-term care (LTC) facilities, and community organizations (home care and public health). The sample consisted of 710 participants (response rate 58%) at Time 1, and 469 for whom both Time 1 and Time 2 follow-up data were obtained (response rate 66%). A hierarchical regression model (HLM) was used to evaluate the effect of predictors from all levels simultaneously. Results The Chi square result indicated PDA users reported using their device more frequently than Tablet PC users (p = 0.001). Frequency of device use was explained by ‘breadth of device functions’ and PDA versus Tablet PC. Frequency of Best Practice Guideline use was explained by ‘willingness to implement research,’ ‘structural and electronic resources,’ ‘organizational slack time,’ ‘breadth of device functions’ (positive effects), and ‘slack staff’ (negative effect). Frequency of Nursing Plus database use was explained by ‘culture,’ ‘structural and electronic resources,’ and ‘breadth of device functions’ (positive effects), and ‘slack staff’ (negative). ‘Organizational culture’ (positive), ‘breadth of device functions’ (positive), and ‘slack staff ‘(negative) were associated with frequency of Lexi/PEPID drug dictionary use. Conclusion Access to PDAs and Tablet PCs supported nurses’ self-reported use of information resources. Several of the organizational context variables and one individual nurse variable explained variation in the frequency of information resource use. PMID:23276201

  3. Preliminary Integrated Geologic Map Databases for the United States: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, Rhode Island and Vermont

    USGS Publications Warehouse

    Nicholson, Suzanne W.; Dicken, Connie L.; Horton, John D.; Foose, Michael P.; Mueller, Julia A.L.; Hon, Rudi

    2006-01-01

    The rapid growth in the use of Geographic Information Systems (GIS) has highlighted the need for regional and national scale digital geologic maps that have standardized information about geologic age and lithology. Such maps can be conveniently used to generate derivative maps for manifold special purposes such as mineral-resource assessment, metallogenic studies, tectonic studies, and environmental research. Although two digital geologic maps (Schruben and others, 1994; Reed and Bush, 2004) of the United States currently exist, their scales (1:2,500,000 and 1:5,000,000) are too general for many regional applications. Most states have digital geologic maps at scales of about 1:500,000, but the databases are not comparably structured and, thus, it is difficult to use the digital database for more than one state at a time. This report describes the results, for a seven-state region, of an effort by the U.S. Geological Survey to produce a series of integrated and standardized state geologic map databases that cover the entire United States. In 1997, the United States Geological Survey's Mineral Resources Program initiated the National Surveys and Analysis (NSA) Project to develop national digital databases. One primary activity of this project was to compile a national digital geologic map database, utilizing state geologic maps, to support studies in the range of 1:250,000- to 1:1,000,000-scale. To accomplish this, state databases were prepared using a common standard for the database structure, fields, attribution, and data dictionaries. For Alaska and Hawaii, new state maps are being prepared, and the preliminary work for Alaska is being released as a series of 1:250,000-scale quadrangle reports. This document provides background information and documentation for the integrated geologic map databases of this report. This report is one of a series of such reports releasing preliminary standardized geologic map databases for the United States. The data products of the project consist of two main parts: the spatial databases and a set of supplemental tables relating to geologic map units. The datasets serve as a data resource to generate a variety of stratigraphic, age, and lithologic maps. This documentation is divided into four main sections: (1) description of the set of data files provided in this report, (2) specifications of the spatial databases, (3) specifications of the supplemental tables, and (4) an appendix containing the data dictionaries used to populate some fields of the spatial database and supplemental tables.
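
    As a hedged illustration of how such a standardized state database might be used, the sketch below joins a spatial layer to a supplemental map-unit table with geopandas/pandas; the file names and field names (UNIT_LABEL, LITH1, AGE_MIN, AGE_MAX) are hypothetical, not the actual schema documented in this report.

        import geopandas as gpd
        import pandas as pd

        # Hypothetical inputs: a polygon layer of map units and a supplemental
        # table of unit attributes keyed by the unit label.
        geology = gpd.read_file("state_geology.shp")
        units = pd.read_csv("state_units.csv")

        # Attach age and lithology attributes to each polygon, then pull out all
        # units whose primary lithology is recorded as granite.
        merged = geology.merge(units, on="UNIT_LABEL", how="left")
        granites = merged[merged["LITH1"] == "granite"]
        print(granites[["UNIT_LABEL", "AGE_MIN", "AGE_MAX"]].head())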

  4. Good Planets Are Hard To Find! Ecology Action Workbook and Dictionary.

    ERIC Educational Resources Information Center

    Bazar, Ronald M.; Dehr, Roma

    Even though the condition of the planet is a serious subject, becoming ecologically aware and active can be fun. This workbook provides ecologically conscious-raising activities that children can do at home or with others. A series of worksheets guides students through eight activities including: (1) assessing community resources and environmental…

  5. Spanish Sign in the Americas.

    ERIC Educational Resources Information Center

    Schein, Jerome D.

    1995-01-01

    Spanish Sign Language (SSL) is now the second most used sign language. This article introduces resources for the study of SSL, including three SSL dictionaries--two from Argentina and one from Puerto Rico. Differences in SSL between and within the two countries are noted. Implications for deaf educators in North America are drawn. (Author/DB)

  6. Aerospell Supplemental Spell Check File

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Aerospell is a supplemental spell check file that can be used as a resource for researchers, writers, editors, students, and others who compose scientific and technical texts. The file extends the general spell check dictionaries of word processors by adding more than 13,000 words used in a broad range of aerospace and related disciplines.

  7. Learners' Dictionaries: State of the Art. Anthology Series 23.

    ERIC Educational Resources Information Center

    Tickoo, Makhan L., Ed.

    A collection of articles on dictionaries for advanced second language learners includes essays on the past, present, and future of learners' dictionaries; alternative dictionaries; dictionary construction; and dictionaries and their users. Titles include: "Idle Thoughts of an Idle Fellow; or Vaticinations on the Learners' Dictionary"…

  8. Dictionaries: British and American. The Language Library.

    ERIC Educational Resources Information Center

    Hulbert, James Root

    An account of the dictionaries, great and small, of the English-speaking world is given in this book. Subjects covered include the origin of English dictionaries, early dictionaries, Noah Webster and his successors to the present, abridged dictionaries, "The Oxford English Dictionary" and later dictionaries patterned after it, the…

  9. Conversion of environmental data to a digital-spatial database, Puget Sound area, Washington

    USGS Publications Warehouse

    Uhrich, M.A.; McGrath, T.S.

    1997-01-01

    Data and maps from the Puget Sound Environmental Atlas, compiled for the U.S. Environmental Protection Agency, the Puget Sound Water Quality Authority, and the U.S. Army Corps of Engineers, have been converted into a digital-spatial database using a geographic information system. Environmental data for the Puget Sound area, collected from sources other than the Puget Sound Environmental Atlas by different Federal, State, and local agencies, also have been converted into this digital-spatial database. Background on the geographic-information-system planning process, the design and implementation of the geographic-information-system database, and the reasons for conversion to this digital-spatial database are included in this report. The Puget Sound Environmental Atlas data layers include information about seabird nesting areas, eelgrass and kelp habitat, marine mammal and fish areas, and shellfish resources and bed certification. Data layers, from sources other than the Puget Sound Environmental Atlas, include the Puget Sound shoreline, the water-body system, shellfish growing areas, recreational shellfish beaches, sewage-treatment outfalls, upland hydrography, watershed and political boundaries, and geographic names. The sources of data, descriptions of the data layers, and the steps and errors of processing associated with conversion to a digital-spatial database used in development of the Puget Sound Geographic Information System also are included in this report. The appendixes contain data dictionaries for each of the resource layers and error values for the conversion of Puget Sound Environmental Atlas data.

  10. Reference architecture and interoperability model for data mining and fusion in scientific cross-domain infrastructures

    NASA Astrophysics Data System (ADS)

    Haener, Rainer; Waechter, Joachim; Grellet, Sylvain; Robida, Francois

    2017-04-01

    Interoperability is the key factor in establishing scientific research environments and infrastructures, as well as in bringing together heterogeneous, geographically distributed risk management, monitoring, and early warning systems. Based on developments within the European Plate Observing System (EPOS), a reference architecture has been devised that comprises architectural blueprints and interoperability models regarding the specification of business processes and logic as well as the encoding of data, metadata, and semantics. The architectural blueprint is developed on the basis of the so-called service-oriented architecture (SOA) 2.0 paradigm, which combines the intelligence and proactiveness of event-driven architectures with service-oriented architectures. SOA 2.0 supports analysing (Data Mining) both static and real-time data in order to find correlations of disparate information that do not at first appear to be intuitively obvious: analysed data (e.g., seismological monitoring) can be enhanced with relationships discovered by associating them (Data Fusion) with other data (e.g., creepmeter monitoring), with digital models of geological structures, or with the simulation of geological processes. The interoperability model describes the information, communication (conversations), and interactions (choreographies) of all participants involved, as well as the processes for registering, providing, and retrieving information. It is based on the principles of functional integration, implemented via dedicated services communicating via service-oriented and message-driven infrastructures. The services provide their functionality via standardised interfaces: instead of requesting data directly, users share data via services that are built upon specific adapters. This approach replaces tight coupling at the data level by a flexible dependency on loosely coupled services. The main component of the interoperability model is the comprehensive semantic description of the information, business logic, and processes on the basis of a minimal set of well-known, established standards. It implements the representation of knowledge by applying domain-controlled vocabularies to statements about resources, information, facts, and complex matters (ontologies). Seismic experts, for example, would be interested in geological models or borehole measurements at a certain depth, based on which it is possible to correlate and verify seismic profiles. The entire model is built upon standards from the Open Geospatial Consortium (Dictionaries, Service Layer), the International Organisation for Standardisation (Registries, Metadata), and the World Wide Web Consortium (Resource Description Framework, Spatial Data on the Web Best Practices). It has to be emphasised that this approach is scalable to the greatest possible extent: all information necessary in the context of cross-domain infrastructures is referenced via vocabularies and knowledge bases containing statements that provide either the information itself or the resources (service endpoints) from which the information can be retrieved. The entire infrastructure communication is subject to a broker-based business logic integration platform, where the information exchanged between the involved participants is managed on the basis of standardised dictionaries, repositories, and registries. This approach also enables the development of Systems-of-Systems (SoS), which allow autonomous, large-scale, concurrent, and distributed systems to collaborate, cooperatively interacting as a collective in a common environment.
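
    A toy Python sketch of the loose coupling described above: producers and consumers exchange standardized messages through a broker instead of requesting data from each other directly. The broker, topic names, and services are hypothetical stand-ins, not EPOS components.

        from collections import defaultdict

        class Broker:
            """Minimal in-memory message broker: services publish to named topics
            and subscribed handlers react, so participants stay loosely coupled."""

            def __init__(self):
                self.subscribers = defaultdict(list)

            def subscribe(self, topic, handler):
                self.subscribers[topic].append(handler)

            def publish(self, topic, message):
                for handler in self.subscribers[topic]:
                    handler(message)

        broker = Broker()

        # Hypothetical consumer: a data-fusion service that correlates seismological
        # events with other monitoring data it receives elsewhere.
        def fusion_service(event):
            print("correlating event", event["id"], "with creepmeter observations")

        broker.subscribe("seismology.events", fusion_service)

        # Hypothetical producer: a monitoring adapter publishing a standardized message.
        broker.publish("seismology.events", {"id": "evt-001", "magnitude": 3.2})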

  11. The Design of Lexical Database for Indonesian Language

    NASA Astrophysics Data System (ADS)

    Gunawan, D.; Amalia, A.

    2017-03-01

    Kamus Besar Bahasa Indonesia (KBBI), an official dictionary for the Indonesian language, provides lists of words with their meanings. The online version can be accessed via the Internet. Another online dictionary is Kateglo. KBBI online and Kateglo provide only an interface for humans. A machine cannot retrieve data from these dictionaries easily without using advanced techniques. However, lexical information about words is required in research and application development related to natural language processing, text mining, information retrieval, or sentiment analysis. To address this requirement, we need to build a lexical database that provides well-defined, structured information about words. A well-known lexical database is WordNet, which provides the relations among words in English. This paper proposes the design of a lexical database for the Indonesian language based on the combination of the KBBI 4th edition, Kateglo, and the WordNet structure. Knowledge representation utilizing semantic networks depicts the relations among words and provides the new structure of the lexical database for the Indonesian language. The result of this design can be used as the foundation to build the lexical database for the Indonesian language.
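
    A minimal sketch of a WordNet-style relational design using Python's built-in sqlite3; the table layout, column names, and toy Indonesian entries are illustrative assumptions, not the actual KBBI/Kateglo schema proposed in the paper.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        cur = conn.cursor()

        # Illustrative schema: lexical entries, senses, and typed relations between
        # senses (synonymy, hypernymy, ...), loosely following the WordNet structure.
        cur.executescript("""
        CREATE TABLE entry    (id INTEGER PRIMARY KEY, lemma TEXT UNIQUE, pos TEXT);
        CREATE TABLE sense    (id INTEGER PRIMARY KEY,
                               entry_id INTEGER REFERENCES entry(id), gloss TEXT);
        CREATE TABLE relation (source_sense INTEGER REFERENCES sense(id),
                               target_sense INTEGER REFERENCES sense(id), rel_type TEXT);
        """)

        # Toy data: 'rumah' (house) linked to 'bangunan' (building) as its hypernym.
        cur.execute("INSERT INTO entry (lemma, pos) VALUES ('rumah', 'n'), ('bangunan', 'n')")
        cur.execute("INSERT INTO sense (entry_id, gloss) VALUES (1, 'tempat tinggal'), (2, 'struktur buatan')")
        cur.execute("INSERT INTO relation VALUES (1, 2, 'hypernym')")

        cur.execute("""
        SELECT e1.lemma, r.rel_type, e2.lemma
        FROM relation r
        JOIN sense s1 ON s1.id = r.source_sense JOIN entry e1 ON e1.id = s1.entry_id
        JOIN sense s2 ON s2.id = r.target_sense JOIN entry e2 ON e2.id = s2.entry_id
        """)
        print(cur.fetchall())  # [('rumah', 'hypernym', 'bangunan')]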

  12. Determining building interior structures using compressive sensing

    NASA Astrophysics Data System (ADS)

    Lagunas, Eva; Amin, Moeness G.; Ahmad, Fauzia; Nájar, Montse

    2013-04-01

    We consider imaging of the building interior structures using compressive sensing (CS) with applications to through-the-wall imaging and urban sensing. We consider a monostatic synthetic aperture radar imaging system employing stepped frequency waveform. The proposed approach exploits prior information of building construction practices to form an appropriate sparse representation of the building interior layout. We devise a dictionary of possible wall locations, which is consistent with the fact that interior walls are typically parallel or perpendicular to the front wall. The dictionary accounts for the dominant normal angle reflections from exterior and interior walls for the monostatic imaging system. CS is applied to a reduced set of observations to recover the true positions of the walls. Additional information about interior walls can be obtained using a dictionary of possible corner reflectors, which is the response of the junction of two walls. Supporting results based on simulation and laboratory experiments are provided. It is shown that the proposed sparsifying basis outperforms the conventional through-the-wall CS model, the wavelet sparsifying basis, and the block sparse model for building interior layout detection.
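
    A minimal numerical sketch of the recovery idea only (not the authors' radar signal model): each column of a dictionary stands for one candidate wall position, a few walls are active, and orthogonal matching pursuit recovers them from a reduced set of noisy measurements. The sizes and the random sensing model are assumptions for illustration.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(0)
        n_candidates, n_measurements, n_walls = 200, 60, 3

        # Hypothetical sensing model: each dictionary column is the measurement
        # response of a wall at one candidate position (random here for brevity).
        D = rng.standard_normal((n_measurements, n_candidates))
        D /= np.linalg.norm(D, axis=0)

        true_support = rng.choice(n_candidates, size=n_walls, replace=False)
        x_true = np.zeros(n_candidates)
        x_true[true_support] = rng.uniform(1.0, 2.0, size=n_walls)
        y = D @ x_true + 0.01 * rng.standard_normal(n_measurements)

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_walls, fit_intercept=False)
        omp.fit(D, y)
        print("true wall indices:     ", sorted(true_support))
        print("recovered wall indices:", sorted(np.flatnonzero(omp.coef_)))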

  13. Multi-scale learning based segmentation of glands in digital colonrectal pathology images.

    PubMed

    Gao, Yi; Liu, William; Arjun, Shipra; Zhu, Liangjia; Ratner, Vadim; Kurc, Tahsin; Saltz, Joel; Tannenbaum, Allen

    2016-02-01

    Digital histopathological images provide detailed spatial information of the tissue at micrometer resolution. Among the available contents in the pathology images, meso-scale information, such as the gland morphology, texture, and distribution, are useful diagnostic features. In this work, focusing on the colon-rectal cancer tissue samples, we propose a multi-scale learning based segmentation scheme for the glands in the colon-rectal digital pathology slides. The algorithm learns the gland and non-gland textures from a set of training images in various scales through a sparse dictionary representation. After the learning step, the dictionaries are used collectively to perform the classification and segmentation for the new image.

  14. Multi-scale learning based segmentation of glands in digital colonrectal pathology images

    NASA Astrophysics Data System (ADS)

    Gao, Yi; Liu, William; Arjun, Shipra; Zhu, Liangjia; Ratner, Vadim; Kurc, Tahsin; Saltz, Joel; Tannenbaum, Allen

    2016-03-01

    Digital histopathological images provide detailed spatial information of the tissue at micrometer resolution. Among the available contents in the pathology images, meso-scale information, such as the gland morphology, texture, and distribution, are useful diagnostic features. In this work, focusing on the colon-rectal cancer tissue samples, we propose a multi-scale learning based segmentation scheme for the glands in the colon-rectal digital pathology slides. The algorithm learns the gland and non-gland textures from a set of training images in various scales through a sparse dictionary representation. After the learning step, the dictionaries are used collectively to perform the classification and segmentation for the new image.
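
    A schematic sketch of the core classification idea (learn one dictionary per texture class, then label a new patch by which dictionary reconstructs it with smaller error); the data are synthetic subspace patches and the scikit-learn dictionary learner is a stand-in for the paper's multi-scale training procedure.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning

        rng = np.random.default_rng(1)
        patch_dim = 64  # e.g. flattened 8x8 patches

        # Synthetic stand-ins: gland-like and non-gland-like patches drawn from
        # two different low-dimensional subspaces.
        gland_basis = rng.standard_normal((5, patch_dim))
        stroma_basis = rng.standard_normal((5, patch_dim))
        gland_patches = rng.standard_normal((500, 5)) @ gland_basis
        stroma_patches = rng.standard_normal((500, 5)) @ stroma_basis

        def learn_dictionary(patches, n_atoms=32):
            return MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                               random_state=0).fit(patches)

        d_gland = learn_dictionary(gland_patches)
        d_stroma = learn_dictionary(stroma_patches)

        def reconstruction_error(model, patch):
            code = model.transform(patch.reshape(1, -1))
            return float(np.linalg.norm(patch - code @ model.components_))

        # Classify a new patch by the dictionary that represents it best.
        test_patch = rng.standard_normal(5) @ gland_basis
        label = ("gland" if reconstruction_error(d_gland, test_patch)
                 < reconstruction_error(d_stroma, test_patch) else "non-gland")
        print(label)  # expected: gland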

  15. Sparse representation of electrodermal activity with knowledge-driven dictionaries.

    PubMed

    Chaspari, Theodora; Tsiartas, Andreas; Stein, Leah I; Cermak, Sharon A; Narayanan, Shrikanth S

    2015-03-01

    Biometric sensors and portable devices are being increasingly embedded into our everyday life, creating the need for robust physiological models that efficiently represent, analyze, and interpret the acquired signals. We propose a knowledge-driven method to represent electrodermal activity (EDA), a psychophysiological signal linked to stress, affect, and cognitive processing. We build EDA-specific dictionaries that accurately model both the slow varying tonic part and the signal fluctuations, called skin conductance responses (SCR), and use greedy sparse representation techniques to decompose the signal into a small number of atoms from the dictionary. Quantitative evaluation of our method considers signal reconstruction, compression rate, and information retrieval measures, that capture the ability of the model to incorporate the main signal characteristics, such as SCR occurrences. Compared to previous studies fitting a predetermined structure to the signal, results indicate that our approach provides benefits across all aforementioned criteria. This paper demonstrates the ability of appropriate dictionaries along with sparse decomposition methods to reliably represent EDA signals and provides a foundation for automatic measurement of SCR characteristics and the extraction of meaningful EDA features.
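
    A toy illustration of greedy sparse decomposition with a hand-rolled matching pursuit loop in numpy, using a dictionary of shifted SCR-like bi-exponential atoms; the atom shapes, time constants, and the synthetic signal are illustrative assumptions, not the paper's EDA model.

        import numpy as np

        fs, T = 8, 60                      # illustrative sampling rate (Hz) and duration (s)
        t = np.arange(0, T, 1 / fs)

        def scr_atom(onset, rise=0.75, decay=3.0):
            """Bi-exponential, SCR-like shape starting at `onset`, unit-normalized."""
            a = np.clip(t - onset, 0.0, None)
            atom = (1 - np.exp(-a / rise)) * np.exp(-a / decay)
            return atom / (np.linalg.norm(atom) + 1e-12)

        # Dictionary of SCR atoms at candidate onsets (one column per onset).
        onsets = np.arange(0, T - 10, 0.5)
        D = np.stack([scr_atom(o) for o in onsets], axis=1)

        # Synthetic signal: slow tonic drift plus two SCRs and a little noise.
        tonic = 0.02 * t
        signal = tonic + 2.0 * scr_atom(12.0) + 1.2 * scr_atom(34.5)
        signal = signal + 0.01 * np.random.default_rng(2).standard_normal(t.size)

        # Plain matching pursuit on the phasic part: greedily pick the atom most
        # correlated with the residual (the tonic part is assumed fitted separately).
        residual, picked = signal - tonic, []
        for _ in range(2):
            scores = D.T @ residual
            k = int(np.argmax(np.abs(scores)))
            picked.append((float(onsets[k]), float(scores[k])))
            residual = residual - scores[k] * D[:, k]

        print(picked)  # approximately [(12.0, 2.0), (34.5, 1.2)]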

  16. Note regarding the word 'behavior' in glossaries of introductory textbooks, dictionaries, and encyclopedias devoted to psychology.

    PubMed

    Abramson, Charles I; Place, Aaron J

    2005-10-01

    Glossaries of introductory textbooks in psychology, biology, and animal behavior were surveyed to find whether they induded the word 'behavior'. In addition to texts, encyclopedias and dictionaries devoted to the study of behavior were also surveyed. Of the 138 tests sampled across all three fields, only 38 (27%) included the term 'behavior' in their glossaries. Of the 15 encyclopedias and dictionaries surveyed, only 5 defined 'behavior'. To assess whether the term 'behavior' has disappeared from textbook glossaries or whether it has usually been absent, we sampled 23 introductory psychology texts written from 1886 to 1958. Only two texts contained glossaries, and the word 'behavior' was defined in both. An informal survey was conducted of students enrolled in introductory classes in psychology, biology, and animal behavior to provide data on the consistency of definitions. Students were asked to "define the word 'behavior'." Analysis indicated the definition was dependent upon the course. We suggest that future introductory textbook authors and editors of psychology-based dictionaries and encyclopedias include 'behavior' in their glossaries.

  17. The semantics of Chemical Markup Language (CML): dictionaries and conventions.

    PubMed

    Murray-Rust, Peter; Townsend, Joe A; Adams, Sam E; Phadungsukanan, Weerapong; Thomas, Jens

    2011-10-14

    The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.

  18. The semantics of Chemical Markup Language (CML): dictionaries and conventions

    PubMed Central

    2011-01-01

    The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs. PMID:21999509

  19. USGS national surveys and analysis projects: Preliminary compilation of integrated geological datasets for the United States

    USGS Publications Warehouse

    Nicholson, Suzanne W.; Stoeser, Douglas B.; Wilson, Frederic H.; Dicken, Connie L.; Ludington, Steve

    2007-01-01

    The growth in the use of Geographic Information Systems (GIS) has highlighted the need for regional and national digital geologic maps attributed with age and rock type information. Such spatial data can be conveniently used to generate derivative maps for purposes that include mineral-resource assessment, metallogenic studies, tectonic studies, human health and environmental research. In 1997, the United States Geological Survey’s Mineral Resources Program initiated an effort to develop national digital databases for use in mineral resource and environmental assessments. One primary activity of this effort was to compile a national digital geologic map database, utilizing state geologic maps, to support mineral resource studies in the range of 1:250,000- to 1:1,000,000-scale. Over the course of the past decade, state databases were prepared using a common standard for the database structure, fields, attributes, and data dictionaries. As of late 2006, standardized geological map databases for all conterminous (CONUS) states have been available on-line as USGS Open-File Reports. For Alaska and Hawaii, new state maps are being prepared, and the preliminary work for Alaska is being released as a series of 1:500,000-scale regional compilations. See below for a list of all published databases.

  20. Shape prior modeling using sparse representation and online dictionary learning.

    PubMed

    Zhang, Shaoting; Zhan, Yiqiang; Zhou, Yan; Uzunbas, Mustafa; Metaxas, Dimitris N

    2012-01-01

    The recently proposed sparse shape composition (SSC) opens a new avenue for shape prior modeling. Instead of assuming any parametric model of shape statistics, SSC incorporates shape priors on-the-fly by approximating a shape instance (usually derived from appearance cues) by a sparse combination of shapes in a training repository. Theoretically, one can increase the modeling capability of SSC by including as many training shapes as possible in the repository. However, this strategy confronts two limitations in practice. First, since SSC involves an iterative sparse optimization at run-time, the more shape instances contained in the repository, the less run-time efficiency SSC has. Therefore, a compact and informative shape dictionary is preferred to a large shape repository. Second, in medical imaging applications, training shapes seldom come in one batch. It is very time consuming and sometimes infeasible to reconstruct the shape dictionary every time new training shapes appear. In this paper, we propose an online learning method to address these two limitations. Our method starts from constructing an initial shape dictionary using the K-SVD algorithm. When new training shapes come, instead of reconstructing the dictionary from the ground up, we update the existing one using a block-coordinate descent approach. Using the dynamically updated dictionary, sparse shape composition can be gracefully scaled up to model shape priors from a large number of training shapes without sacrificing run-time efficiency. Our method is validated on lung localization in X-ray and cardiac segmentation in MRI time series. Compared to the original SSC, it shows comparable performance while being significantly more efficient.
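
    A minimal sketch of the online idea using scikit-learn's MiniBatchDictionaryLearning and partial_fit (a stand-in for the paper's K-SVD initialization and block-coordinate descent update): the existing dictionary is updated when new training shapes arrive instead of being rebuilt, and a new shape instance is then approximated by a sparse combination of atoms. The shapes here are random vectors, purely for illustration.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning

        rng = np.random.default_rng(3)
        shape_dim = 2 * 50  # e.g. 50 landmark points (x, y), flattened

        # Initial repository of training shapes (synthetic stand-ins).
        initial_shapes = rng.standard_normal((200, shape_dim))
        dico = MiniBatchDictionaryLearning(n_components=20, alpha=0.5, random_state=0)
        dico.fit(initial_shapes)  # build the initial shape dictionary

        # Later, a new batch of training shapes arrives: update the existing
        # dictionary incrementally instead of re-learning it from scratch.
        new_shapes = rng.standard_normal((50, shape_dim))
        dico.partial_fit(new_shapes)

        # Sparse shape composition step: approximate a shape instance by a sparse
        # combination of dictionary atoms.
        instance = rng.standard_normal((1, shape_dim))
        code = dico.transform(instance)
        approximation = code @ dico.components_
        print("non-zero atoms used:", int(np.count_nonzero(code)))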

  1. The PDS4 Data Dictionary Tool - Metadata Design for Data Preparers

    NASA Astrophysics Data System (ADS)

    Raugh, A.; Hughes, J. S.

    2017-12-01

    One of the major design goals of the PDS4 development effort was to create an extendable Information Model (IM) for the archive, and to allow mission data designers/preparers to create extensions for metadata definitions specific to their own contexts. This capability is critical for the Planetary Data System - an archive that deals with a data collection that is diverse along virtually every conceivable axis. Amid such diversity in the data itself, it is in the best interests of the PDS archive and its users that all extensions to the IM follow the same design techniques, conventions, and restrictions as the core implementation itself. But it is unrealistic to expect mission data designers to acquire expertise in information modeling, model-driven design, ontology, schema formulation, and PDS4 design conventions and philosophy in order to define their own metadata. To bridge that expertise gap and bring the power of information modeling to the data label designer, the PDS Engineering Node has developed the data dictionary creation tool known as "LDDTool". This tool incorporates the same software used to maintain and extend the core IM, packaged with an interface that enables a developer to create his extension to the IM using the same, standards-based metadata framework PDS itself uses. Through this interface, the novice dictionary developer has immediate access to the common set of data types and unit classes for defining attributes, and a straight-forward method for constructing classes. The more experienced developer, using the same tool, has access to more sophisticated modeling methods like abstraction and extension, and can define context-specific validation rules. We present the key features of the PDS Local Data Dictionary Tool, which both supports the development of extensions to the PDS4 IM, and ensures their compatibility with the IM.

  2. The Planetary Data System (PDS) Data Dictionary Tool (LDDTool)

    NASA Astrophysics Data System (ADS)

    Raugh, Anne C.; Hughes, John S.

    2017-10-01

    One of the major design goals of the PDS4 development effort was to provide an avenue for discipline specialists and large data preparers such as mission archivists to extend the core PDS4 Information Model (IM) to include metadata definitions specific to their own contexts. This capability is critical for the Planetary Data System - an archive that deals with a data collection that is diverse along virtually every conceivable axis. Amid such diversity, it is in the best interests of the PDS archive and its users that all extensions to the core IM follow the same design techniques, conventions, and restrictions as the core implementation itself. Nevertheless, expecting every mission and discipline archivist seeking to define metadata for a new context to acquire expertise in information modeling, model-driven design, ontology, schema formulation, and PDS4 design conventions and philosophy is unrealistic, to say the least. To bridge that expertise gap, the PDS Engineering Node has developed the data dictionary creation tool known as “LDDTool”. This tool incorporates the same software used to maintain and extend the core IM, packaged with an interface that enables a developer to create his contextual information model using the same, open standards-based metadata framework PDS itself uses. Through this interface, the novice dictionary developer has immediate access to the common set of data types and unit classes for defining attributes, and a straightforward method for constructing classes. The more experienced developer, using the same tool, has access to more sophisticated modeling methods like abstraction and extension, and can define very sophisticated validation rules. We present the key features of the PDS Local Data Dictionary Tool, which both supports the development of extensions to the PDS4 IM, and ensures their compatibility with the IM.

  3. Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

    PubMed

    Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

    2017-01-01

    This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.

  4. A Sparse Dictionary Learning-Based Adaptive Patch Inpainting Method for Thick Clouds Removal from High-Spatial Resolution Remote Sensing Imagery

    PubMed Central

    Yang, Xiaomei; Zhou, Chenghu; Li, Zhi

    2017-01-01

    Cloud cover is inevitable in optical remote sensing (RS) imagery on account of the influence of observation conditions, which limits the availability of RS data. Therefore, it is of great significance to be able to reconstruct the cloud-contaminated ground information. This paper presents a sparse dictionary learning-based image inpainting method for adaptively recovering the missing information corrupted by thick clouds patch-by-patch. A feature dictionary was learned from exemplars in the cloud-free regions, which was later utilized to infer the missing patches via sparse representation. To maintain the coherence of structures, structure sparsity was introduced to encourage missing patches on image structures to be filled in first. The optimization model of patch inpainting was formulated under the adaptive neighborhood-consistency constraint, which was solved by a modified orthogonal matching pursuit (OMP) algorithm. In light of these ideas, the thick-cloud removal scheme was designed and applied to images with simulated and true clouds. Comparisons and experiments show that our method can not only keep structures and textures consistent with the surrounding ground information, but also avoid noticeable smoothing and blocking effects, which makes it more suitable for the removal of clouds from high-spatial resolution RS imagery with salient structures and abundant textured features. PMID:28914787

  5. A Sparse Dictionary Learning-Based Adaptive Patch Inpainting Method for Thick Clouds Removal from High-Spatial Resolution Remote Sensing Imagery.

    PubMed

    Meng, Fan; Yang, Xiaomei; Zhou, Chenghu; Li, Zhi

    2017-09-15

    Cloud cover is inevitable in optical remote sensing (RS) imagery on account of the influence of observation conditions, which limits the availability of RS data. Therefore, it is of great significance to be able to reconstruct the cloud-contaminated ground information. This paper presents a sparse dictionary learning-based image inpainting method for adaptively recovering the missing information corrupted by thick clouds patch-by-patch. A feature dictionary was learned from exemplars in the cloud-free regions, which was later utilized to infer the missing patches via sparse representation. To maintain the coherence of structures, structure sparsity was introduced to encourage missing patches on image structures to be filled in first. The optimization model of patch inpainting was formulated under the adaptive neighborhood-consistency constraint, which was solved by a modified orthogonal matching pursuit (OMP) algorithm. In light of these ideas, the thick-cloud removal scheme was designed and applied to images with simulated and true clouds. Comparisons and experiments show that our method can not only keep structures and textures consistent with the surrounding ground information, but also avoid noticeable smoothing and blocking effects, which makes it more suitable for the removal of clouds from high-spatial resolution RS imagery with salient structures and abundant textured features.
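
    A minimal numerical sketch of the inpainting step only (not the paper's adaptive, structure-guided algorithm): a patch is sparse-coded against the dictionary rows corresponding to its observed pixels, and the missing pixels are then predicted from the full atoms. The dictionary here is random rather than learned from cloud-free exemplars, and all sizes are assumptions.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        rng = np.random.default_rng(4)
        patch_dim, n_atoms, sparsity = 100, 300, 5

        # Stand-in feature dictionary (learned from cloud-free exemplars in the paper).
        D = rng.standard_normal((patch_dim, n_atoms))
        D /= np.linalg.norm(D, axis=0)

        # Ground-truth patch as a sparse combination of atoms; about half of its
        # pixels are treated as cloud-contaminated (missing).
        code_true = np.zeros(n_atoms)
        code_true[rng.choice(n_atoms, sparsity, replace=False)] = rng.standard_normal(sparsity)
        patch = D @ code_true
        observed = rng.random(patch_dim) > 0.5

        # Sparse-code using only the observed pixels ...
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity, fit_intercept=False)
        omp.fit(D[observed], patch[observed])

        # ... then fill in the missing pixels from the full atoms.
        reconstruction = D @ omp.coef_
        error = np.linalg.norm(reconstruction[~observed] - patch[~observed])
        print("inpainting error on the missing pixels:", round(float(error), 4))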

  6. Database Dictionary for Ethiopian National Ground-Water DAtabase (ENGDA) Data Fields

    USGS Publications Warehouse

    Kuniansky, Eve L.; Litke, David W.; Tucci, Patrick

    2007-01-01

    Introduction This document describes the data fields that are used for both field forms and the Ethiopian National Ground-water Database (ENGDA) tables associated with information stored about production wells, springs, test holes, test wells, and water-level or water-quality observation wells. Several different words are used in this database dictionary and in the ENGDA database to describe a narrow shaft constructed in the ground. The most general term is borehole, which is applicable to any type of hole. A well is a borehole specifically constructed to extract water from the ground; however, for this data dictionary and for the ENGDA database, the words well and borehole are used interchangeably. A production well is defined as any well used for water supply and includes hand-dug wells, small-diameter bored wells equipped with hand pumps, or large-diameter bored wells equipped with large-capacity motorized pumps. Test holes are borings made to collect information about the subsurface with continuous core or non-continuous core and/or where geophysical logs are collected. Test holes are not converted into wells. A test well is a well constructed for hydraulic testing of an aquifer in order to plan a larger ground-water production system. A water-level or water-quality observation well is a well that is used to collect information about an aquifer and not used for water supply. A spring is any naturally flowing, local, ground-water discharge site. The database dictionary is designed to help define all fields on both the field data collection forms (provided in attachment 2 of this report) and the ENGDA software screen entry forms (described in Litke, 2007). The data entered into each screen entry field are stored in relational database tables within the computer database. The organization of the database dictionary is based on field data collection and the field forms, because this is what the majority of people will use. After each field, however, the ENGDA database field name and relational database table are designated, along with the ENGDA screen entry form(s) and the ENGDA field form (attachment 2). The database dictionary is separated into sections. The first section, Basic Site Data Fields, describes the basic site information that is similar for all of the different types of sites. The remaining sections may be applicable for only one type of site; for example, the Well Drilling and Construction Data Fields and Lithologic Description Data Fields are applicable to boreholes and not to springs. Attachment 1 contains a table for conversion from English to metric units. Attachment 2 contains selected field forms used in conjunction with ENGDA. A separate document, 'Users Reference Manual for the Ethiopian National Ground-Water DAtabase (ENGDA),' by David W. Litke, was developed as a user's guide for the computer database and screen entry. This database dictionary serves as a reference for both the field forms and the computer database. Every effort has been made to have identical field names between the field forms and the screen entry forms in order to avoid confusion.

  7. Automatic coding and selection of causes of death: an adaptation of Iris software for using in Brazil.

    PubMed

    Martins, Renata Cristófani; Buchalla, Cassia Maria

    2015-01-01

    To prepare a dictionary in Portuguese for use in Iris and to evaluate its completeness for coding causes of death. Initially, a dictionary with all illnesses and injuries was created based on the International Classification of Diseases - tenth revision (ICD-10) codes. This dictionary was based on two sources: the electronic file of ICD-10 volume 1 and the data from the Thesaurus of the International Classification of Primary Care (ICPC-2). Then, a death certificate sample from the Program of Improvement of Mortality Information in São Paulo (PRO-AIM) was coded manually and by Iris version V4.0.34, and the causes of death were compared. Whenever Iris was not able to code the causes of death, adjustments were made in the dictionary. Iris was able to code all causes of death in 94.4% of death certificates, but only 50.6% were directly coded, without adjustments. Among death certificates that the software was unable to fully code, 89.2% had a diagnosis of external causes (chapter XX of ICD-10). This group of causes of death showed less agreement when comparing the coding by Iris to the manual one. The software performed well, but it needs adjustments and improvement in its dictionary. In the upcoming versions of the software, its developers are trying to solve the external causes of death problem.

  8. Fiber Orientation Estimation Guided by a Deep Network.

    PubMed

    Ye, Chuyang; Prince, Jerry L

    2017-09-01

    Diffusion magnetic resonance imaging (dMRI) is currently the only tool for noninvasively imaging the brain's white matter tracts. The fiber orientation (FO) is a key feature computed from dMRI for tract reconstruction. Because the number of FOs in a voxel is usually small, dictionary-based sparse reconstruction has been used to estimate FOs. However, accurate estimation of complex FO configurations in the presence of noise can still be challenging. In this work we explore the use of a deep network for FO estimation in a dictionary-based framework and propose an algorithm named Fiber Orientation Reconstruction guided by a Deep Network (FORDN). FORDN consists of two steps. First, we use a smaller dictionary encoding coarse basis FOs to represent diffusion signals. To estimate the mixture fractions of the dictionary atoms, a deep network is designed to solve the sparse reconstruction problem. Second, the coarse FOs inform the final FO estimation, where a larger dictionary encoding a dense basis of FOs is used and a weighted ℓ1-norm regularized least squares problem is solved to encourage FOs that are consistent with the network output. FORDN was evaluated and compared with state-of-the-art algorithms that estimate FOs using sparse reconstruction on simulated and typical clinical dMRI data. The results demonstrate the benefit of using a deep network for FO estimation.
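
    A toy sketch of the second step's weighted ℓ1 idea: atoms consistent with a (here simulated) coarse network output get smaller weights, and the weighted ℓ1-regularized least-squares problem is solved with a plain ISTA loop and weighted soft-thresholding. This is a generic illustration in numpy, not the FORDN implementation, and the dictionary, weights, and signal are synthetic assumptions.

        import numpy as np

        rng = np.random.default_rng(5)
        n_signals, n_atoms, sparsity = 60, 120, 3

        # Stand-in dictionary of dense basis fiber orientations and a noisy signal.
        D = rng.standard_normal((n_signals, n_atoms))
        D /= np.linalg.norm(D, axis=0)
        support = rng.choice(n_atoms, sparsity, replace=False)
        x_true = np.zeros(n_atoms)
        x_true[support] = np.abs(rng.standard_normal(sparsity)) + 0.5
        y = D @ x_true + 0.01 * rng.standard_normal(n_signals)

        # Weights: atoms flagged by the (simulated) coarse estimate are penalized less.
        w = np.full(n_atoms, 1.0)
        w[support] = 0.1

        # ISTA for  min_x 0.5*||y - Dx||^2 + lam * sum_i w_i * |x_i|
        lam, step = 0.05, 1.0 / np.linalg.norm(D, 2) ** 2
        x = np.zeros(n_atoms)
        for _ in range(500):
            z = x - step * (D.T @ (D @ x - y))
            x = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)  # weighted soft-threshold

        print("true support:     ", sorted(int(i) for i in support))
        print("recovered support:", sorted(int(i) for i in np.flatnonzero(np.abs(x) > 1e-3)))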

  9. MO-DE-207A-05: Dictionary Learning Based Reconstruction with Low-Rank Constraint for Low-Dose Spectral CT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Q; Stanford University School of Medicine, Stanford, CA; Liu, H

    Purpose: Spectral CT enabled by an energy-resolved photon-counting detector outperforms conventional CT in terms of material discrimination, contrast resolution, etc. One reconstruction method for spectral CT is to generate a color image from a reconstructed component in each energy channel. However, given the radiation dose, the number of photons in each channel is limited, which will result in strong noise in each channel and affect the final color reconstruction. Here we propose a novel dictionary learning method for spectral CT that combines a dictionary-based sparse representation method and a patch-based low-rank constraint to simultaneously improve the reconstruction in each channel and to address the inter-channel correlations to further improve the reconstruction. Methods: The proposed method has two important features: (1) guarantee of the patch-based sparsity in each energy channel, which is the result of the dictionary-based sparse representation constraint; (2) the explicit consideration of the correlations among different energy channels, which is realized by a patch-by-patch nuclear-norm-based low-rank constraint. For each channel, the dictionary consists of two sub-dictionaries. One is learned from the average of the images in all energy channels, and the other is learned from the average of the images in all energy channels except the current channel. With the averaging operation to reduce noise, these two dictionaries can effectively preserve the structural details and get rid of artifacts caused by noise. Combining them together can express all structural information in the current channel. Results: Dictionary learning based methods can obtain better results than FBP and the TV-based method. With the low-rank constraint, the image quality can be further improved in the channel with more noise. The final color result by the proposed method has the best visual quality. Conclusion: The proposed method can effectively improve the image quality of low-dose spectral CT. This work is partially supported by the National Natural Science Foundation of China (No. 61302136), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2014JQ8317).
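
    A toy numpy sketch of the low-rank ingredient only: patches taken at the same location across energy channels are stacked into a matrix, whose strong inter-channel correlation makes it approximately low-rank, and the matrix is denoised by singular-value soft-thresholding (the proximal operator of the nuclear norm). This is not the proposed reconstruction algorithm; the channel count, patch size, and noise level are assumptions.

        import numpy as np

        rng = np.random.default_rng(6)
        n_channels, patch_pixels = 8, 64  # 8 energy channels, flattened 8x8 patch

        # Clean multi-channel patch: channels are scaled copies of one pattern,
        # so the stacked matrix is (here exactly) rank one.
        base = rng.standard_normal(patch_pixels)
        weights = np.linspace(0.5, 1.5, n_channels)
        clean = np.outer(weights, base)
        noisy = clean + 0.3 * rng.standard_normal(clean.shape)

        def svt(M, tau):
            """Singular-value soft-thresholding: prox of tau * nuclear norm."""
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        denoised = svt(noisy, tau=3.5)
        print("error before:", round(float(np.linalg.norm(noisy - clean)), 2))
        print("error after: ", round(float(np.linalg.norm(denoised - clean)), 2))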

  10. Gene and protein nomenclature in public databases

    PubMed Central

    Fundel, Katrin; Zimmer, Ralf

    2006-01-01

    Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134
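
    A small sketch of the kind of dictionary comparison described, using plain Python sets; the toy synonym dictionaries and the mini English lexicon are made-up stand-ins for the dictionaries extracted from the public databases.

        # Toy synonym dictionaries (gene id -> set of names) from two hypothetical sources.
        db_a = {"geneA": {"alpha-1", "A1"}, "geneB": {"beta-2"}}
        db_b = {"geneA": {"alpha-1", "alpha factor 1"}, "geneC": {"gamma-3", "A1"}}

        def all_names(db):
            return {name for names in db.values() for name in names}

        # Overlap of synonyms between the two sources.
        print("synonym overlap:", all_names(db_a) & all_names(db_b))

        # Ambiguity within the combined dictionary: names mapping to more than one gene.
        merged = {}
        for db in (db_a, db_b):
            for gene, names in db.items():
                for name in names:
                    merged.setdefault(name, set()).add(gene)
        print("ambiguous names:", {n for n, genes in merged.items() if len(genes) > 1})

        # Ambiguity with common English words (toy lexicon).
        english_words = {"can", "was", "beta-2"}
        print("clashes with English lexicon:", all_names(db_a) & english_words)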

  11. The Role of Dictionaries in Language Learning.

    ERIC Educational Resources Information Center

    White, Philip A.

    1997-01-01

    Examines assumptions about dictionaries, especially the bilingual dictionary, and suggests ways of integrating the monolingual dictionary into the second-language instructional process. Findings indicate that the monolingual dictionary can coexist with bilingual dictionaries within a foreign-language course if the latter are appropriately used as…

  12. Training L2 Writers to Reference Corpora as a Self-Correction Tool

    ERIC Educational Resources Information Center

    Quinn, Cynthia

    2015-01-01

    Corpora have the potential to support the L2 writing process at the discourse level in contrast to the isolated dictionary entries that many intermediate writers rely on. To take advantage of this resource, learners need to be trained, which involves practising corpus research and referencing skills as well as learning to make data-based…

  13. Information Sharing and Collaboration Business Plan

    DTIC Science & Technology

    2006-03-30

    for information sharing The proposed environment will need a common definition of terms and dictionaries of competing terms where common definitions...a lexicon, a monolingual on-line handbook, and a thesaurus and ontology of abbreviations, acronyms, and terminology. (ISCO 2005, 18

  14. Learning Category-Specific Dictionary and Shared Dictionary for Fine-Grained Image Categorization.

    PubMed

    Gao, Shenghua; Tsang, Ivor Wai-Hung; Ma, Yi

    2014-02-01

    This paper targets fine-grained image categorization by learning a category-specific dictionary for each category and a shared dictionary for all the categories. Such category-specific dictionaries encode subtle visual differences among different categories, while the shared dictionary encodes common visual patterns among all the categories. To this end, we impose incoherence constraints among the different dictionaries in the objective of feature coding. In addition, to make the learnt dictionary stable, we also impose the constraint that each dictionary should be self-incoherent. Our proposed dictionary learning formulation not only applies to fine-grained classification, but also improves conventional basic-level object categorization and other tasks such as event recognition. Experimental results on five data sets show that our method can outperform the state-of-the-art fine-grained image categorization frameworks as well as sparse coding based dictionary learning frameworks. All these results demonstrate the effectiveness of our method.

  15. Which Dictionary? A Review of the Leading Learners' Dictionaries.

    ERIC Educational Resources Information Center

    Nesi, Hilary

    Three major dictionaries designed for learners of English as a second language are reviewed, their elements and approaches compared and evaluated, their usefulness for different learners discussed, and recommendations for future dictionary improvement made. The dictionaries in question are the "Oxford Advanced Learner's Dictionary," the…

  16. French Dictionaries. Series: Specialised Bibliographies.

    ERIC Educational Resources Information Center

    Klaar, R. M.

    This is a list of French monolingual, French-English and English-French dictionaries available in December 1975. Dictionaries of etymology, phonetics, place names, proper names, and slang are included, as well as dictionaries for children and dictionaries of Belgian, Canadian, and Swiss French. Most other specialized dictionaries, encyclopedias,…

  17. Which Desk Dictionary Is Best for Foreign Students of English?

    ERIC Educational Resources Information Center

    Yorkey, Richard

    1969-01-01

    "The American College Dictionary, "Funk and Wagnalls Standard College Dictionary," Webster's New World Dictionary of the American Language," The Random House Dictionary of the English Language," and Webster's Seventh New Collegiate Dictionary" are analyzed and ranked as to their usefulness for the foreign learner of English. (FWB)

  18. Multimodal registration via spatial-context mutual information.

    PubMed

    Yi, Zhao; Soatto, Stefano

    2011-01-01

    We propose a method to efficiently compute mutual information between high-dimensional distributions of image patches. This in turn is used to perform accurate registration of images captured under different modalities, while exploiting their local structure otherwise missed in traditional mutual information definition. We achieve this by organizing the space of image patches into orbits under the action of Euclidean transformations of the image plane, and estimating the modes of a distribution in such an orbit space using affinity propagation. This way, large collections of patches that are equivalent up to translations and rotations are mapped to the same representative, or "dictionary element". We then show analytically that computing mutual information for a joint distribution in this space reduces to computing mutual information between the (scalar) label maps, and between the transformations mapping each patch into its closest dictionary element. We show that our approach improves registration performance compared with the state of the art in multimodal registration, using both synthetic and real images with quantitative ground truth.
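
    The final step of the reduction described above is mutual information between two discrete label maps; the sketch below shows that computation with scikit-learn on toy, partially correlated label maps (the labels here are random stand-ins for "closest dictionary element" indices).

        import numpy as np
        from sklearn.metrics import mutual_info_score

        rng = np.random.default_rng(7)

        # Toy label maps: each pixel of two co-registered images has been replaced
        # by the index of its closest dictionary element (random but correlated here).
        labels_fixed = rng.integers(0, 8, size=(64, 64))
        random_labels = rng.integers(0, 8, size=(64, 64))
        labels_moving = np.where(rng.random((64, 64)) < 0.7, labels_fixed, random_labels)

        mi = mutual_info_score(labels_fixed.ravel(), labels_moving.ravel())
        print("mutual information between label maps (nats):", round(float(mi), 3))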

  19. Evaluation of the Executive Information Requirements for the Market Research Process.

    ERIC Educational Resources Information Center

    Lanser, Michael A.

    A study examined the marketing research information required by those executives of Lakeshore Technical College (Wisconsin) whose decisions affect the college's direction. Data were gathered from the following sources: literature review; development of a data dictionary framework; analysis of the college's current information system through…

  20. When Cancer Returns

    MedlinePlus


  1. Coping with Advanced Cancer

    MedlinePlus


  2. Information technology principles for management, reporting, and research.

    PubMed

    Gillam, Michael; Rothenhaus, Todd; Smith, Vernon; Kanhouwa, Meera

    2004-11-01

    Information technology holds the promise to enhance the ability of individuals and organizations to manage emergency departments, improve data sharing and reporting, and facilitate research. The Society for Academic Emergency Medicine (SAEM) Consensus Committee has identified nine principles to outline a path of optimal features and designs for current and future information technology systems. The principles roughly summarized include the following: utilize open database standards with clear data dictionaries, provide administrative access to necessary data, appoint and recognize individuals with emergency department informatics expertise, allow automated alert and proper identification for enrollment of cases into research, provide visual and statistical tools and training to analyze data, embed automated configurable alarm functionality for clinical and nonclinical systems, allow multiexport standard and format configurable reporting, strategically acquire mission-critical equipment that is networked and capable of automated feedback regarding functional status and location, and dedicate resources toward informatics research and development. The SAEM Consensus Committee concludes that the diligent application of these principles will enhance emergency department management, reporting, and research and ultimately improve the quality of delivered health care.

  3. Thinking about Complementary and Alternative Medicine

    MedlinePlus


  4. Caring for the Caregiver

    MedlinePlus


  5. The Use of Monolingual Mobile Dictionaries in the Context of Reading by Intermediate Cantonese EFL Learners in Hong Kong

    ERIC Educational Resources Information Center

    Zou, Di; Xie, Haoran; Wang, Fu Lee

    2015-01-01

    Previous studies on dictionary consultation investigated mainly online dictionaries or simple pocket electronic dictionaries as they were commonly used among learners back then, yet the more updated mobile dictionaries were superficially investigated though they have already replaced the pocket electronic dictionaries. These studies are also…

  6. Sparse Representation for Infrared Dim Target Detection via a Discriminative Over-Complete Dictionary Learned Online

    PubMed Central

    Li, Zheng-Zhou; Chen, Jing; Hou, Qian; Fu, Hong-Xia; Dai, Zhen; Jin, Gang; Li, Ru-Zhang; Liu, Chang-Ju

    2014-01-01

    It is difficult for structural over-complete dictionaries such as the Gabor function and discriminative over-complete dictionary, which are learned offline and classified manually, to represent natural images with the goal of ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation on a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of infrared image by K-singular value decomposition (K-SVD) algorithm. Then the adaptive morphological over-complete dictionary is divided automatically into a target over-complete dictionary describing target signals, and a background over-complete dictionary embedding background by the criteria that the atoms in the target over-complete dictionary could be decomposed more sparsely based on a Gaussian over-complete dictionary than the one in the background over-complete dictionary. This discriminative over-complete dictionary can not only capture significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries, yet couldn't be sparsely decomposed based on their opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ very visibly. Some experiments are included and the results show that this proposed approach could not only improve the sparsity more efficiently, but also enhance the performance of small target detection more effectively. PMID:24871988

  7. Sparse representation for infrared Dim target detection via a discriminative over-complete dictionary learned online.

    PubMed

    Li, Zheng-Zhou; Chen, Jing; Hou, Qian; Fu, Hong-Xia; Dai, Zhen; Jin, Gang; Li, Ru-Zhang; Liu, Chang-Ju

    2014-05-27

    It is difficult for structural over-complete dictionaries such as the Gabor function and discriminative over-complete dictionary, which are learned offline and classified manually, to represent natural images with the goal of ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation on a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of infrared image by K-singular value decomposition (K-SVD) algorithm. Then the adaptive morphological over-complete dictionary is divided automatically into a target over-complete dictionary describing target signals, and a background over-complete dictionary embedding background by the criteria that the atoms in the target over-complete dictionary could be decomposed more sparsely based on a Gaussian over-complete dictionary than the one in the background over-complete dictionary. This discriminative over-complete dictionary can not only capture significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries, yet couldn't be sparsely decomposed based on their opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ very visibly. Some experiments are included and the results show that this proposed approach could not only improve the sparsity more efficiently, but also enhance the performance of small target detection more effectively.

  8. Chemical Entity Recognition and Resolution to ChEBI

    PubMed Central

    Grego, Tiago; Pesquita, Catia; Bastos, Hugo P.; Couto, Francisco M.

    2012-01-01

    Chemical entities are ubiquitous throughout the biomedical literature, and the development of text-mining systems that can efficiently identify those entities is required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity resolution task, and 15% for the combined entity recognition and resolution task. PMID:25937941
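
    As a rough illustration of lexical-similarity-based resolution (not the system described in this record), a recognized mention can be matched against a small name-to-identifier lexicon by string similarity; the small example lexicon and the threshold below are illustrative assumptions.

      # Hedged sketch: resolve a recognized chemical mention to a lexicon entry
      # by lexical similarity. The mini-lexicon and the threshold are
      # illustrative assumptions, not the resource used in the paper.
      from difflib import SequenceMatcher

      chebi_lexicon = {                 # small illustrative name -> identifier map
          "glucose": "CHEBI:17234",
          "ethanol": "CHEBI:16236",
          "acetic acid": "CHEBI:15366",
      }

      def resolve(mention, lexicon, threshold=0.8):
          """Return (identifier, score) for the most similar lexicon name,
          or (None, score) if no name clears the similarity threshold."""
          best_id, best_score = None, 0.0
          for name, ident in lexicon.items():
              score = SequenceMatcher(None, mention.lower(), name).ratio()
              if score > best_score:
                  best_id, best_score = ident, score
          return (best_id, best_score) if best_score >= threshold else (None, best_score)

      print(resolve("ethanols", chebi_lexicon))   # close match, resolves to CHEBI:16236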

  9. Children with Cancer: A Guide for Parents

    MedlinePlus

  10. Boosting drug named entity recognition using an aggregate classifier.

    PubMed

    Korkontzelos, Ioannis; Piliouras, Dimitrios; Dowsey, Andrew W; Ananiadou, Sophia

    2015-10-01

    Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations. We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we develop a voting system that combines a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employed genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and used them as an extra means for recognition. Our approach uses a dictionary of drug names (DrugBank), a small manually annotated corpus (the pharmacokinetic corpus), and a part of the UKPMC database as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers. Aggregating drug NER methods based on gold-standard annotations, dictionary knowledge and patterns improved the performance of models trained on gold-standard annotations only, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations, dictionary knowledge and patterns is shown to achieve performance comparable to models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names. We conclude that gold-standard data are not a hard requirement for drug NER. Combining heterogeneous models built on dictionary knowledge can achieve classification performance similar or comparable to that of the best performing model trained on gold-standard annotations. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
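
    A toy sketch of two of the recall-oriented ingredients mentioned in this record, suffix patterns (hand-written here rather than evolved by genetic programming) and a simple aggregation that combines them with a dictionary lookup; the suffix list, mini-dictionary and aggregation rule are assumptions for illustration only.

      # Illustrative sketch: combine a dictionary lookup with drug-suffix
      # regular expressions and an external classifier vote. The suffixes,
      # mini-dictionary and the simple "any source fires" rule are assumptions.
      import re

      drug_dictionary = {"ibuprofen", "warfarin", "metformin"}   # hypothetical subset
      suffix_pattern = re.compile(r"\w+(?:mab|vir|olol|statin|cillin|azole)",
                                  re.IGNORECASE)

      def dictionary_vote(token):
          return token.lower() in drug_dictionary

      def suffix_vote(token):
          return suffix_pattern.fullmatch(token) is not None

      def is_drug(token, classifier_vote=False):
          """Recall-oriented aggregation: flag a mention when any of the three
          sources fires (a much simpler stand-in for the paper's voting system)."""
          return dictionary_vote(token) or suffix_vote(token) or classifier_vote

      for tok in ["ibuprofen", "atorvastatin", "table"]:
          print(tok, is_drug(tok))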

  11. DTU BCI speller: an SSVEP-based spelling system with dictionary support.

    PubMed

    Vilic, Adnan; Kjaer, Troels W; Thomsen, Carsten E; Puthusserypady, S; Sorensen, Helge B D

    2013-01-01

    In this paper, a new brain computer interface (BCI) speller, named the DTU BCI speller, is introduced. It is based on the steady-state visual evoked potential (SSVEP) and features dictionary support. The system focuses on simplicity and user friendliness by using a single electrode for signal acquisition and displaying stimuli on a liquid crystal display (LCD). Nine healthy subjects participated in writing full sentences after a five-minute introduction to the system and obtained an information transfer rate (ITR) of 21.94 ± 15.63 bits/min. The average number of characters written per minute (CPM) was 4.90 ± 3.84, with a best case of 8.74 CPM. All subjects reported systematically on different user-friendliness measures, and the overall results indicated the potential of the DTU BCI Speller system. For subjects with high classification accuracies, the introduced dictionary approach greatly reduced the time it took to write full sentences.
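
    The dictionary support mentioned in this record can be thought of as prefix-based word completion; the sketch below (the word list, frequencies and ranking rule are illustrative assumptions, not the DTU system) shows the basic idea of proposing completions once a few letters have been spelled.

      # Minimal sketch of dictionary-supported spelling: given the letters
      # selected so far, propose the most frequent matching completions.
      # The word/frequency list is a made-up stand-in for a real lexicon.
      word_frequencies = {
          "hello": 120, "help": 95, "held": 40, "hero": 33, "house": 80,
      }

      def suggest(prefix, lexicon, n_suggestions=3):
          """Return up to n_suggestions words starting with `prefix`,
          ordered by descending corpus frequency."""
          matches = [(word, freq) for word, freq in lexicon.items()
                     if word.startswith(prefix)]
          matches.sort(key=lambda pair: pair[1], reverse=True)
          return [word for word, _ in matches[:n_suggestions]]

      print(suggest("he", word_frequencies))   # ['hello', 'help', 'held']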

  12. Discover mouse gene coexpression landscapes using dictionary learning and sparse coding.

    PubMed

    Li, Yujie; Chen, Hanbo; Jiang, Xi; Li, Xiang; Lv, Jinglei; Peng, Hanchuan; Tsien, Joe Z; Liu, Tianming

    2017-12-01

    Gene coexpression patterns carry rich information regarding enormously complex brain structures and functions. Characterization of these patterns in an unbiased, integrated, and anatomically comprehensive manner will illuminate the higher-order transcriptome organization and offer genetic foundations of functional circuitry. Here, using dictionary learning and sparse coding, we derived coexpression networks from the spatially resolved, anatomically comprehensive in situ hybridization data of the Allen Mouse Brain Atlas dataset. The key idea is that if two genes use the same dictionary to represent their original signals, their gene expression must share similar patterns, and they are therefore considered "coexpressed." For each network, we have simultaneous knowledge of its spatial distribution, the genes in the network, and the extent to which a particular gene conforms to the coexpression pattern. Gene ontologies and comparisons with published gene lists reveal biologically meaningful coexpression networks, some of which correspond to major cell types, biological pathways, and/or anatomical regions.
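
    One way to realize the idea in this record (an illustrative sketch, not the Allen Brain Atlas pipeline) is to sparse-code expression profiles over a learned dictionary and group genes by the atoms they load on; the matrix sizes, sparsity level and loading threshold below are assumptions.

      # Illustrative sketch: learn a dictionary over gene expression profiles
      # and group genes by the atoms ("networks") their sparse codes use.
      # Matrix sizes, sparsity and the loading threshold are assumptions.
      import numpy as np
      from sklearn.decomposition import DictionaryLearning

      rng = np.random.default_rng(0)
      expression = rng.random((200, 50))         # 200 genes x 50 spatial samples

      learner = DictionaryLearning(n_components=10,
                                   transform_algorithm="omp",
                                   transform_n_nonzero_coefs=3,
                                   random_state=0)
      codes = learner.fit_transform(expression)  # 200 genes x 10 atoms

      # Genes whose codes load on atom k form one putative coexpression network.
      networks = {k: np.flatnonzero(np.abs(codes[:, k]) > 1e-8)
                  for k in range(codes.shape[1])}
      print({k: len(genes) for k, genes in networks.items()})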

  13. FBRDLR: Fast blind reconstruction approach with dictionary learning regularization for infrared microscopy spectra

    NASA Astrophysics Data System (ADS)

    Liu, Tingting; Liu, Hai; Chen, Zengzhao; Chen, Yingying; Wang, Shengming; Liu, Zhi; Zhang, Hao

    2018-05-01

    Infrared (IR) spectra are the fingerprints of molecules, and spectral band locations relate closely to molecular structure, so specimen identification can be performed based on IR spectroscopy. However, spectrally overlapping components prevent the specific identification of hyperfine molecular information of different substances. In this paper, we propose a fast blind reconstruction approach for IR spectra based on sparse and redundant representations over a dictionary. The proposed method recovers the spectrum using a discrete wavelet transform dictionary adapted to its content. The experimental results demonstrate that the proposed method performs better than other state-of-the-art methods. The method also removes instrument-aging effects to a large extent, making the reconstructed IR spectra a more convenient tool for extracting and interpreting the features of an unknown material.

  14. Multiple sclerosis lesion segmentation using dictionary learning and sparse coding.

    PubMed

    Weiss, Nick; Rueckert, Daniel; Rao, Anil

    2013-01-01

    The segmentation of lesions in the brain during the development of Multiple Sclerosis is part of the diagnostic assessment for this disease and gives information on its current severity. This laborious process is still carried out in a manual or semiautomatic fashion by clinicians because published automatic approaches have not been universal enough to be widely employed in clinical practice. Thus, Multiple Sclerosis lesion segmentation remains an open problem. In this paper we present a new unsupervised approach addressing this problem with dictionary learning and sparse coding methods. We show its general applicability to the problem of lesion segmentation by evaluating our approach on synthetic and clinical image data and comparing it to state-of-the-art methods. Furthermore, the potential of using dictionary learning and sparse coding for such segmentation tasks is investigated and various possibilities for further experiments are discussed.

  15. Cerebellar Functional Parcellation Using Sparse Dictionary Learning Clustering.

    PubMed

    Wang, Changqing; Kipping, Judy; Bao, Chenglong; Ji, Hui; Qiu, Anqi

    2016-01-01

    The human cerebellum has recently been discovered to contribute to cognition and emotion beyond the planning and execution of movement, suggesting its functional heterogeneity. We aimed to identify the functional parcellation of the cerebellum using information from resting-state functional magnetic resonance imaging (rs-fMRI). For this, we introduced a new data-driven decomposition-based functional parcellation algorithm, called Sparse Dictionary Learning Clustering (SDLC). SDLC integrates dictionary learning, sparse representation of rs-fMRI, and k-means clustering into one optimization problem. The dictionary is comprised of an over-complete set of time course signals, with which a sparse representation of rs-fMRI signals can be constructed. Cerebellar functional regions were then identified using k-means clustering based on the sparse representation of rs-fMRI signals. We solved SDLC using a multi-block hybrid proximal alternating method that guarantees strong convergence. We evaluated the reliability of SDLC and benchmarked its classification accuracy against other clustering techniques using simulated data. We then demonstrated that SDLC can identify biologically reasonable functional regions of the cerebellum as estimated by their cerebello-cortical functional connectivity. We further provided new insights into the cerebello-cortical functional organization in children.
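
    Although SDLC solves dictionary learning, sparse coding and clustering as a single optimization problem, the ingredients it integrates can be approximated by running the stages sequentially, as in the hedged sketch below; the voxel count, time points, number of atoms and number of clusters are illustrative assumptions.

      # Hedged approximation of the SDLC ingredients, run sequentially rather
      # than as the paper's joint optimization: learn a set of time-course
      # atoms, sparse-code each voxel's rs-fMRI signal, then cluster the codes.
      import numpy as np
      from sklearn.decomposition import MiniBatchDictionaryLearning
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      signals = rng.standard_normal((1000, 120))   # 1000 voxels x 120 time points

      dict_learner = MiniBatchDictionaryLearning(n_components=40,
                                                 transform_algorithm="lasso_lars",
                                                 transform_alpha=0.5,
                                                 random_state=0)
      sparse_codes = dict_learner.fit_transform(signals)        # 1000 x 40

      labels = KMeans(n_clusters=7, n_init=10,
                      random_state=0).fit_predict(sparse_codes)
      print(np.bincount(labels))    # voxels per putative functional region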

  16. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.

    PubMed

    Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin

    2017-06-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics problems. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as supervised learning schemes, whose drawback is either an insufficient number of features or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.

  17. Subject-Specific Sparse Dictionary Learning for Atlas-Based Brain MRI Segmentation.

    PubMed

    Roy, Snehashis; He, Qing; Sweeney, Elizabeth; Carass, Aaron; Reich, Daniel S; Prince, Jerry L; Pham, Dzung L

    2015-09-01

    Quantitative measurements from segmentations of human brain magnetic resonance (MR) images provide important biomarkers for normal aging and disease progression. In this paper, we propose a patch-based tissue classification method from MR images that uses a sparse dictionary learning approach and atlas priors. Training data for the method consists of an atlas MR image, prior information maps depicting where different tissues are expected to be located, and a hard segmentation. Unlike most atlas-based classification methods that require deformable registration of the atlas priors to the subject, only affine registration is required between the subject and training atlas. A subject-specific patch dictionary is created by learning relevant patches from the atlas. Then the subject patches are modeled as sparse combinations of learned atlas patches leading to tissue memberships at each voxel. The combination of prior information in an example-based framework enables us to distinguish tissues having similar intensities but different spatial locations. We demonstrate the efficacy of the approach on the application of whole-brain tissue segmentation in subjects with healthy anatomy and normal pressure hydrocephalus, as well as lesion segmentation in multiple sclerosis patients. For each application, quantitative comparisons are made against publicly available state-of-the-art approaches.

  18. Developing a data dictionary for the irish nursing minimum dataset.

    PubMed

    Henry, Pamela; Mac Neela, Pádraig; Clinton, Gerard; Scott, Anne; Treacy, Pearl; Butler, Michelle; Hyde, Abbey; Morris, Roisin; Irving, Kate; Byrne, Anne

    2006-01-01

    One of the challenges in health care in Ireland is the relatively slow acceptance of standardised clinical information systems. Yet the national Irish health reform programme indicates that an Electronic Health Care Record (EHCR) will be implemented on a phased basis [3-5]. While nursing has a key role in ensuring the quality and comparability of health information, the so-called 'invisibility' of some nursing activities makes this a challenging aim to achieve [3-5]. Any integrated health care system requires the adoption of uniform standards for electronic data exchange [1-2]. One of the prerequisites for uniform standards is the composition of a data dictionary. Inadequate definition of data elements in a particular dataset hinders the development of an integrated data repository or electronic health care record (EHCR). This paper outlines how work on the data dictionary for the Irish Nursing Minimum Dataset (INMDS) has addressed this issue. Data set elements were devised on the basis of a large-scale empirical research programme. ISO 18104, the reference terminology for nursing [6], was used to cross-map the data set elements with semantic domains, categories and links, and data set items were dissected.

  19. Usage Notes in the Oxford American Dictionary.

    ERIC Educational Resources Information Center

    Berner, R. Thomas

    1981-01-01

    Compares the "Oxford American Dictionary" with the "American Heritage Dictionary." Examines the dictionaries' differences in philosophies of language, introductory essays, and usage notes. Concludes that the "Oxford American Dictionary" is too conservative, paternalistic, and dogmatic for the 1980s. (DMM)

  20. Treatment Choices for Men with Early-Stage Prostate Cancer

    MedlinePlus

  1. Pain Control: Support for People with Cancer

    MedlinePlus

  2. Chemotherapy and You: Support for People with Cancer

    MedlinePlus

  3. Facing Forward Series: Life After Cancer Treatment

    MedlinePlus

  4. Eating Hints: Before, During, and After Cancer Treatment

    MedlinePlus

  5. Taking Time: Support for People with Cancer

    MedlinePlus

  6. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  7. Study of Large Data Resources for Multilingual Training and System Porting (Pub Version, Open Access)

    DTIC Science & Technology

    2016-05-03

    extraction trained on a large database corpus (English Fisher). Although the performance of the ported monolingual system would be worse in comparison... Resource statistics by language (fragment): Language: TE, LI, HA, LA, ZU; LLP hours: 8.6, 9.6, 7.9, 8.1, 8.4; LM sentences: 11935, 10743, 9861, 11577, 10644; LM words: 68175, 83157, 93131, 93328, 60832; dictionary: 14505

  8. Radiation Therapy and You: Support for People with Cancer

    MedlinePlus

  9. What Dictionary to Use? A Closer Look at the "Oxford Advanced Learner's Dictionary," the "Longman Dictionary of Contemporary English" and the "Longman Lexicon of Contempory English."

    ERIC Educational Resources Information Center

    Shaw, A. M.

    1983-01-01

    Three dictionaries are compared for their usefulness to teachers of English as a foreign language, teachers in training, students, and other users of English as a foreign language. The issue of monolingual versus bilingual dictionary format is discussed, and a previous analysis of the two bilingual dictionaries is summarized. Pronunciation…

  10. Managing Vocabulary Mapping Services

    PubMed Central

    Che, Chengjian; Monson, Kent; Poon, Kasey B.; Shakib, Shaun C.; Lau, Lee Min

    2005-01-01

    The efficient management and maintenance of large-scale and high-quality vocabulary mapping is an operational challenge. The 3M Health Information Systems (HIS) Healthcare Data Dictionary (HDD) group developed an information management system to provide controlled mapping services, resulting in improved efficiency and quality maintenance. PMID:16779203

  11. A Guide to Information Sources for Reading.

    ERIC Educational Resources Information Center

    Davis, Bonnie M., Comp.

    This volume is intended to serve as a guide to the literature and other sources of information related to the study and teaching of reading. Source materials cited include dictionaries, handbooks, guides, directories, bibliographies, recurring reviews, abstracting and indexing publications, journals, conference proceedings, associations, and…

  12. Machine-Assisted Indexing of Scientific Research Summaries

    ERIC Educational Resources Information Center

    Hunt, Bernard L.; And Others

    1975-01-01

    At the Smithsonian Science Information Exchange, a computer system indexes word combinations in research summaries, according to a Classifying Dictionary, prior to review by the professional staff. (Author/PF)

  13. DOCU-TEXT: A tool before the data dictionary

    NASA Technical Reports Server (NTRS)

    Carter, B.

    1983-01-01

    DOCU-TEXT, a proprietary software package that aids in the production of documentation for a data processing organization and can be installed and operated only on IBM computers is discussed. In organizing information that ultimately will reside in a data dictionary, DOCU-TEXT proved to be a useful documentation tool in extracting information from existing production jobs, procedure libraries, system catalogs, control data sets and related files. DOCU-TEXT reads these files to derive data that is useful at the system level. The output of DOCU-TEXT is a series of user selectable reports. These reports can reflect the interactions within a single job stream, a complete system, or all the systems in an installation. Any single report, or group of reports, can be generated in an independent documentation pass.

  14. Dictionaries and Language Change.

    ERIC Educational Resources Information Center

    Pooley, Robert C.

    Two views of a dictionary's purpose came into sharp conflict upon the publication of Webster's "Third New International Unabridged Dictionary." The first view is that a dictionary is a reference book on language etiquette, an authority for maintaining the purity of the English language. The second is that a dictionary is a scientific…

  15. Do Dictionaries Help Students Write?

    ERIC Educational Resources Information Center

    Nesi, Hilary

    Examples are given of real lexical errors made by learner writers, and consideration is given to the way in which three learners' dictionaries could deal with the lexical items that were misused. The dictionaries were the "Oxford Advanced Learner's Dictionary," the "Longman Dictionary of Contemporary English," and the "Chambers Universal Learners'…

  16. Seismic classification through sparse filter dictionaries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hickmann, Kyle Scott; Srinivasan, Gowri

    We tackle a multi-label classification problem involving the relation between acoustic-profile features and the measured seismogram. To isolate components of the seismograms unique to each class of acoustic profile we build dictionaries of convolutional filters. The convolutional-filter dictionaries for the individual classes are then combined into a large dictionary for the entire seismogram set. A given seismogram is classified by computing its representation in the large dictionary and then comparing reconstruction accuracy with this representation using each of the sub-dictionaries. The sub-dictionary with the minimal reconstruction error identifies the seismogram class.

  17. Development of the system of reactor thermophysical data on the basis of ontological modelling

    NASA Astrophysics Data System (ADS)

    Chusov, I. A.; Kirillov, P. L.; Bogoslovskaya, G. P.; Yunusov, L. K.; Obysov, N. A.; Novikov, G. E.; Pronyaev, V. G.; Erkimbaev, A. O.; Zitserman, V. Yu; Kobzev, G. A.; Trachtengerts, M. S.; Fokin, L. R.

    2017-11-01

    Compilation and processing of thermophysical data have always been important tasks for the nuclear industry. The difficulties of the present stage of this activity are explained by the sharp increase in data volume and in the number of new materials, as well as by the increased requirements on the reliability of the data used in the nuclear industry. The general trend in fields oriented predominantly toward work with data (materials science, chemistry and others) is a transition to a common infrastructure that integrates separate databases, Web portals and other resources. This infrastructure provides interoperability and procedures for data exchange, storage and dissemination. A key element of this infrastructure is a domain-specific ontology, which provides a single information model and dictionary for semantic definitions. By formalizing the subject area, the ontology adapts the definitions for different database schemes and provides the integration of heterogeneous data. An important property of such ontologies is the possibility of permanently expanding them with new definitions, e.g. new materials and properties. The expansion of the thermophysical data ontology to reactor materials includes the creation of taxonomic dictionaries for thermophysical properties; models for presenting the data and their uncertainties; the inclusion, along with the parameters of state, of additional factors such as material porosity, burnup rate, irradiation rate and others; and axiomatics of the properties applicable to the given class of materials.

  18. Chemical entity recognition in patents by combining dictionary-based and statistical approaches

    PubMed Central

    Akhondi, Saber A.; Pons, Ewoud; Afzal, Zubair; van Haagen, Herman; Becker, Benedikt F.H.; Hettne, Kristina M.; van Mulligen, Erik M.; Kors, Jan A.

    2016-01-01

    We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small. Database URL: http://biosemantics.org/chemdner-patents PMID:27141091

  19. Management of Information Technology Access Controls

    DTIC Science & Technology

    1991-01-01

  20. Current Business Information Sources at the Wright State University Library and the Dayton and Montgomery County Public Library.

    ERIC Educational Resources Information Center

    Jarrell, Howard R., Comp.; McCarthy, Pamela, Comp.

    This bibliography provides nearly 400 governmental and privately published business information sources including LC call numbers. Categories and subjects represented are bibliographies, periodical directories, abstracts and indexes, dictionaries and encyclopedias, specialized handbooks, biographical directories, industrial directories,…

  1. Coupled dictionary learning for joint MR image restoration and segmentation

    NASA Astrophysics Data System (ADS)

    Yang, Xuesong; Fan, Yong

    2018-03-01

    To achieve better segmentation of MR images, image restoration is typically used as a preprocessing step, especially for low-quality MR images. Recent studies have demonstrated that dictionary learning methods could achieve promising performance for both image restoration and image segmentation. These methods typically learn paired dictionaries of image patches from different sources and use a common sparse representation to characterize paired image patches, such as low-quality image patches and their corresponding high-quality counterparts for the image restoration, and image patches and their corresponding segmentation labels for the image segmentation. Since learning these dictionaries jointly in a unified framework may improve the image restoration and segmentation simultaneously, we propose a coupled dictionary learning method to concurrently learn dictionaries for joint image restoration and image segmentation based on sparse representations in a multi-atlas image segmentation framework. Particularly, three dictionaries, including a dictionary of low-quality image patches, a dictionary of high-quality image patches, and a dictionary of segmentation label patches, are learned in a unified framework so that the learned dictionaries of image restoration and segmentation can benefit each other. Our method has been evaluated for segmenting the hippocampus in MR T1 images collected with scanners of different magnetic field strengths. The experimental results have demonstrated that our method achieved better image restoration and segmentation performance than state-of-the-art dictionary learning and sparse representation based image restoration and image segmentation methods.
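
    One common way to realize this kind of coupling (a generic sketch, not necessarily the authors' formulation) is to learn a single dictionary over stacked low-quality, high-quality and label patches so that all three share one sparse code; at test time the code is estimated from the low-quality block alone and used to predict the other two. The patch sizes, dictionary size and sparsity below are assumptions.

      # Generic coupled-dictionary sketch (an assumption-laden stand-in for the
      # paper's unified framework): stack paired patches from the three sources,
      # learn one joint dictionary, then code a new low-quality patch with the
      # low-quality block and reconstruct the restoration and label blocks.
      import numpy as np
      from sklearn.decomposition import DictionaryLearning, sparse_encode

      rng = np.random.default_rng(0)
      n_patches, d = 500, 27                        # toy 3x3x3 patches
      low_q = rng.standard_normal((n_patches, d))
      high_q = low_q + 0.1 * rng.standard_normal((n_patches, d))
      labels = (low_q > 0).astype(float)            # fake segmentation patches

      joint = np.hstack([low_q, high_q, labels])    # one shared sparse code
      learner = DictionaryLearning(n_components=60, transform_algorithm="omp",
                                   transform_n_nonzero_coefs=5, random_state=0)
      learner.fit(joint)
      D = learner.components_                       # 60 atoms x 3*d features
      D_low, D_high, D_label = D[:, :d], D[:, d:2 * d], D[:, 2 * d:]

      # A full implementation would renormalize the sub-dictionaries before coding.
      test_low = rng.standard_normal((1, d))
      code = sparse_encode(test_low, D_low, algorithm="omp", n_nonzero_coefs=5)
      restored = code @ D_high                      # predicted high-quality patch
      segmentation = code @ D_label                 # predicted label patch
      print(restored.shape, segmentation.shape)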

  2. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries which, although including examples about how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  3. WE-G-18A-04: 3D Dictionary Learning Based Statistical Iterative Reconstruction for Low-Dose Cone Beam CT Imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bai, T; UT Southwestern Medical Center, Dallas, TX; Yan, H

    2014-06-15

    Purpose: To develop a 3D dictionary learning based statistical reconstruction algorithm on graphics processing units (GPU) to improve the quality of low-dose cone beam CT (CBCT) imaging with high efficiency. Methods: A 3D dictionary containing 256 small volumes (atoms) of 3x3x3 voxels was trained from a high-quality volume image. During reconstruction, we utilized a Cholesky decomposition based orthogonal matching pursuit algorithm to find a sparse representation of each patch in the reconstructed image on this dictionary basis, in order to regularize the image quality. To accelerate the time-consuming sparse coding in the 3D case, we implemented our algorithm in a parallel fashion by taking advantage of the tremendous computational power of the GPU. Evaluations are performed on a head-neck patient case. FDK reconstruction with the full dataset of 364 projections is used as the reference. We compared the proposed 3D dictionary learning based method with a tight frame (TF) based one using a subset of 121 projections. The image qualities under different resolutions in the z-direction, with or without statistical weighting, are also studied. Results: Compared to the TF-based CBCT reconstruction, our experiments indicated that 3D dictionary learning based CBCT reconstruction is able to recover finer structures, remove more streaking artifacts, and is less susceptible to blocky artifacts. It is also observed that the statistical reconstruction approach is sensitive to inconsistency between the forward and backward projection operations in parallel computing. Using a high spatial resolution along the z direction helps improve the algorithm's robustness. Conclusion: The 3D dictionary learning based CBCT reconstruction algorithm is able to sense structural information while suppressing noise, and hence achieves high quality reconstruction. The GPU realization of the whole algorithm offers a significant efficiency enhancement, making this algorithm more feasible for potential clinical application. A high z-resolution is preferred to stabilize statistical iterative reconstruction. This work was supported in part by NIH (1R01CA154747-01), NSFC (No. 61172163), the Research Fund for the Doctoral Program of Higher Education of China (No. 20110201110011), and the China Scholarship Council.
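
    The per-patch sparse-coding step at the heart of this regularizer can be sketched on the CPU as follows (a NumPy/scikit-learn illustration of OMP over 3x3x3 patches, not the paper's GPU Cholesky implementation; the dictionary below is a random stand-in and the sparsity level is an assumption).

      # CPU sketch of the per-patch sparse-coding step: extract 3x3x3 patches
      # from a volume, code each over a 256-atom dictionary with OMP, and
      # reconstruct the regularized patches. Not the paper's GPU/Cholesky code.
      import numpy as np
      from sklearn.decomposition import sparse_encode

      rng = np.random.default_rng(0)
      volume = rng.random((12, 12, 12))             # toy CBCT volume
      dictionary = rng.standard_normal((256, 27))   # 256 atoms of 3x3x3 voxels
      dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

      def extract_patches(vol, size=3):
          """Collect all non-overlapping size^3 patches as row vectors."""
          s = size
          return np.array([vol[i:i + s, j:j + s, k:k + s].ravel()
                           for i in range(0, vol.shape[0] - s + 1, s)
                           for j in range(0, vol.shape[1] - s + 1, s)
                           for k in range(0, vol.shape[2] - s + 1, s)])

      patches = extract_patches(volume)                          # (64, 27)
      codes = sparse_encode(patches, dictionary,
                            algorithm="omp", n_nonzero_coefs=4)  # (64, 256)
      regularized = codes @ dictionary                           # sparse reconstructions
      print(patches.shape, regularized.shape)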

  4. Tensor Dictionary Learning for Positive Definite Matrices.

    PubMed

    Sivalingam, Ravishankar; Boley, Daniel; Morellas, Vassilios; Papanikolopoulos, Nikolaos

    2015-11-01

    Sparse models have proven to be extremely successful in image processing and computer vision. However, a majority of the effort has been focused on sparse representation of vectors and low-rank models for general matrices. The success of sparse modeling, along with popularity of region covariances, has inspired the development of sparse coding approaches for these positive definite descriptors. While in earlier work, the dictionary was formed from all, or a random subset of, the training signals, it is clearly advantageous to learn a concise dictionary from the entire training set. In this paper, we propose a novel approach for dictionary learning over positive definite matrices. The dictionary is learned by alternating minimization between sparse coding and dictionary update stages, and different atom update methods are described. A discriminative version of the dictionary learning approach is also proposed, which simultaneously learns dictionaries for different classes in classification or clustering. Experimental results demonstrate the advantage of learning dictionaries from data both from reconstruction and classification viewpoints. Finally, a software library is presented comprising C++ binaries for all the positive definite sparse coding and dictionary learning approaches presented here.

  5. A Participatory Research Approach to develop an Arabic Symbol Dictionary.

    PubMed

    Draffan, E A; Kadous, Amatullah; Idris, Amal; Banes, David; Zeinoun, Nadine; Wald, Mike; Halabi, Nawar

    2015-01-01

    The purpose of the Arabic Symbol Dictionary research discussed in this paper is to provide a resource of culturally, environmentally and linguistically suitable symbols to aid communication and literacy skills. A participatory approach, using online social media and a bespoke symbol management system, has been established to enhance the process of matching a user-based Arabic and English core vocabulary with appropriate imagery. Participants, including AAC users, their families, carers, teachers and therapists, have been involved in the research from the outset, collating the vocabularies, debating cultural nuances of symbols and critiquing the design of technologies for selection procedures. The positive reaction of those who have voted on the symbols, along with requests for early use, has justified the iterative nature of the methodologies used for this part of the project. However, constant re-evaluation will be necessary, and in-depth analysis of all the data received has yet to be completed.

  6. Medical and dermatology dictionaries: an examination of unstructured definitions and a proposal for the future.

    PubMed

    DeVries, David Todd; Papier, Art; Byrnes, Jennifer; Goldsmith, Lowell A

    2004-01-01

    Medical dictionaries serve to describe and clarify the term set used by medical professionals. In this commentary, we analyze a representative set of skin disease definitions from 2 prominent medical dictionaries, Stedman's Medical Dictionary and Dorland's Illustrated Medical Dictionary. We find that there is an apparent lack of stylistic standards with regard to content and form. We advocate a new standard form for the definition of medical terminology, a standard to complement the easy-to-read yet unstructured style of the traditional dictionary entry. This new form offers a reproducible structure, paving the way for the development of a computer readable "dictionary" of medical terminology. Such a dictionary offers immediate update capability and a fundamental improvement in the ability to search for relationships between terms.
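
    The standard form advocated in this record lends itself to a machine-readable record; the sketch below shows one possible structure (the field names and the example entry are illustrative assumptions, not the authors' schema).

      # Hypothetical structured definition record for a computer-readable
      # medical dictionary; the field set and example content are assumptions
      # meant only to illustrate a reproducible definition form.
      import json

      structured_entry = {
          "term": "folliculitis",
          "category": "skin disease",
          "primary_lesion": "pustule",
          "distribution": "follicular",
          "related_terms": ["furuncle", "carbuncle"],
          "plain_language_definition": "Inflammation of one or more hair follicles.",
          "last_updated": "2004-01-01",
      }

      print(json.dumps(structured_entry, indent=2))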

  7. Data-Dictionary-Editing Program

    NASA Technical Reports Server (NTRS)

    Cumming, A. P.

    1989-01-01

    Access to data-dictionary relations and attributes made more convenient. Data Dictionary Editor (DDE) application program provides more convenient read/write access to data-dictionary table ("descriptions table") via data screen using SMARTQUERY function keys. Provides three main advantages: (1) User works with table names and field names rather than with table numbers and field numbers, (2) Provides online access to definitions of data-dictionary keys, and (3) Provides displayed summary list that shows, for each datum, which data-dictionary entries currently exist for any specific relation or attribute. Computer program developed to give developers of data bases more convenient access to the OMNIBASE VAX/IDM data-dictionary relations and attributes.

  8. Dictionary Learning Algorithms for Sparse Representation

    PubMed Central

    Kreutz-Delgado, Kenneth; Murray, Joseph F.; Rao, Bhaskar D.; Engan, Kjersti; Lee, Te-Won; Sejnowski, Terrence J.

    2010-01-01

    Algorithms for data-driven learning of domain-specific overcomplete dictionaries are developed to obtain maximum likelihood and maximum a posteriori dictionary estimates based on the use of Bayesian models with concave/Schur-concave (CSC) negative log priors. Such priors are appropriate for obtaining sparse representations of environmental signals within an appropriately chosen (environmentally matched) dictionary. The elements of the dictionary can be interpreted as concepts, features, or words capable of succinct expression of events encountered in the environment (the source of the measured signals). This is a generalization of vector quantization in that one is interested in a description involving a few dictionary entries (the proverbial “25 words or less”), but not necessarily as succinct as one entry. To learn an environmentally adapted dictionary capable of concise expression of signals generated by the environment, we develop algorithms that iterate between a representative set of sparse representations found by variants of FOCUSS and an update of the dictionary using these sparse representations. Experiments were performed using synthetic data and natural images. For complete dictionaries, we demonstrate that our algorithms have improved performance over other independent component analysis (ICA) methods, measured in terms of signal-to-noise ratios of separated sources. In the overcomplete case, we show that the true underlying dictionary and sparse sources can be accurately recovered. In tests with natural images, learned overcomplete dictionaries are shown to have higher coding efficiency than complete dictionaries; that is, images encoded with an over-complete dictionary have both higher compression (fewer bits per pixel) and higher accuracy (lower mean square error). PMID:12590811
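
    The alternation described in this record, between sparse coding of the training signals and an update of the dictionary using those codes, can be summarized in a generic sketch; OMP and a least-squares dictionary update stand in for the paper's FOCUSS-based variants, and all sizes and iteration counts are assumptions.

      # Generic alternating dictionary learning sketch: sparse-code the data,
      # then refit the dictionary by least squares and renormalize its atoms.
      # OMP replaces the FOCUSS-based coding of the paper; sizes are assumptions.
      import numpy as np
      from sklearn.decomposition import sparse_encode

      rng = np.random.default_rng(0)
      X = rng.standard_normal((300, 20))            # 300 training signals, dim 20
      D = rng.standard_normal((40, 20))             # 40-atom dictionary (rows = atoms)
      D /= np.linalg.norm(D, axis=1, keepdims=True)

      for _ in range(10):
          # Sparse representation step.
          C = sparse_encode(X, D, algorithm="omp", n_nonzero_coefs=3)   # (300, 40)
          # Dictionary update step: minimize ||X - C D||_F over D.
          D, *_ = np.linalg.lstsq(C, X, rcond=None)
          D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12

      C = sparse_encode(X, D, algorithm="omp", n_nonzero_coefs=3)
      print(f"final residual: {np.linalg.norm(X - C @ D):.3f}")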

  9. Nonparametric Coupled Bayesian Dictionary and Classifier Learning for Hyperspectral Classification.

    PubMed

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  10. Fast Low-Rank Shared Dictionary Learning for Image Classification.

    PubMed

    Tiep Huu Vu; Monga, Vishal

    2017-11-01

    Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e., claim that its spanning subspace should have low dimension and the coefficients corresponding to this dictionary should be similar. For the particular dictionaries, we impose on them the well-known constraints stated in the Fisher discrimination dictionary learning (FDDL). Furthermore, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. The said algorithms could also be applied to FDDL and its extensions. The efficiencies of these algorithms are theoretically and experimentally verified by comparing their complexities and running time with those of other well-known dictionary learning methods. Experimental results on widely used image data sets establish the advantages of our method over the state-of-the-art dictionary learning methods.

  11. Fast Low-Rank Shared Dictionary Learning for Image Classification

    NASA Astrophysics Data System (ADS)

    Vu, Tiep Huu; Monga, Vishal

    2017-11-01

    Despite the fact that different objects possess distinct class-specific features, they also usually share common patterns. This observation has been exploited partially in a recently proposed dictionary learning framework by separating the particularity and the commonality (COPAR). Inspired by this, we propose a novel method to explicitly and simultaneously learn a set of common patterns as well as class-specific features for classification with more intuitive constraints. Our dictionary learning framework is hence characterized by both a shared dictionary and particular (class-specific) dictionaries. For the shared dictionary, we enforce a low-rank constraint, i.e. claim that its spanning subspace should have low dimension and the coefficients corresponding to this dictionary should be similar. For the particular dictionaries, we impose on them the well-known constraints stated in the Fisher discrimination dictionary learning (FDDL). Further, we develop new fast and accurate algorithms to solve the subproblems in the learning step, accelerating its convergence. The said algorithms could also be applied to FDDL and its extensions. The efficiencies of these algorithms are theoretically and experimentally verified by comparing their complexities and running time with those of other well-known dictionary learning methods. Experimental results on widely used image datasets establish the advantages of our method over state-of-the-art dictionary learning methods.

  12. A Study of Comparatively Low Achievement Students' Bilingualized Dictionary Use and Their English Learning

    ERIC Educational Resources Information Center

    Chen, Szu-An

    2016-01-01

    This study investigates bilingualized dictionary use of Taiwanese university students. It aims to examine EFL learners' overall dictionary use behavior and their perspectives on book dictionary as well as the necessity of advance guidance in using dictionaries. Data was collected through questionnaires and analyzed by SPSS 15.0. Findings indicate…

  13. Using Different Types of Dictionaries for Improving EFL Reading Comprehension and Vocabulary Learning

    ERIC Educational Resources Information Center

    Alharbi, Majed A.

    2016-01-01

    This study investigated the effects of monolingual book dictionaries, popup dictionaries, and type-in dictionaries on improving reading comprehension and vocabulary learning in an EFL program. An experimental design involving four groups and a post-test was chosen for the experiment: (1) pop-up dictionary (experimental group 1); (2) type-in…

  14. Students Working with an English Learners' Dictionary on CD-ROM.

    ERIC Educational Resources Information Center

    Winkler, Birgit

    This paper examines the growing literature on pedagogical lexicography and the growing focus on how well the learner uses the dictionary in second language learning. Dictionaries are becoming more user-friendly. This study used the writing task to reveal new insights into how students use a CD-ROM dictionary. It found a lack of dictionary-using…

  15. The Effects of Dictionary Use on the Vocabulary Learning Strategies Used by Language Learners of Spanish.

    ERIC Educational Resources Information Center

    Hsien-jen, Chin

    This study investigated the effects of dictionary use on the vocabulary learning strategies used by intermediate college-level Spanish learners to understand new vocabulary items in a reading test. Participants were randomly assigned to one of three groups: control (without a dictionary), bilingual dictionary (using a Spanish-English dictionary),…

  16. Resilience - A Concept

    DTIC Science & Technology

    2016-04-05

  17. Improving the Incoherence of a Learned Dictionary via Rank Shrinkage.

    PubMed

    Ubaru, Shashanka; Seghouane, Abd-Krim; Saad, Yousef

    2017-01-01

    This letter considers the problem of dictionary learning for sparse signal representation whose atoms have low mutual coherence. To learn such dictionaries, at each step, we first update the dictionary using the method of optimal directions (MOD) and then apply a dictionary rank shrinkage step to decrease its mutual coherence. In the rank shrinkage step, we first compute a rank 1 decomposition of the column-normalized least squares estimate of the dictionary obtained from the MOD step. We then shrink the rank of this learned dictionary by transforming the problem of reducing the rank to a nonnegative garrotte estimation problem and solving it using a path-wise coordinate descent approach. We establish theoretical results that show that the rank shrinkage step included will reduce the coherence of the dictionary, which is further validated by experimental results. Numerical experiments illustrating the performance of the proposed algorithm in comparison to various other well-known dictionary learning algorithms are also presented.
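
    A simplified numeric sketch of the two-step structure described in this record: a MOD-style least-squares dictionary update followed by a crude rank-shrinkage step, implemented here as a hard singular-value truncation instead of the nonnegative-garrotte path of the paper; all sizes and the retained rank are assumptions.

      # Simplified sketch of MOD followed by rank shrinkage. Hard truncation of
      # small singular values stands in for the paper's nonnegative-garrotte
      # formulation; matrix sizes and the retained rank are assumptions.
      import numpy as np
      from sklearn.decomposition import sparse_encode

      rng = np.random.default_rng(0)
      X = rng.standard_normal((200, 16))            # training signals
      D = rng.standard_normal((32, 16))             # rows = atoms
      D /= np.linalg.norm(D, axis=1, keepdims=True)

      C = sparse_encode(X, D, algorithm="omp", n_nonzero_coefs=3)   # (200, 32)

      # MOD step: least-squares dictionary estimate, atoms renormalized.
      D_mod, *_ = np.linalg.lstsq(C, X, rcond=None)
      D_mod /= np.linalg.norm(D_mod, axis=1, keepdims=True) + 1e-12

      # Rank-shrinkage step (illustrative hard truncation of the spectrum).
      U, s, Vt = np.linalg.svd(D_mod, full_matrices=False)
      k = 12                                        # assumed retained rank
      D_shrunk = (U[:, :k] * s[:k]) @ Vt[:k, :]
      D_shrunk /= np.linalg.norm(D_shrunk, axis=1, keepdims=True) + 1e-12

      gram = D_shrunk @ D_shrunk.T
      coherence = np.max(np.abs(gram - np.diag(np.diag(gram))))
      print(f"max off-diagonal coherence after shrinkage: {coherence:.3f}")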

  18. Contour Tracking in Echocardiographic Sequences via Sparse Representation and Dictionary Learning

    PubMed Central

    Huang, Xiaojie; Dione, Donald P.; Compas, Colin B.; Papademetris, Xenophon; Lin, Ben A.; Bregasi, Alda; Sinusas, Albert J.; Staib, Lawrence H.; Duncan, James S.

    2013-01-01

    This paper presents a dynamical appearance model based on sparse representation and dictionary learning for tracking both endocardial and epicardial contours of the left ventricle in echocardiographic sequences. Instead of learning offline spatiotemporal priors from databases, we exploit the inherent spatiotemporal coherence of individual data to constraint cardiac contour estimation. The contour tracker is initialized with a manual tracing of the first frame. It employs multiscale sparse representation of local image appearance and learns online multiscale appearance dictionaries in a boosting framework as the image sequence is segmented frame-by-frame sequentially. The weights of multiscale appearance dictionaries are optimized automatically. Our region-based level set segmentation integrates a spectrum of complementary multilevel information including intensity, multiscale local appearance, and dynamical shape prediction. The approach is validated on twenty-six 4D canine echocardiographic images acquired from both healthy and post-infarct canines. The segmentation results agree well with expert manual tracings. The ejection fraction estimates also show good agreement with manual results. Advantages of our approach are demonstrated by comparisons with a conventional pure intensity model, a registration-based contour tracker, and a state-of-the-art database-dependent offline dynamical shape model. We also demonstrate the feasibility of clinical application by applying the method to four 4D human data sets. PMID:24292554

  19. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline

    PubMed Central

    Zhang, Jie; Li, Qingyang; Caselli, Richard J.; Thompson, Paul M.; Ye, Jieping; Wang, Yalin

    2017-01-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics problems. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as supervised learning schemes, whose drawback is either an insufficient number of features or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms. PMID:28943731

  20. NHD Event Data Dictionary

    EPA Pesticide Factsheets

    The Reach Address Database (RAD) stores reach address information for each Water Program feature that has been linked to the underlying surface water features (streams, lakes, etc) in the National Hydrology Database (NHD) Plus dataset.

  1. Cysts

    MedlinePlus

  2. Astrocytoma

    MedlinePlus

  3. Chondrosarcoma

    MedlinePlus

  4. Ependymoma

    MedlinePlus

  5. Highlands County Energy Lessons. Middle School Level - Science, Mathematics, Social Studies, Vocational Education.

    ERIC Educational Resources Information Center

    Allen, Rodney F., Ed.; Farmer, Richard

    Middle school energy skills (Enerskills) and activities (Eneractivities) are provided in seven sections. Areas addressed include: (1) locating energy information using telephone books, dictionaries, card catalogs, and readers' guides; (2) writing letters for energy information; (3) energy and food (food intake/human performance, calories/energy);…

  6. The Making of the "Oxford English Dictionary."

    ERIC Educational Resources Information Center

    Winchester, Simon

    2003-01-01

    Summarizes remarks made to open the Gallaudet University conference on Dictionaries and the Standardization of languages. It concerns the making of what is arguably the world's greatest dictionary, "The Oxford English Dictionary." (VWL)

  7. A novel structured dictionary for fast processing of 3D medical images, with application to computed tomography restoration and denoising

    NASA Astrophysics Data System (ADS)

    Karimi, Davood; Ward, Rabab K.

    2016-03-01

    Sparse representation of signals in learned overcomplete dictionaries has proven to be a powerful tool with applications in denoising, restoration, compression, reconstruction, and more. Recent research has shown that learned overcomplete dictionaries can lead to better results than analytical dictionaries such as wavelets in almost all image processing applications. However, a major disadvantage of these dictionaries is that their learning and usage is very computationally intensive. In particular, finding the sparse representation of a signal in these dictionaries requires solving an optimization problem that leads to very long computational times, especially in 3D image processing. Moreover, the sparse representation found by greedy algorithms is usually sub-optimal. In this paper, we propose a novel two-level dictionary structure that improves the performance and the speed of standard greedy sparse coding methods. The first (i.e., the top) level in our dictionary is a fixed orthonormal basis, whereas the second level includes the atoms that are learned from the training data. We explain how such a dictionary can be learned from the training data and how the sparse representation of a new signal in this dictionary can be computed. As an application, we use the proposed dictionary structure for removing the noise and artifacts in 3D computed tomography (CT) images. Our experiments with real CT images show that the proposed method achieves results that are comparable with standard dictionary-based methods while substantially reducing the computational time.
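
    The two-level structure described in this record can be sketched as the concatenation of a fixed orthonormal basis with learned atoms; the sketch below uses a DCT basis as the top level and random stand-ins for the learned second level (the basis choice, sizes and sparsity are assumptions, not necessarily the authors' configuration).

      # Hedged sketch of a two-level dictionary: a fixed orthonormal top level
      # (a DCT basis here) concatenated with a learned second level, for which
      # random atoms stand in below. Sizes and sparsity are assumptions.
      import numpy as np
      from scipy.fft import dct
      from sklearn.decomposition import sparse_encode

      d = 64                                             # patch dimension
      top_level = dct(np.eye(d), axis=0, norm="ortho")   # orthonormal DCT basis
      rng = np.random.default_rng(0)
      learned_level = rng.standard_normal((96, d))       # stand-in learned atoms
      learned_level /= np.linalg.norm(learned_level, axis=1, keepdims=True)

      # Atoms as rows: d fixed atoms followed by 96 learned atoms.
      dictionary = np.vstack([top_level.T, learned_level])

      signal = rng.standard_normal((1, d))
      code = sparse_encode(signal, dictionary, algorithm="omp", n_nonzero_coefs=6)
      approx = code @ dictionary
      print(float(np.linalg.norm(signal - approx)))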

  8. Revealing topics and their evolution in biomedical literature using Bio-DTM: a case study of ginseng.

    PubMed

    Chen, Qian; Ai, Ni; Liao, Jie; Shao, Xin; Liu, Yufeng; Fan, Xiaohui

    2017-01-01

    Valuable scientific findings in biomedicine are abundant but widely scattered across the literature. Topic modeling enables researchers to discover themes from an unstructured collection of documents without any prior annotations or labels. In this paper, taking ginseng as an example, a biological dynamic topic model (Bio-DTM) is proposed to conduct a retrospective study and interpret the temporal evolution of ginseng research. The Bio-DTM system comprises four components: document pre-processing, bio-dictionary construction, dynamic topic modeling, and topic analysis and visualization. Scientific articles pertaining to ginseng were retrieved from PubMed through text mining. The bio-dictionary integrates the MedTerms medical dictionary, the second edition of the side effect resource, a dictionary of biology, and the HGNC database of human gene names. A dynamic topic model, a text-mining technique, was used to capture the development trends of topics in sequentially collected documents. Besides the contents of the topics, their evolution over time was visualized using ThemeRiver. Topic 9 showed that ginseng has been used in dietary supplements and complementary and integrative health practices and has become very popular since the early twentieth century. Topic 6 indicated that the cultivation of ginseng is a major area of research and that symbiosis and allelopathy of ginseng became a research hotspot in 2007. In addition, the Bio-DTM model gave an insight into the main pharmacologic effects of ginseng, such as its anti-metabolic disorder, cardioprotective, anti-cancer, hepatoprotective, anti-thrombotic, and neuroprotective effects. The Bio-DTM model not only discovers the themes that ginseng research involves but also displays how these topics evolve over time. This approach can be applied to the biomedical field to conduct retrospective studies and guide future research.
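
    As a hedged illustration of the per-slice idea only, the sketch below fits an independent LDA model for each year over a fixed, bio-dictionary-style vocabulary so that topic-word weights can be compared across slices; it is a crude stand-in for a true dynamic topic model, and the documents, vocabulary, and parameters are all invented.

```python
# Illustrative sketch only (invented corpus and vocabulary): per-time-slice topic
# modeling with a shared, dictionary-derived vocabulary, as a rough stand-in for
# a dynamic topic model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs_by_year = {
    2005: ["ginseng extract improved glucose metabolism in mice",
           "ginsenoside treatment reduced tumor growth"],
    2007: ["allelopathy and symbiosis affect ginseng cultivation yield",
           "soil microbes interact with cultivated ginseng roots"],
}
vocabulary = ["ginseng", "ginsenoside", "glucose", "tumor",
              "allelopathy", "symbiosis", "cultivation", "soil"]  # stand-in bio-dictionary

vectorizer = CountVectorizer(vocabulary=vocabulary)
for year, docs in sorted(docs_by_year.items()):
    X = vectorizer.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vectorizer.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top_terms = [terms[i] for i in weights.argsort()[::-1][:3]]
        print(year, "topic", k, top_terms)
```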

  9. MiDas: Automatic Extraction of a Common Domain of Discourse in Sleep Medicine for Multi-center Data Integration

    PubMed Central

    Sahoo, Satya S.; Ogbuji, Chimezie; Luo, Lingyun; Dong, Xiao; Cui, Licong; Redline, Susan S.; Zhang, Guo-Qiang

    2011-01-01

    Clinical studies often use data dictionaries with controlled sets of terms to facilitate data collection, limited interoperability and sharing at a local site. Multi-center retrospective clinical studies require that these data dictionaries, originating from individual participating centers, be harmonized in preparation for the integration of the corresponding clinical research data. Domain ontologies are often used to facilitate multi-center data integration by modeling terms from data dictionaries in a logic-based language, but interoperability among domain ontologies (using automated techniques) is an unresolved issue. Although many upper-level reference ontologies have been proposed to address this challenge, our experience in integrating multi-center sleep medicine data highlights the need for an upper level ontology that models a common set of terms at multiple-levels of abstraction, which is not covered by the existing upper-level ontologies. We introduce a methodology underpinned by a Minimal Domain of Discourse (MiDas) algorithm to automatically extract a minimal common domain of discourse (upper-domain ontology) from an existing domain ontology. Using the Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research (Physio-MIMI) multi-center project in sleep medicine as a use case, we demonstrate the use of MiDas in extracting a minimal domain of discourse for sleep medicine, from Physio-MIMI’s Sleep Domain Ontology (SDO). We then extend the resulting domain of discourse with terms from the data dictionary of the Sleep Heart and Health Study (SHHS) to validate MiDas. To illustrate the wider applicability of MiDas, we automatically extract the respective domains of discourse from 6 sample domain ontologies from the National Center for Biomedical Ontologies (NCBO) and the OBO Foundry. PMID:22195180

  10. MiDas: automatic extraction of a common domain of discourse in sleep medicine for multi-center data integration.

    PubMed

    Sahoo, Satya S; Ogbuji, Chimezie; Luo, Lingyun; Dong, Xiao; Cui, Licong; Redline, Susan S; Zhang, Guo-Qiang

    2011-01-01

    Clinical studies often use data dictionaries with controlled sets of terms to facilitate data collection, limited interoperability and sharing at a local site. Multi-center retrospective clinical studies require that these data dictionaries, originating from individual participating centers, be harmonized in preparation for the integration of the corresponding clinical research data. Domain ontologies are often used to facilitate multi-center data integration by modeling terms from data dictionaries in a logic-based language, but interoperability among domain ontologies (using automated techniques) is an unresolved issue. Although many upper-level reference ontologies have been proposed to address this challenge, our experience in integrating multi-center sleep medicine data highlights the need for an upper level ontology that models a common set of terms at multiple-levels of abstraction, which is not covered by the existing upper-level ontologies. We introduce a methodology underpinned by a Minimal Domain of Discourse (MiDas) algorithm to automatically extract a minimal common domain of discourse (upper-domain ontology) from an existing domain ontology. Using the Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research (Physio-MIMI) multi-center project in sleep medicine as a use case, we demonstrate the use of MiDas in extracting a minimal domain of discourse for sleep medicine, from Physio-MIMI's Sleep Domain Ontology (SDO). We then extend the resulting domain of discourse with terms from the data dictionary of the Sleep Heart and Health Study (SHHS) to validate MiDas. To illustrate the wider applicability of MiDas, we automatically extract the respective domains of discourse from 6 sample domain ontologies from the National Center for Biomedical Ontologies (NCBO) and the OBO Foundry.

  11. Induced lexico-syntactic patterns improve information extraction from online medical forums.

    PubMed

    Gupta, Sonal; MacLean, Diana L; Heer, Jeffrey; Manning, Christopher D

    2014-01-01

    To reliably extract two entity types, symptoms and conditions (SCs), and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries. Despite the increasing quantity of PAT (eg, online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identification of SC and DT terms in PAT would enable exploration of efficacy and side effects for not only pharmaceutical drugs, but also for home remedies and components of daily care. We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns corresponding strongly to each entity type to extract new SC and DT terms. Our system is able to extract symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. Our system extracts DT terms with 58-70% F1 score and SC terms with 66-76% F1 score on two forums from MedHelp. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach. Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We exhibit learning of informal terms often used in PAT but missing from typical dictionaries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
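
    The bootstrapping step can be pictured with a toy sketch (invented posts, seed terms, and pattern definition; not the authors' system): seed-dictionary matches label tokens, simple left-context "patterns" are counted, and the induced patterns then propose new candidate terms.

```python
# Toy sketch of one dictionary-seeded bootstrapping round; the posts and the
# single-token "pattern" definition are invented for illustration.
from collections import Counter

seed_symptoms = {"headache", "nausea"}
posts = [
    "i have had headache for days",
    "i have had stabbing pain for days",
    "the nausea started after dinner",
    "the dizziness started after dinner",
]

patterns = Counter()
for post in posts:
    tokens = post.split()
    for i, tok in enumerate(tokens):
        if tok in seed_symptoms and i >= 1:
            patterns[tokens[i - 1]] += 1        # left-context word as a crude pattern

new_terms = set()
for post in posts:
    tokens = post.split()
    for i in range(1, len(tokens)):
        if tokens[i - 1] in patterns and tokens[i] not in seed_symptoms:
            new_terms.add(tokens[i])

print("induced patterns:", dict(patterns))
print("new candidate terms:", new_terms)        # e.g. {'stabbing', 'dizziness'}
```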

  12. Deconvolving molecular signatures of interactions between microbial colonies

    PubMed Central

    Harn, Y.-C.; Powers, M. J.; Shank, E. A.; Jojic, V.

    2015-01-01

    Motivation: The interactions between microbial colonies through chemical signaling are not well understood. A microbial colony can use different molecules to inhibit or accelerate the growth of other colonies. A better understanding of the molecules involved in these interactions could lead to advancements in health and medicine. Imaging mass spectrometry (IMS) applied to co-cultured microbial communities aims to capture the spatial characteristics of the colonies’ molecular fingerprints. These data are high-dimensional and require computational analysis methods to interpret. Results: Here, we present a dictionary learning method that deconvolves spectra of different molecules from IMS data. We call this method MOLecular Dictionary Learning (MOLDL). Unlike standard dictionary learning methods which assume Gaussian-distributed data, our method uses the Poisson distribution to capture the count nature of the mass spectrometry data. Also, our method incorporates universally applicable information on common ion types of molecules in MALDI mass spectrometry. This greatly reduces model parameterization and increases deconvolution accuracy by eliminating spurious solutions. Moreover, our method leverages the spatial nature of IMS data by assuming that nearby locations share similar abundances, thus avoiding overfitting to noise. Tests on simulated datasets show that this method has good performance in recovering molecule dictionaries. We also tested our method on real data measured on a microbial community composed of two species. We confirmed through follow-up validation experiments that our method recovered true and complete signatures of molecules. These results indicate that our method can discover molecules in IMS data reliably, and hence can help advance the study of interaction of microbial colonies. Availability and implementation: The code used in this paper is available at: https://github.com/frizfealer/IMS_project. Contact: vjojic@cs.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072476

  13. Dictionary Approaches to Image Compression and Reconstruction

    NASA Technical Reports Server (NTRS)

    Ziyad, Nigel A.; Gilmore, Erwin T.; Chouikha, Mohamed F.

    1998-01-01

    This paper proposes using a collection of parameterized waveforms, known as a dictionary, for the purpose of medical image compression. These waveforms, denoted as phi(sub gamma), are discrete time signals, where gamma represents the dictionary index. A dictionary with a collection of these waveforms is typically complete or overcomplete. Given such a dictionary, the goal is to obtain a representation image based on the dictionary. We examine the effectiveness of applying Basis Pursuit (BP), Best Orthogonal Basis (BOB), Matching Pursuits (MP), and the Method of Frames (MOF) methods for the compression of digitized radiological images with a wavelet-packet dictionary. The performance of these algorithms is studied for medical images with and without additive noise.
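
    As a small hedged illustration of two of the coding schemes the paper compares, applied here to a random dictionary rather than a wavelet-packet one: the Method of Frames yields dense minimum-norm coefficients via the pseudoinverse, while Matching Pursuit greedily selects a handful of atoms.

```python
# Hedged illustration (random dictionary, synthetic signal): Method of Frames
# versus a plain Matching Pursuit on a small overcomplete dictionary.
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((32, 96))
D /= np.linalg.norm(D, axis=0)                        # unit-norm atoms
x = D[:, [3, 40, 77]] @ np.array([1.5, -2.0, 0.7])    # signal built from 3 atoms

mof = np.linalg.pinv(D) @ x                           # Method of Frames: dense code

code = np.zeros(D.shape[1])                           # Matching Pursuit: sparse code
residual = x.copy()
for _ in range(3):
    k = int(np.argmax(np.abs(D.T @ residual)))
    a = float(D[:, k] @ residual)
    code[k] += a
    residual -= a * D[:, k]

print("MOF nonzeros:", int(np.count_nonzero(np.abs(mof) > 1e-8)))
print("MP  nonzeros:", int(np.count_nonzero(code)),
      "residual:", round(float(np.linalg.norm(residual)), 4))
```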

  14. Polarimetric SAR image classification based on discriminative dictionary learning model

    NASA Astrophysics Data System (ADS)

    Sang, Cheng Wei; Sun, Hong

    2018-03-01

    Polarimetric SAR (PolSAR) image classification is one of the important applications of PolSAR remote sensing. It is a difficult high-dimensional nonlinear mapping problem, and sparse representations based on learned overcomplete dictionaries have shown great potential for solving it. The overcomplete dictionary plays an important role in PolSAR image classification; however, in complex PolSAR scenes, features shared by different classes weaken the discrimination of the learned dictionary and thus degrade classification performance. In this paper, we propose a novel overcomplete dictionary learning model to enhance the discrimination of the dictionary. The overcomplete dictionary learned by the proposed model is more discriminative and well suited to PolSAR classification.

  15. Dictionary Approaches to Image Compression and Reconstruction

    NASA Technical Reports Server (NTRS)

    Ziyad, Nigel A.; Gilmore, Erwin T.; Chouikha, Mohamed F.

    1998-01-01

    This paper proposes using a collection of parameterized waveforms, known as a dictionary, for the purpose of medical image compression. These waveforms, denoted as phi(sub gamma), are discrete time signals, where gamma represents the dictionary index. A dictionary with a collection of these waveforms is typically complete or overcomplete. Given such a dictionary, the goal is to obtain a representation image based on the dictionary. We examine the effectiveness of applying Basis Pursuit (BP), Best Orthogonal Basis (BOB), Matching Pursuits (MP), and the Method of Frames (MOF) methods for the compression of digitized radiological images with a wavelet-packet dictionary. The performance of these algorithms is studied for medical images with and without additive noise.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Q; Han, H; Xing, L

    Purpose: Dictionary learning based methods have attracted more and more attention in low-dose CT due to their superior performance in suppressing noise and preserving structural details. Considering that the structures and noise vary from region to region in one imaging object, we propose a region-specific dictionary learning method to improve low-dose CT reconstruction. Methods: A set of normal-dose images was used for dictionary learning. Segmentations were performed on these images so that the training patch sets corresponding to different regions could be extracted. After that, region-specific dictionaries were learned from these training sets. For the low-dose CT reconstruction, a conventional reconstruction, such as filtered back-projection (FBP), was performed first, and then segmentation was applied to divide the image into different regions. Sparsity constraints of each region based on its dictionary were used as regularization terms. The regularization parameters were selected adaptively according to the different regions. A low-dose human thorax dataset was used to evaluate the proposed method. The single-dictionary-based method was performed for comparison. Results: Since the lung region is very different from the other parts of the thorax, two dictionaries, corresponding to the lung region and the rest of the thorax respectively, were learned to better express the structural details and avoid artifacts. With only one dictionary, some artifacts appeared in the body region caused by the spot atoms corresponding to the structures in the lung region, and some structures in the lung region could not be recovered well. The quantitative indices of the result obtained by the proposed method were also slightly improved compared to the single-dictionary-based method. Conclusion: A region-specific dictionary makes the dictionary more adaptive to different region characteristics, which is much desired for enhancing the performance of dictionary learning based methods.
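
    A minimal sketch of the region-specific idea with scikit-learn (not the abstract's reconstruction pipeline): one dictionary is learned per segmented region, and test patches are then coded against the dictionary of their own region. The patch arrays below are random stand-ins for lung and soft-tissue training patches.

```python
# Sketch under assumptions (random patches standing in for segmented CT regions):
# learn one dictionary per region, then sparse-code a patch with each.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
regions = {
    "lung": rng.standard_normal((500, 64)),   # 500 flattened 8x8 training patches
    "body": rng.standard_normal((500, 64)),
}

dicts = {
    name: MiniBatchDictionaryLearning(
        n_components=32, transform_algorithm="omp",
        transform_n_nonzero_coefs=4, random_state=0).fit(patches)
    for name, patches in regions.items()
}

test_patch = rng.standard_normal((1, 64))
for name, model in dicts.items():
    code = model.transform(test_patch)        # region-specific sparse code
    recon = code @ model.components_
    print(name, "nonzeros:", int(np.count_nonzero(code)),
          "error:", float(np.linalg.norm(test_patch - recon)))
```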

  17. Robust multi-atlas label propagation by deep sparse representation

    PubMed Central

    Zu, Chen; Wang, Zhengxia; Zhang, Daoqiang; Liang, Peipeng; Shi, Yonghong; Shen, Dinggang; Wu, Guorong

    2016-01-01

    Recently, multi-atlas patch-based label fusion has achieved many successes in medical imaging area. The basic assumption in the current state-of-the-art approaches is that the image patch at the target image point can be represented by a patch dictionary consisting of atlas patches from registered atlas images. Therefore, the label at the target image point can be determined by fusing labels of atlas image patches with similar anatomical structures. However, such assumption on image patch representation does not always hold in label fusion since (1) the image content within the patch may be corrupted due to noise and artifact; and (2) the distribution of morphometric patterns among atlas patches might be unbalanced such that the majority patterns can dominate label fusion result over other minority patterns. The violation of the above basic assumptions could significantly undermine the label fusion accuracy. To overcome these issues, we first consider forming label-specific group for the atlas patches with the same label. Then, we alter the conventional flat and shallow dictionary to a deep multi-layer structure, where the top layer (label-specific dictionaries) consists of groups of representative atlas patches and the subsequent layers (residual dictionaries) hierarchically encode the patchwise residual information in different scales. Thus, the label fusion follows the representation consensus across representative dictionaries. However, the representation of target patch in each group is iteratively optimized by using the representative atlas patches in each label-specific dictionary exclusively to match the principal patterns and also using all residual patterns across groups collaboratively to overcome the issue that some groups might be absent of certain variation patterns presented in the target image patch. Promising segmentation results have been achieved in labeling hippocampus on ADNI dataset, as well as basal ganglia and brainstem structures, compared to other counterpart label fusion methods. PMID:27942077

  18. Coding of adverse events of suicidality in clinical study reports of duloxetine for the treatment of major depressive disorder: descriptive study.

    PubMed

    Maund, Emma; Tendal, Britta; Hróbjartsson, Asbjørn; Lundh, Andreas; Gøtzsche, Peter C

    2014-06-04

    To assess the effects of coding and coding conventions on summaries and tabulations of adverse events data on suicidality within clinical study reports. Systematic electronic search for adverse events of suicidality in tables, narratives, and listings of adverse events in individual patients within clinical study reports. Where possible, for each event we extracted the original term reported by the investigator, the term as coded by the medical coding dictionary, medical coding dictionary used, and the patient's trial identification number. Using the patient's trial identification number, we attempted to reconcile data on the same event between the different formats for presenting data on adverse events within the clinical study report. 9 randomised placebo controlled trials of duloxetine for major depressive disorder submitted to the European Medicines Agency for marketing approval. Clinical study reports obtained from the EMA in 2011. Six trials used the medical coding dictionary COSTART (Coding Symbols for a Thesaurus of Adverse Reaction Terms) and three used MedDRA (Medical Dictionary for Regulatory Activities). Suicides were clearly identifiable in all formats of adverse event data in clinical study reports. Suicide attempts presented in tables included both definitive and provisional diagnoses. Suicidal ideation and preparatory behaviour were obscured in some tables owing to the lack of specificity of the medical coding dictionary, especially COSTART. Furthermore, we found one event of suicidal ideation described in narrative text that was absent from tables and adverse event listings of individual patients. The reason for this is unclear, but may be due to the coding conventions used. Data on adverse events in tables in clinical study reports may not accurately represent the underlying patient data because of the medical dictionaries and coding conventions used. In clinical study reports, the listings of adverse events for individual patients and narratives of adverse events can provide additional information, including original investigator reported adverse event terms, which can enable a more accurate estimate of harms. © Maund et al 2014.

  19. Robust multi-atlas label propagation by deep sparse representation.

    PubMed

    Zu, Chen; Wang, Zhengxia; Zhang, Daoqiang; Liang, Peipeng; Shi, Yonghong; Shen, Dinggang; Wu, Guorong

    2017-03-01

    Recently, multi-atlas patch-based label fusion has achieved many successes in medical imaging area. The basic assumption in the current state-of-the-art approaches is that the image patch at the target image point can be represented by a patch dictionary consisting of atlas patches from registered atlas images. Therefore, the label at the target image point can be determined by fusing labels of atlas image patches with similar anatomical structures. However, such assumption on image patch representation does not always hold in label fusion since (1) the image content within the patch may be corrupted due to noise and artifact; and (2) the distribution of morphometric patterns among atlas patches might be unbalanced such that the majority patterns can dominate label fusion result over other minority patterns. The violation of the above basic assumptions could significantly undermine the label fusion accuracy. To overcome these issues, we first consider forming label-specific group for the atlas patches with the same label. Then, we alter the conventional flat and shallow dictionary to a deep multi-layer structure, where the top layer ( label-specific dictionaries ) consists of groups of representative atlas patches and the subsequent layers ( residual dictionaries ) hierarchically encode the patchwise residual information in different scales. Thus, the label fusion follows the representation consensus across representative dictionaries. However, the representation of target patch in each group is iteratively optimized by using the representative atlas patches in each label-specific dictionary exclusively to match the principal patterns and also using all residual patterns across groups collaboratively to overcome the issue that some groups might be absent of certain variation patterns presented in the target image patch. Promising segmentation results have been achieved in labeling hippocampus on ADNI dataset, as well as basal ganglia and brainstem structures, compared to other counterpart label fusion methods.
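
    The basic patch-dictionary assumption behind label fusion can be sketched with synthetic data (this is not the paper's deep multi-layer model): the target patch is sparsely represented over atlas patches, and each atlas label is then weighted by its non-negative coefficient.

```python
# Minimal, assumption-laden sketch of patch-based label fusion with synthetic
# atlas patches and labels; sparse coding is done with a non-negative lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
atlas_patches = rng.standard_normal((125, 20))   # 20 atlas patches, 5x5x5 voxels each
atlas_labels = np.array([0] * 12 + [1] * 8)      # anatomical label of each atlas patch

# target patch lies close to atlas patch 15, which carries label 1
target = atlas_patches[:, 15] + 0.1 * rng.standard_normal(125)

coder = Lasso(alpha=0.05, positive=True, max_iter=10000)
coder.fit(atlas_patches, target)
w = coder.coef_                                  # sparse, non-negative weights

votes = np.array([w[atlas_labels == c].sum() for c in (0, 1)])
print("label weights:", votes, "-> fused label:", int(votes.argmax()))
```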

  20. Brain Tumor Symptoms

    MedlinePlus

  1. Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology.

    PubMed

    Malhotra, Ashutosh; Gündel, Michaela; Rajput, Abdul Mateen; Mevissen, Heinz-Theodor; Saiz, Albert; Pastor, Xavier; Lozano-Rubi, Raimundo; Martinez-Lapiscina, Elena H; Zubizarreta, Irati; Mueller, Bernd; Kotelnikova, Ekaterina; Toldo, Luca; Hofmann-Apitius, Martin; Villoslada, Pablo

    2015-01-01

    In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of MS ontology. Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.
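
    A minimal, hedged sketch of the kind of dictionary-based concept recognition such a lexicalized ontology enables (not the SCAIView implementation): longest-match lookup of synonyms in free text, mapping each hit to a canonical concept. The lexicon and clinical note are invented.

```python
# Toy dictionary-based concept recognizer: longest-match lookup of synonym
# phrases, each mapped to a canonical concept identifier (invented lexicon).
synonyms = {
    "multiple sclerosis": "MS",
    "ms": "MS",
    "fingolimod": "DRUG:fingolimod",
    "optic neuritis": "SYMPTOM:optic neuritis",
}

def recognize(text, lexicon, max_len=4):
    tokens = text.lower().split()
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):   # longest match first
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                hits.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1
    return hits

note = "Patient with multiple sclerosis started fingolimod after optic neuritis"
print(recognize(note, synonyms))
```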

  2. Cross-language opinion lexicon extraction using mutual-reinforcement label propagation.

    PubMed

    Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke

    2013-01-01

    There is a growing interest in automatically building opinion lexicon from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits the applicability of these methods. Unsupervised or semi-supervised learning provides an optional solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced in different languages. For some languages, the high-quality corpora are scarce or hard to obtain, which limits the research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between two languages. A key advantage of this model is its ability to make two languages learn from each other and boost each other. The experimental results show that the proposed approach outperforms baseline significantly.

  3. Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation

    PubMed Central

    Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke

    2013-01-01

    There is a growing interest in automatically building opinion lexicon from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits the applicability of these methods. Unsupervised or semi-supervised learning provides an optional solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced in different languages. For some languages, the high-quality corpora are scarce or hard to obtain, which limits the research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between two languages. A key advantage of this model is its ability to make two languages learn from each other and boost each other. The experimental results show that the proposed approach outperforms baseline significantly. PMID:24260190
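
    A toy sketch of the two ingredients, with invented words, graph, and dictionary: label propagation over a monolingual word-relation graph, followed by transferring the propagated polarity scores through a small bilingual dictionary to seed the second language.

```python
# Toy mutual-reinforcement step: propagate polarity labels on an English word
# graph, then pass scores through a bilingual dictionary as seeds for the other
# language. All words, edges, and seeds are invented.
import numpy as np

en_words = ["good", "great", "awful", "bad", "nice"]
W = np.array([[0, 1, 0, 0, 1],          # symmetric word-relation graph
              [1, 0, 0, 0, 1],
              [0, 0, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0]], dtype=float)
seeds = {"good": 1.0, "awful": -1.0}

scores = np.array([seeds.get(w, 0.0) for w in en_words])
P = W / W.sum(axis=1, keepdims=True)     # row-normalised transition matrix
for _ in range(20):                      # propagate while clamping the seeds
    scores = P @ scores
    for i, w in enumerate(en_words):
        if w in seeds:
            scores[i] = seeds[w]

bilingual = {"great": "génial", "bad": "mauvais"}   # dictionary bridge
other_language_seeds = {bilingual[w]: float(scores[i])
                        for i, w in enumerate(en_words) if w in bilingual}
print(dict(zip(en_words, scores.round(2))))
print("transferred seeds:", other_language_seeds)
```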

  4. Multi-View Multi-Instance Learning Based on Joint Sparse Representation and Multi-View Dictionary Learning.

    PubMed

    Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve

    2017-12-01

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (MIL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the MIL. Experiments and analyses in many practical applications prove the effectiveness of the MIL.

  5. A sparse representation of the pathologist's interaction with whole slide images to improve the assigned relevance of regions of interest

    NASA Astrophysics Data System (ADS)

    Santiago, Daniel; Corredor, Germán.; Romero, Eduardo

    2017-11-01

    During a diagnosis task, a pathologist looks over a Whole Slide Image (WSI), aiming to find relevant pathological patterns. A virtual microscope, however, captures not only these structures but also other cellular patterns with different or no diagnostic meaning. Annotation of these images depends on manual delineation, which in practice is a hard task. This article contributes a new method for detecting relevant regions in a WSI using the routine navigations in a virtual microscope. The method constructs a sparse representation, or dictionary, of each navigation path and determines the hidden relevance by maximizing the incoherence between several paths. The resulting dictionaries are then projected onto each other, and relevance is assigned to the dictionary atoms whose similarity is higher than a custom threshold. Evaluation was performed with 6 pathological images segmented from a skin biopsy already diagnosed with basal cell carcinoma (BCC). Results show that our proposal outperforms the baseline by more than 20%.

  6. An Online Dictionary Learning-Based Compressive Data Gathering Algorithm in Wireless Sensor Networks

    PubMed Central

    Wang, Donghao; Wan, Jiangwen; Chen, Junying; Zhang, Qiang

    2016-01-01

    To adapt to sense signals of enormous diversities and dynamics, and to decrease the reconstruction errors caused by ambient noise, a novel online dictionary learning method-based compressive data gathering (ODL-CDG) algorithm is proposed. The proposed dictionary is learned from a two-stage iterative procedure, alternately changing between a sparse coding step and a dictionary update step. The self-coherence of the learned dictionary is introduced as a penalty term during the dictionary update procedure. The dictionary is also constrained with sparse structure. It’s theoretically demonstrated that the sensing matrix satisfies the restricted isometry property (RIP) with high probability. In addition, the lower bound of necessary number of measurements for compressive sensing (CS) reconstruction is given. Simulation results show that the proposed ODL-CDG algorithm can enhance the recovery accuracy in the presence of noise, and reduce the energy consumption in comparison with other dictionary based data gathering methods. PMID:27669250

  7. An Online Dictionary Learning-Based Compressive Data Gathering Algorithm in Wireless Sensor Networks.

    PubMed

    Wang, Donghao; Wan, Jiangwen; Chen, Junying; Zhang, Qiang

    2016-09-22

    To adapt to sense signals of enormous diversities and dynamics, and to decrease the reconstruction errors caused by ambient noise, a novel online dictionary learning method-based compressive data gathering (ODL-CDG) algorithm is proposed. The proposed dictionary is learned from a two-stage iterative procedure, alternately changing between a sparse coding step and a dictionary update step. The self-coherence of the learned dictionary is introduced as a penalty term during the dictionary update procedure. The dictionary is also constrained with sparse structure. It's theoretically demonstrated that the sensing matrix satisfies the restricted isometry property (RIP) with high probability. In addition, the lower bound of necessary number of measurements for compressive sensing (CS) reconstruction is given. Simulation results show that the proposed ODL-CDG algorithm can enhance the recovery accuracy in the presence of noise, and reduce the energy consumption in comparison with other dictionary based data gathering methods.
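
    A hedged sketch of the online-learning ingredient using scikit-learn (not the ODL-CDG algorithm itself): the dictionary is updated incrementally with partial_fit as batches of readings arrive and is then used to sparse-code the latest batch. The sensor data are random placeholders.

```python
# Sketch of incremental (online) dictionary learning on streaming batches of
# synthetic sensor readings; this is scikit-learn's learner, not ODL-CDG.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
learner = MiniBatchDictionaryLearning(
    n_components=16, transform_algorithm="lasso_lars",
    transform_alpha=0.1, random_state=0)

for _ in range(10):                       # 10 streaming batches of 50 readings x 32 dims
    batch = rng.standard_normal((50, 32))
    learner.partial_fit(batch)            # online dictionary update

codes = learner.transform(batch)          # sparse codes for the latest batch
print("dictionary shape:", learner.components_.shape)
print("mean nonzeros per reading:", float((codes != 0).sum(axis=1).mean()))
```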

  8. Developing a National-Level Concept Dictionary for EHR Implementations in Kenya.

    PubMed

    Keny, Aggrey; Wanyee, Steven; Kwaro, Daniel; Mulwa, Edwin; Were, Martin C

    2015-01-01

    The increasing adoption of Electronic Health Records (EHR) by developing countries comes with the need to develop common terminology standards to assure semantic interoperability. In Kenya, where the Ministry of Health has rolled out an EHR at 646 sites, several challenges have emerged including variable dictionaries across implementations, inability to easily share data across systems, lack of expertise in dictionary management, lack of central coordination and custody of a terminology service, inadequately defined policies and processes, insufficient infrastructure, among others. A Concept Working Group was constituted to address these challenges. The country settled on a common Kenya data dictionary, initially derived as a subset of the Columbia International eHealth Laboratory (CIEL)/Millennium Villages Project (MVP) dictionary. The initial dictionary scope largely focuses on clinical needs. Processes and policies around dictionary management are being guided by the framework developed by Bakhshi-Raiez et al. Technical and infrastructure-based approaches are also underway to streamline workflow for dictionary management and distribution across implementations. Kenya's approach on comprehensive common dictionary can serve as a model for other countries in similar settings.

  9. Bayesian nonparametric dictionary learning for compressed sensing MRI.

    PubMed

    Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping

    2014-12-01

    We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k -space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.

  10. The Text Retrieval Conferences (TRECs)

    DTIC Science & Technology

    1998-10-01

    perform a monolingual run in the target language to act as a baseline. Thirteen groups participated in the TREC-6 CLIR track. Three major...language; the use of machine-readable bilingual dictionaries or other existing linguistic resources; and the use of corpus resources to train or...formance for each method. In general, the best cross-language performance was between 50%-75% as effective as a quality monolingual run. The TREC-7

  11. Enabling Long-Term Earth Science Research: Changing Data Practices (Invited)

    NASA Astrophysics Data System (ADS)

    Baker, K. S.

    2013-12-01

    Data stewardship plans are shaped by our shared experiences. As a result, community engagement and collaborative activities are central to the stewardship of data. Since modes and mechanisms of engagement have changed, we benefit from asking anew: 'Who are the communities?' and 'What are the lessons learned?'. Data stewardship, with its long-term care perspective, is enriched by reflection on community experience. This presentation draws on data management issues and strategies originating from within long-term research communities as well as on recent studies informed by library and information science. Ethnographic case studies that capture project activities and histories are presented as resources for comparative analysis. Agency requirements and funding opportunities are stimulating collaborative endeavors focused on data re-use and archiving. Research groups including earth scientists, information professionals, and data systems designers are recognizing the possibilities for new ways of thinking about data in the digital arena. Together, these groups are re-conceptualizing and reconfiguring for data management and data curation. A differentiation between managing data for local use and producing data for re-use in locations and fields remote from the data origin is just one example of the concepts emerging to facilitate the development of data management. While earth scientists as data generators have the responsibility to plan new workflows and documentation practices, data and information specialists have the responsibility to promote best practices as well as to facilitate the development of community resources such as controlled vocabularies and data dictionaries. With data-centric activities and changing data practices, the potential for creating dynamic community information environments in conjunction with development of data facilities exists but remains elusive.

  12. Sparsity-promoting orthogonal dictionary updating for image reconstruction from highly undersampled magnetic resonance data.

    PubMed

    Huang, Jinhong; Guo, Li; Feng, Qianjin; Chen, Wufan; Feng, Yanqiu

    2015-07-21

    Image reconstruction from undersampled k-space data accelerates magnetic resonance imaging (MRI) by exploiting image sparseness in certain transform domains. Employing image patch representation over a learned dictionary has the advantage of being adaptive to local image structures and thus can better sparsify images than using fixed transforms (e.g. wavelets and total variations). Dictionary learning methods have recently been introduced to MRI reconstruction, and these methods demonstrate significantly reduced reconstruction errors compared to sparse MRI reconstruction using fixed transforms. However, the synthesis sparse coding problem in dictionary learning is NP-hard and computationally expensive. In this paper, we present a novel sparsity-promoting orthogonal dictionary updating method for efficient image reconstruction from highly undersampled MRI data. The orthogonality imposed on the learned dictionary enables the minimization problem in the reconstruction to be solved by an efficient optimization algorithm which alternately updates representation coefficients, orthogonal dictionary, and missing k-space data. Moreover, both sparsity level and sparse representation contribution using updated dictionaries gradually increase during iterations to recover more details, assuming the progressively improved quality of the dictionary. Simulation and real data experimental results both demonstrate that the proposed method is approximately 10 to 100 times faster than the K-SVD-based dictionary learning MRI method and simultaneously improves reconstruction accuracy.
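
    The alternation the abstract describes can be sketched numerically under simple assumptions (random patches rather than undersampled k-space data, and a fixed sparsity level): hard-threshold coding against the current orthogonal dictionary, then an orthogonal Procrustes update D = U V^T from the SVD of X C^T.

```python
# Numeric sketch, assuming random training patches: alternate hard-threshold
# coding with an orthogonal Procrustes update of the orthogonal dictionary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 1000))                   # columns are flattened 8x8 patches
D = np.linalg.qr(rng.standard_normal((64, 64)))[0]    # initial orthogonal dictionary

def hard_threshold(A, k):
    """Keep the k largest-magnitude coefficients in each column of A."""
    out = np.zeros_like(A)
    idx = np.argsort(-np.abs(A), axis=0)[:k]
    np.put_along_axis(out, idx, np.take_along_axis(A, idx, axis=0), axis=0)
    return out

for it in range(5):
    C = hard_threshold(D.T @ X, k=8)      # sparse coding (exact for an orthonormal D)
    U, _, Vt = np.linalg.svd(X @ C.T)     # orthogonal Procrustes dictionary update
    D = U @ Vt
    err = np.linalg.norm(X - D @ C) / np.linalg.norm(X)
    print(f"iter {it}: relative error {err:.3f}")
```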

  13. Manifold optimization-based analysis dictionary learning with an ℓ1∕2-norm regularizer.

    PubMed

    Li, Zhenni; Ding, Shuxue; Li, Yujie; Yang, Zuyuan; Xie, Shengli; Chen, Wuhui

    2018-02-01

    Recently there has been increasing attention towards analysis dictionary learning. In analysis dictionary learning, it is an open problem to obtain the strong sparsity-promoting solutions efficiently while simultaneously avoiding the trivial solutions of the dictionary. In this paper, to obtain the strong sparsity-promoting solutions, we employ the ℓ1∕2 norm as a regularizer. The very recent study on ℓ1∕2 norm regularization theory in compressive sensing shows that its solutions can give sparser results than using the ℓ1 norm. We transform a complex nonconvex optimization into a number of one-dimensional minimization problems. Then the closed-form solutions can be obtained efficiently. To avoid trivial solutions, we apply manifold optimization to update the dictionary directly on the manifold satisfying the orthonormality constraint, so that the dictionary can avoid the trivial solutions well while simultaneously capturing the intrinsic properties of the dictionary. The experiments with synthetic and real-world data verify that the proposed algorithm for analysis dictionary learning can not only obtain strong sparsity-promoting solutions efficiently, but also learn a more accurate dictionary in terms of dictionary recovery and image processing than the state-of-the-art algorithms. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Alzheimer's disease detection via automatic 3D caudate nucleus segmentation using coupled dictionary learning with level set formulation.

    PubMed

    Al-Shaikhli, Saif Dawood Salman; Yang, Michael Ying; Rosenhahn, Bodo

    2016-12-01

    This paper presents a novel method for Alzheimer's disease classification via an automatic 3D caudate nucleus segmentation. The proposed method consists of segmentation and classification steps. In the segmentation step, we propose a novel level set cost function. The proposed cost function is constrained by a sparse representation of local image features using a dictionary learning method. We present coupled dictionaries: a feature dictionary of a grayscale brain image and a label dictionary of a caudate nucleus label image. Using online dictionary learning, the coupled dictionaries are learned from the training data. The learned coupled dictionaries are embedded into a level set function. In the classification step, a region-based feature dictionary is built. The region-based feature dictionary is learned from shape features of the caudate nucleus in the training data. The classification is based on the measure of the similarity between the sparse representation of region-based shape features of the segmented caudate in the test image and the region-based feature dictionary. The experimental results demonstrate the superiority of our method over the state-of-the-art methods by achieving a high segmentation (91.5%) and classification (92.5%) accuracy. In this paper, we find that the study of the caudate nucleus atrophy gives an advantage over the study of whole brain structure atrophy to detect Alzheimer's disease. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  15. English Collocation Learning through Corpus Data: On-Line Concordance and Statistical Information

    ERIC Educational Resources Information Center

    Ohtake, Hiroshi; Fujita, Nobuyuki; Kawamoto, Takeshi; Morren, Brian; Ugawa, Yoshihiro; Kaneko, Shuji

    2012-01-01

    We developed an English Collocations On Demand system offering on-line corpus and concordance information to help Japanese researchers acquire a better command of English collocation patterns. The Life Science Dictionary Corpus consists of approximately 90,000,000 words collected from life science related research papers published in academic…

  16. Joint sparse coding based spatial pyramid matching for classification of color medical image.

    PubMed

    Shi, Jun; Li, Yi; Zhu, Jie; Sun, Haojie; Cai, Yin

    2015-04-01

    Although color medical images are important in clinical practice, they are usually converted to grayscale for further processing in pattern recognition, resulting in loss of rich color information. The sparse coding based linear spatial pyramid matching (ScSPM) and its variants are popular for grayscale image classification, but cannot extract color information. In this paper, we propose a joint sparse coding based SPM (JScSPM) method for the classification of color medical images. A joint dictionary can represent both the color information in each color channel and the correlation between channels. Consequently, the joint sparse codes calculated from a joint dictionary can carry color information, and therefore this method can easily transform a feature descriptor originally designed for grayscale images to a color descriptor. A color hepatocellular carcinoma histological image dataset was used to evaluate the performance of the proposed JScSPM algorithm. Experimental results show that JScSPM provides significant improvements as compared with the majority voting based ScSPM and the original ScSPM for color medical image classification. Copyright © 2014 Elsevier Ltd. All rights reserved.
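
    The joint-channel idea can be illustrated with a loose sketch (random patches, a random stand-in dictionary, and simple thresholded coding instead of the paper's sparse coding within spatial pyramid matching): concatenating the R, G, and B channels gives one joint vector per patch, so a single atom spans all three channels and the resulting code carries cross-channel correlation.

```python
# Loose illustration with invented data: joint color descriptors are built by
# concatenating channels, then coded against one joint (random stand-in) dictionary.
import numpy as np

rng = np.random.default_rng(0)
patches_rgb = rng.random((200, 8, 8, 3))              # 200 color patches

joint = patches_rgb.reshape(200, -1)                  # (200, 8*8*3) joint descriptors

D = rng.standard_normal((192, 64))
D /= np.linalg.norm(D, axis=0)                        # joint dictionary atoms (random)

corr = joint @ D                                      # correlate patches with atoms
threshold = np.sort(np.abs(corr), axis=1)[:, [-5]]    # keep the 5 strongest per patch
codes = np.where(np.abs(corr) >= threshold, corr, 0.0)
print("joint code shape:", codes.shape,
      "nonzeros per patch:", float((codes != 0).sum(axis=1).mean()))
```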

  17. The efficacy of dictionary use while reading for learning new words.

    PubMed

    Hamilton, Harley

    2012-01-01

    The researcher investigated the use of three types of dictionaries while reading by high school students with severe to profound hearing loss. The objective of the study was to determine the effectiveness of each type of dictionary for acquiring the meanings of unknown vocabulary in text. The three types of dictionaries were (a) an online bilingual multimedia English-American Sign Language (ASL) dictionary (OBMEAD), (b) a paper English-ASL dictionary (PBEAD), and (c) an online monolingual English dictionary (OMED). It was found that for immediate recall of target words, the OBMEAD was superior to both the PBEAD and the OMED. For later recall, no significant difference appeared between the OBMEAD and the PBEAD. For both of these, recall was statistically superior to recall for words learned via the OMED.

  18. CELLPEDIA: a repository for human cell information for cell studies and differentiation analyses.

    PubMed

    Hatano, Akiko; Chiba, Hirokazu; Moesa, Harry Amri; Taniguchi, Takeaki; Nagaie, Satoshi; Yamanegi, Koji; Takai-Igarashi, Takako; Tanaka, Hiroshi; Fujibuchi, Wataru

    2011-01-01

    CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/

  19. Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra*

    PubMed Central

    Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.

    2011-01-01

    Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (Spectral Dictionary) and quickly matching them against the database represent a recently emerged alternative approach to peptide identification. However, the sizes of the Spectral Dictionaries quickly grow with the peptide length making their generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps) that can be easily generated for any peptide length thus addressing the limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small thus opening a possibility of using them to speed-up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags and introduce gapped tags that have advantages over the conventional peptide sequence tags in MS/MS database searches. PMID:21444829

  20. Chemical entity recognition in patents by combining dictionary-based and statistical approaches.

    PubMed

    Akhondi, Saber A; Pons, Ewoud; Afzal, Zubair; van Haagen, Herman; Becker, Benedikt F H; Hettne, Kristina M; van Mulligen, Erik M; Kors, Jan A

    2016-01-01

    We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small.Database URL: http://biosemantics.org/chemdner-patents. © The Author(s) 2016. Published by Oxford University Press.

  1. Chinese-English Nuclear and Physics Dictionary.

    ERIC Educational Resources Information Center

    Air Force Systems Command, Wright-Patterson AFB, OH. Foreign Technology Div.

    The Nuclear and Physics Dictionary is one of a series of Chinese-English technical dictionaries prepared by the Foreign Technology Division, United States Air Force Systems Command. The purpose of this dictionary is to provide rapid reference tools for translators, abstractors, and research analysts concerned with scientific and technical…

  2. Mandarin Chinese Dictionary: English-Chinese.

    ERIC Educational Resources Information Center

    Wang, Fred Fangyu

    This dictionary is a companion volume to the "Mandarin Chinese Dictionary (Chinese-English)" published in 1967 by Seton Hall University. The purpose of the dictionary is to help English-speaking students produce Chinese sentences in certain cultural situations by looking up the English expressions. Natural, spoken Chinese expressions within the…

  3. MEANING DISCRIMINATION IN BILINGUAL DICTIONARIES.

    ERIC Educational Resources Information Center

    IANNUCCI, JAMES E.

    SEMANTIC DISCRIMINATION OF POLYSEMOUS ENTRY WORDS IN BILINGUAL DICTIONARIES WAS DISCUSSED IN THE PAPER. HANDICAPS OF PRESENT BILINGUAL DICTIONARIES AND BARRIERS TO THEIR FULL UTILIZATION WERE ENUMERATED. THE AUTHOR CONCLUDED THAT (1) A BILINGUAL DICTIONARY SHOULD HAVE A DISCRIMINATION FOR EVERY TRANSLATION OF AN ENTRY WORD WHICH HAS SEVERAL…

  4. The Use of Hyper-Reference and Conventional Dictionaries.

    ERIC Educational Resources Information Center

    Aust, Ronald; And Others

    1993-01-01

    Describes a study of 80 undergraduate foreign language learners that compared the use of a hyper-reference source incorporating an electronic dictionary and a conventional paper dictionary. Measures of consultation frequency, study time, efficiency, and comprehension are examined; bilingual and monolingual dictionary use is compared; and further…

  5. TSAR (Theater Simulation of Airbase Resources) Database Dictionary F-4G.

    DTIC Science & Technology

    1987-06-05

  6. Implementing the Freight Transportation Data Architecture : Data Element Dictionary

    DOT National Transportation Integrated Search

    2015-01-01

    NCFRP Report 9: Guidance for Developing a Freight Data Architecture articulates the value of establishing architecture for linking data across modes, subjects, and levels of geography to obtain essential information for decision making. Central to th...

  7. Vitamin and Mineral Supplement Fact Sheets

    MedlinePlus

  8. Experiments with Cross-Language Information Retrieval on a Health Portal for Psychology and Psychotherapy.

    PubMed

    Andrenucci, Andrea

    2016-01-01

    Few studies have been performed within cross-language information retrieval (CLIR) in the field of psychology and psychotherapy. The aim of this paper is to analyze and assess the quality of available query translation methods for CLIR on a health portal for psychology. A test base of 100 user queries, 50 multi-word units and 50 single-word units, was used. Swedish was the source language and English the target language. Query translation methods based on machine translation (MT) and dictionary look-up were utilized in order to submit query translations to two search engines: Google Site Search and Quick Ask. Standard IR evaluation measures and a qualitative analysis were utilized to assess the results. The lexicon extracted with word alignment of the portal's parallel corpus provided better statistical results among the dictionary look-ups. Google Translate provided more linguistically correct translations overall and also delivered better retrieval results in MT.
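
    Dictionary look-up translation itself is simple to picture. The sketch below uses a toy Swedish-English term list and one common fallback (passing unknown terms through unchanged); real systems add phrase handling and disambiguation.

```python
# Minimal sketch (toy term list): dictionary look-up query translation for CLIR,
# replacing each source-language term with all of its listed translations.
sv_en = {
    "ångest": ["anxiety"],
    "behandling": ["treatment", "therapy"],
    "sömnproblem": ["insomnia", "sleep problems"],
}

def translate_query(query, lexicon):
    out = []
    for term in query.lower().split():
        out.extend(lexicon.get(term, [term]))   # pass out-of-vocabulary terms through
    return " ".join(out)

print(translate_query("behandling ångest", sv_en))   # -> "treatment therapy anxiety"
print(translate_query("KBT sömnproblem", sv_en))     # -> "kbt insomnia sleep problems"
```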

  9. Why is data sharing in collaborative natural resource efforts so hard and what can we do to improve it?

    PubMed

    Volk, Carol J; Lucero, Yasmin; Barnas, Katie

    2014-05-01

    Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While it is generally good to have more data, utilizing large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations ("distributed research teams"). We surveyed natural resource scientists about common data-sharing problems. The major issues identified by our survey respondents (n = 118) when providing data were lack of clarity in the data request (including format of data requested). When receiving data, survey respondents reported various insufficiencies in documentation describing the data (e.g., no data collection description/no protocol, data aggregated, or summarized without explanation). Since metadata, or "information about the data," is a central obstacle in efficient data handling, we suggest documenting metadata through data dictionaries, protocols, read-me files, explicit null value documentation, and process metadata as essential to any large-scale research program. We advocate for all researchers, but especially those involved in distributed teams to alleviate these problems with the use of several readily available communication strategies including the use of organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.

  10. Why is Data Sharing in Collaborative Natural Resource Efforts so Hard and What can We Do to Improve it?

    NASA Astrophysics Data System (ADS)

    Volk, Carol J.; Lucero, Yasmin; Barnas, Katie

    2014-05-01

    Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While it is generally good to have more data, utilizing large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations ("distributed research teams"). We surveyed natural resource scientists about common data-sharing problems. The major issue identified by our survey respondents (n = 118) when providing data was a lack of clarity in the data request (including the format of the data requested). When receiving data, survey respondents reported various insufficiencies in the documentation describing the data (e.g., no description of data collection or protocol, or data aggregated or summarized without explanation). Since metadata, or "information about the data," is central to efficient data handling, we suggest documenting metadata through data dictionaries, protocols, read-me files, explicit null-value documentation, and process metadata as essential to any large-scale research program. We advocate that all researchers, but especially those involved in distributed teams, alleviate these problems with several readily available communication strategies, including organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges, making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.

  11. Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data.

    PubMed

    Kasthurirathne, Suranga N; Dixon, Brian E; Gichoya, Judy; Xu, Huiping; Xia, Yuni; Mamlin, Burke; Grannis, Shaun J

    2017-05-01

    Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and "off the shelf" tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries. We evaluated the detection of cancer cases from free text pathology reports using decision models built with combinations of dictionary or non-dictionary based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70 and 90%. The source of features and feature subset size had no impact on the performance of a decision model. Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing "off the shelf" approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
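
    A hedged sketch of the two feature-sourcing strategies compared above, using off-the-shelf scikit-learn components; the toy reports, labels, four-term dictionary, and single classifier are fabricated stand-ins for the study's pathology reports, curated medical dictionaries, and multiple algorithms.

    ```python
    # Sketch contrasting dictionary-restricted vs. unrestricted ("off the shelf")
    # feature sourcing for text classification. The two-line reports, labels and
    # tiny term dictionary below are fabricated placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    reports = [
        "invasive ductal carcinoma identified in the left breast specimen",
        "benign fibrous tissue, no evidence of malignancy",
        "adenocarcinoma present at the resection margin",
        "normal colonic mucosa without dysplasia",
    ]
    labels = [1, 0, 1, 0]  # 1 = cancer case, 0 = non-case (toy labels)

    # Dictionary-based sourcing: features limited to a curated medical term list.
    medical_terms = ["carcinoma", "adenocarcinoma", "malignancy", "dysplasia"]
    dict_features = CountVectorizer(vocabulary=medical_terms).fit_transform(reports)

    # Non-dictionary sourcing: features drawn from the report text itself.
    text_features = CountVectorizer().fit_transform(reports)

    for name, X in [("dictionary", dict_features), ("plaintext", text_features)]:
        clf = LogisticRegression().fit(X, labels)
        print(name, "training accuracy:", clf.score(X, labels))
    ```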

  12. Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations

    PubMed Central

    Chaspari, Theodora; Tsiartas, Andreas; Tsilifis, Panagiotis; Narayanan, Shrikanth

    2016-01-01

    Parametric dictionaries can increase the ability of sparse representations to meaningfully capture and interpret the underlying signal information, such as encountered in biomedical problems. Given a mapping function from the atom parameter space to the actual atoms, we propose a sparse Bayesian framework for learning the atom parameters, because of its ability to provide full posterior estimates, take uncertainty into account and generalize to unseen data. Inference is performed with Markov Chain Monte Carlo, which uses block sampling to generate the variables of the Bayesian problem. Since the parameterization of dictionary atoms results in posteriors that cannot be analytically computed, we use a Metropolis-Hastings-within-Gibbs framework, according to which variables with closed-form posteriors are generated with the Gibbs sampler, while the remaining ones are generated with Metropolis-Hastings steps from appropriate candidate-generating densities. We further show that the corresponding Markov Chain is uniformly ergodic, ensuring its convergence to a stationary distribution independently of the initial state. Results on synthetic data and real biomedical signals indicate that our approach offers advantages in terms of signal reconstruction compared to previously proposed Steepest Descent and Equiangular Tight Frame methods. This paper demonstrates the ability of Bayesian learning to generate parametric dictionaries that can reliably represent the exemplar data and provides a foundation for inferring the entire variable set of the sparse approximation problem for signal denoising, adaptation and other applications. PMID:28649173
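
    The following is a minimal Metropolis-Hastings-within-Gibbs sketch for a single parametric atom, assuming a Gaussian likelihood, a flat prior, and a toy windowed-sinusoid atom; it illustrates the sampling pattern only and is not a reproduction of the paper's hierarchical model.

    ```python
    # Minimal Metropolis-Hastings-within-Gibbs sketch for one parametric atom.
    # The windowed-sinusoid atom, Gaussian likelihood, flat priors and step size
    # are simplified stand-ins, not the paper's full hierarchical model.
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 256)

    def atom(freq):
        """Parametric atom: unit-norm Gaussian-windowed sinusoid of frequency `freq`."""
        a = np.exp(-0.5 * ((t - 0.5) / 0.1) ** 2) * np.cos(2 * np.pi * freq * t)
        return a / np.linalg.norm(a)

    true_freq, sigma = 12.0, 0.05
    signal = 0.8 * atom(true_freq) + sigma * rng.standard_normal(t.size)

    def log_post(freq, coef):
        if not (1.0 < freq < 40.0):               # flat prior support for the frequency
            return -np.inf
        resid = signal - coef * atom(freq)
        return -0.5 * np.sum(resid ** 2) / sigma ** 2

    freq, coef = 5.0, 1.0
    for _ in range(5000):
        # Gibbs step: with a flat prior, the coefficient's conditional is Gaussian.
        a = atom(freq)
        coef = rng.normal(a @ signal, sigma)      # unit-norm atom => simple posterior
        # Metropolis-Hastings step: the frequency has no closed-form conditional.
        prop = freq + 0.5 * rng.standard_normal()
        if np.log(rng.uniform()) < log_post(prop, coef) - log_post(freq, coef):
            freq = prop

    print(f"estimated frequency ~ {freq:.2f} (true value {true_freq})")
    ```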

  13. The Oxford English Dictionary: A Brief History.

    ERIC Educational Resources Information Center

    Fritze, Ronald H.

    1989-01-01

    Reviews the development of English dictionaries in general and the Oxford English Dictionary (OED) in particular. The discussion covers the decision by the Philological Society to create the dictionary, the principles that guided its development, the involvement of James Augustus Henry Murray, the magnitude and progress of the project, and the…

  14. Dictionary Making: A Case of Kiswahili Dictionaries.

    ERIC Educational Resources Information Center

    Mohamed, Mohamed A.

    Two Swahili dictionaries and two bilingual dictionaries by the same author (one English-Swahili and one Swahili-English) are evaluated for their form and content, with illustrations offered from each. Aspects examined include: the compilation of headwords, including their meanings with relation to basic and extended meanings; treatment of…

  15. Buying and Selling Words: What Every Good Librarian Should Know about the Dictionary Business.

    ERIC Educational Resources Information Center

    Kister, Ken

    1993-01-01

    Discusses features to consider when selecting dictionaries. Topics addressed include the publishing industry; the dictionary market; profits from dictionaries; pricing; competitive marketing tactics, including similar titles, claims to numbers of entries and numbers of definitions, and similar physical appearance; a trademark infringement case;…

  16. The New Unabridged English-Persian Dictionary.

    ERIC Educational Resources Information Center

    Aryanpur, Abbas; Saleh, Jahan Shah

    This five-volume English-Persian dictionary is based on Webster's International Dictionary (1960 and 1961) and The Shorter Oxford English Dictionary (1959); it attempts to provide Persian equivalents of all the words of Oxford and all the key-words of Webster. Pronunciation keys for the English phonetic transcription and for the difficult Persian…

  17. Evaluating L2 Readers' Vocabulary Strategies and Dictionary Use

    ERIC Educational Resources Information Center

    Prichard, Caleb

    2008-01-01

    A review of the relevant literature concerning second language dictionary use while reading suggests that selective dictionary use may lead to improved comprehension and efficient vocabulary development. This study aims to examine the dictionary use of Japanese university students to determine just how selective they are when reading nonfiction…

  18. Online English-English Learner Dictionaries Boost Word Learning

    ERIC Educational Resources Information Center

    Nurmukhamedov, Ulugbek

    2012-01-01

    Learners of English might be familiar with several online monolingual dictionaries that are not necessarily the best choices for the English as Second/Foreign Language (ESL/EFL) context. Although these monolingual online dictionaries contain definitions, pronunciation guides, and other elements normally found in general-use dictionaries, they are…

  19. Research Timeline: Dictionary Use by English Language Learners

    ERIC Educational Resources Information Center

    Nesi, Hilary

    2014-01-01

    The history of research into dictionary use tends to be characterised by small-scale studies undertaken in a variety of different contexts, rather than larger-scale, longer-term funded projects. The research conducted by dictionary publishers is not generally made public, because of its commercial sensitivity, yet because dictionary production is…

  20. The Dictionary and Vocabulary Behavior: A Single Word or a Handful?

    ERIC Educational Resources Information Center

    Baxter, James

    1980-01-01

    To provide a context for dictionary selection, the vocabulary behavior of students is examined. Distinguishing between written and spoken English, the relation between dictionary use, classroom vocabulary behavior, and students' success in meeting their communicative needs is discussed. The choice of a monolingual English learners' dictionary is…

  1. UMass at TREC 2002: Cross Language and Novelty Tracks

    DTIC Science & Technology

    2002-01-01

    resources – stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this year’s queries...data by manually annotating 48 additional topics. 1. Cross Language Track We submitted one monolingual run and four cross-language runs. For the... monolingual run, the technology was essentially the same as the system we used for TREC 2001. For the cross-language run, we integrated some new

  2. NHEXAS PHASE I ARIZONA STUDY--STANDARD OPERATING PROCEDURE FOR THE GENERATION AND OPERATION OF DATA DICTIONARIES (UA-D-4.0)

    EPA Science Inventory

    The purpose of this SOP is to provide a standard method for the writing of data dictionaries. This procedure applies to the dictionaries used during the Arizona NHEXAS project and the "Border" study. Keywords: guidelines; data dictionaries.

    The National Human Exposure Assessme...

  3. Should Dictionaries Be Used in Translation Tests and Examinations?

    ERIC Educational Resources Information Center

    Mahmoud, Abdulmoneim

    2017-01-01

    Motivated by the conflicting views regarding the use of the dictionary in translation tests and examinations this study was intended to verify the dictionary-free vs dictionary-based translation hypotheses. The subjects were 135 Arabic-speaking male and female EFL third-year university students. A group consisting of 62 students translated a text…

  4. The Creation of Learner-Centred Dictionaries for Endangered Languages: A Rotuman Example

    ERIC Educational Resources Information Center

    Vamarasi, M.

    2014-01-01

    This article examines the creation of dictionaries for endangered languages (ELs). Though each dictionary is uniquely prepared for its users, all dictionaries should be based on sound principles of vocabulary learning, including the importance of lexical chunks, as emphasised by Michael Lewis in his "Lexical Approach." Many of the…

  5. Evaluating Bilingual and Monolingual Dictionaries for L2 Learners.

    ERIC Educational Resources Information Center

    Hunt, Alan

    1997-01-01

    A discussion of dictionaries and their use for second language (L2) learning suggests that lack of computerized modern language corpora can adversely affect bilingual dictionaries, commonly used by L2 learners, and shows how use of such corpora has benefitted two contemporary monolingual L2 learner dictionaries (1995 editions of the Longman…

  6. Discriminative Bayesian Dictionary Learning for Classification.

    PubMed

    Akhtar, Naveed; Shafait, Faisal; Mian, Ajmal

    2016-12-01

    We propose a Bayesian approach to learn discriminative dictionaries for sparse representation of data. The proposed approach infers probability distributions over the atoms of a discriminative dictionary using a finite approximation of the Beta Process. It also computes sets of Bernoulli distributions that associate class labels to the learned dictionary atoms. This association signifies the selection probabilities of the dictionary atoms in the expansion of class-specific data. Furthermore, the non-parametric character of the proposed approach allows it to infer the correct size of the dictionary. We exploit the aforementioned Bernoulli distributions in separately learning a linear classifier. The classifier uses the same hierarchical Bayesian model as the dictionary, which we present along with the analytical inference solution for Gibbs sampling. For classification, a test instance is first sparsely encoded over the learned dictionary and the codes are fed to the classifier. We performed experiments for face and action recognition, and for object and scene-category classification, using five public datasets and compared the results with state-of-the-art discriminative sparse representation approaches. Experiments show that the proposed Bayesian approach consistently outperforms the existing approaches.

  7. Sparse dictionary for synthetic transmit aperture medical ultrasound imaging.

    PubMed

    Wang, Ping; Jiang, Jin-Yang; Li, Na; Luo, Han-Wu; Li, Fang; Cui, Shi-Gang

    2017-07-01

    It is possible to recover a signal below the Nyquist sampling limit using a compressive sensing technique in ultrasound imaging. However, the reconstruction enabled by common sparse transform approaches does not achieve satisfactory results. Considering the ultrasound echo signal's features of attenuation, repetition, and superposition, a sparse dictionary with the emission pulse signal is proposed. Sparse coefficients in the proposed dictionary have high sparsity. Images reconstructed with this dictionary were compared with those obtained with the three other common transforms, namely, discrete Fourier transform, discrete cosine transform, and discrete wavelet transform. The performance of the proposed dictionary was analyzed via a simulation and experimental data. The mean absolute error (MAE) was used to quantify the quality of the reconstructions. Experimental results indicate that the MAE associated with the proposed dictionary was always the smallest, the reconstruction time required was the shortest, and the lateral resolution and contrast of the reconstructed images were also the closest to the original images. The proposed sparse dictionary performed better than the other three sparse transforms. With the same sampling rate, the proposed dictionary achieved excellent reconstruction quality.
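
    A small sketch of the idea of building the dictionary from shifted copies of the emission pulse and recovering a sparse echo signal from compressive measurements; the pulse shape, echo positions, measurement matrix, and use of scikit-learn's OMP solver are illustrative assumptions rather than the paper's setup.

    ```python
    # Sketch of a shifted-pulse dictionary for compressive recovery of an echo
    # signal, in the spirit of an emission-pulse dictionary. The pulse shape,
    # echo locations and random sampling scheme are synthetic placeholders.
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(1)
    n = 512

    # Emission pulse: a short Gaussian-windowed sinusoid.
    s = np.arange(64)
    pulse = np.exp(-0.5 * ((s - 32) / 8.0) ** 2) * np.sin(0.6 * s)

    # Dictionary: all time shifts of the pulse (columns are unit-norm atoms).
    D = np.zeros((n, n - 64))
    for k in range(n - 64):
        D[k:k + 64, k] = pulse
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    # Synthetic echo signal: a few attenuated, superposed pulse reflections.
    x = 1.0 * D[:, 90] + 0.6 * D[:, 220] + 0.3 * D[:, 400]

    # Compressive measurements well below the signal length.
    m = 128
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    y = Phi @ x

    # Recover sparse coefficients in the pulse dictionary, then the signal.
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3, fit_intercept=False)
    omp.fit(Phi @ D, y)
    x_hat = D @ omp.coef_
    print("MAE:", np.mean(np.abs(x - x_hat)))
    ```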

  8. Robust Visual Tracking via Online Discriminative and Low-Rank Dictionary Learning.

    PubMed

    Zhou, Tao; Liu, Fanghui; Bhaskar, Harish; Yang, Jie

    2017-09-12

    In this paper, we propose a novel and robust tracking framework based on online discriminative and low-rank dictionary learning. The primary aim of this paper is to obtain compact and low-rank dictionaries that can provide good discriminative representations of both target and background. We accomplish this by exploiting the recovery ability of low-rank matrices. That is, if the data from the same class are linearly correlated, then the basis vectors learned from the training set of each class render the corresponding dictionary approximately low-rank. The proposed dictionary learning technique incorporates a reconstruction error that improves the reliability of classification. Also, a multiconstraint objective function is designed to enable active learning of a discriminative and robust dictionary. Further, an optimal solution is obtained by iteratively computing the dictionary and coefficients, and by simultaneously learning the classifier parameters. Finally, a simple yet effective likelihood function is implemented to estimate the optimal state of the target during tracking. Moreover, to make the dictionary adaptive to the variations of the target and background during tracking, an online update criterion is employed while learning the new dictionary. Experimental results on a publicly available benchmark dataset have demonstrated that the proposed tracking algorithm performs better than other state-of-the-art trackers.

  9. Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems.

    PubMed

    Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A

    2017-12-01

    The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction.
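
    A simplified sum-of-outer-products block coordinate descent sketch in the spirit of the method described above; the hard-threshold level, initialization, and random data are ad hoc choices, not the paper's exact SOUP-DIL formulation.

    ```python
    # Simplified sum-of-outer-products (rank-one) block coordinate descent sketch:
    # Y is approximated by sum_j d_j c_j^T, and each atom/coefficient pair is
    # updated in turn. The threshold and random data are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n, N, J = 32, 200, 16           # signal dim, number of signals, number of atoms
    Y = rng.standard_normal((n, N))

    D = rng.standard_normal((n, J))
    D /= np.linalg.norm(D, axis=0)
    C = np.zeros((N, J))
    tau = 1.0                       # hard-thresholding level (illustrative)

    for sweep in range(20):
        for j in range(J):
            # Residual with atom j removed (sum of the other rank-one terms).
            E_j = Y - D @ C.T + np.outer(D[:, j], C[:, j])
            # Sparse coefficient update: correlate with the atom and hard-threshold.
            c = E_j.T @ D[:, j]
            c[np.abs(c) < tau] = 0.0
            C[:, j] = c
            # Atom update: best unit-norm rank-one fit to the residual for fixed c.
            d = E_j @ c
            if np.linalg.norm(d) > 1e-12:
                D[:, j] = d / np.linalg.norm(d)

    print("residual Frobenius norm:", np.linalg.norm(Y - D @ C.T))
    ```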

  10. Cross-View Action Recognition via Transferable Dictionary Learning.

    PubMed

    Zheng, Jingjing; Jiang, Zhuolin; Chellappa, Rama

    2016-05-01

    Discriminative appearance features are effective for recognizing actions in a fixed view, but may not generalize well to a new view. In this paper, we present two effective approaches to learn dictionaries for robust action recognition across views. In the first approach, we learn a set of view-specific dictionaries where each dictionary corresponds to one camera view. These dictionaries are learned simultaneously from the sets of correspondence videos taken at different views with the aim of encouraging each video in the set to have the same sparse representation. In the second approach, we additionally learn a common dictionary shared by different views to model view-shared features. This approach represents the videos in each view using a view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. The learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labeled videos exist in the target view. The extensive experiments using three public datasets demonstrate that the proposed approach outperforms recently developed approaches for cross-view action recognition.

  11. Orthogonal Procrustes Analysis for Dictionary Learning in Sparse Linear Representation.

    PubMed

    Grossi, Giuliano; Lanzarotti, Raffaella; Lin, Jianyi

    2017-01-01

    In the sparse representation model, the design of overcomplete dictionaries plays a key role in the effectiveness and applicability in different domains. Recent research has produced several dictionary learning approaches, and it has been shown that dictionaries learnt from data examples significantly outperform structured ones, e.g. wavelet transforms. In this context, learning consists in adapting the dictionary atoms to a set of training signals in order to promote a sparse representation that minimizes the reconstruction error. Finding the best fitting dictionary remains a very difficult task, leaving the question still open. A well-established heuristic method for tackling this problem is an iterative alternating scheme, adopted for instance in the well-known K-SVD algorithm. Essentially, it consists in repeating two stages; the former promotes sparse coding of the training set and the latter adapts the dictionary to reduce the error. In this paper we present R-SVD, a new method that, while maintaining the alternating scheme, adopts the Orthogonal Procrustes analysis to update the dictionary atoms suitably arranged into groups. Comparative experiments on synthetic data prove the effectiveness of R-SVD with respect to well known dictionary learning algorithms such as K-SVD, ILS-DLA and the online method OSDL. Moreover, experiments on natural data such as ECG compression, EEG sparse representation, and image modeling confirm R-SVD's robustness and wide applicability.
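
    The core of the dictionary update can be illustrated with a generic orthogonal Procrustes step: given fixed sparse codes, a group of atoms is rotated by the orthogonal matrix that best fits that group's residual. The grouping, data, and codes below are synthetic, and the sketch does not reproduce R-SVD's full alternating scheme.

    ```python
    # Generic orthogonal-Procrustes atom-group update in the spirit of R-SVD
    # (not the paper's exact scheme): with sparse codes fixed, the atoms of one
    # group are rotated to best fit the residual left by the other atoms.
    import numpy as np

    rng = np.random.default_rng(0)
    n, N, J, g = 24, 300, 12, 4           # signal dim, signals, atoms, group size
    Y = rng.standard_normal((n, N))
    D = rng.standard_normal((n, J))
    D /= np.linalg.norm(D, axis=0)
    X = rng.standard_normal((J, N)) * (rng.random((J, N)) < 0.2)   # sparse codes

    group = np.arange(g)                  # indices of the atom group to update
    others = np.setdiff1d(np.arange(J), group)

    # Residual not explained by the atoms outside the group.
    E = Y - D[:, others] @ X[others, :]
    B = D[:, group] @ X[group, :]

    # Orthogonal Procrustes: min_R ||E - R B||_F  s.t.  R^T R = I.
    U, _, Vt = np.linalg.svd(E @ B.T)
    R = U @ Vt

    before = np.linalg.norm(E - B)
    D[:, group] = R @ D[:, group]         # rotated atoms stay unit-norm
    after = np.linalg.norm(E - D[:, group] @ X[group, :])
    print(f"group residual: {before:.2f} -> {after:.2f}")
    ```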

  12. Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems

    PubMed Central

    Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A.

    2017-01-01

    The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction. PMID:29376111

  13. 75 FR 63888 - Occupational Information Development Advisory Panel Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-18

    ... independent advice and recommendations on plans and activities to replace the Dictionary of Occupational...: Medical and vocational analysis of disability claims; occupational analysis, including definitions... to our disability programs and improve the medical-vocational adjudication policies and processes...

  14. Print-Format Information Sources for Urban Research.

    ERIC Educational Resources Information Center

    Sable, Martin H.

    1982-01-01

    Describes various reference tools that would be useful to urban researchers, including bibliographies, indexing and abstracting services, dictionaries, encyclopedias, handbooks, yearbooks, and directories on urban studies, political science, and economics. (For journal availability, see UD 509 682.) (Author/MJL)

  15. Alternatively Constrained Dictionary Learning For Image Superresolution.

    PubMed

    Lu, Xiaoqiang; Yuan, Yuan; Yan, Pingkun

    2014-03-01

    Dictionaries are crucial in sparse coding-based algorithms for image superresolution. Sparse coding is a typical unsupervised learning method to study the relationship between the patches of high- and low-resolution images. However, most of the sparse coding methods for image superresolution fail to simultaneously consider the geometrical structure of the dictionary and the corresponding coefficients, which may result in noticeable superresolution reconstruction artifacts. In other words, when a low-resolution image and its corresponding high-resolution image are represented in their feature spaces, the two sets of dictionaries and the obtained coefficients have intrinsic links, which have not yet been well studied. Motivated by the development on nonlocal self-similarity and manifold learning, a novel sparse coding method is reported to preserve the geometrical structure of the dictionary and the sparse coefficients of the data. Moreover, the proposed method can preserve the incoherence of dictionary entries and provide the sparse coefficients and learned dictionary from a new perspective, which have both reconstruction and discrimination properties to enhance the learning performance. Furthermore, to utilize the model of the proposed method more effectively for single-image superresolution, this paper also proposes a novel dictionary-pair learning method, which is termed two-stage dictionary training. Extensive experiments are carried out on a large set of images comparing with other popular algorithms for the same purpose, and the results clearly demonstrate the effectiveness of the proposed sparse representation model and the corresponding dictionary learning algorithm.

  16. Sensor-Based Vibration Signal Feature Extraction Using an Improved Composite Dictionary Matching Pursuit Algorithm

    PubMed Central

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-01-01

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm is feasible and effective. PMID:25207870
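
    A minimal single-atom matching pursuit over a composite dictionary (cosine atoms plus impulses) may clarify the decomposition step; the sub-dictionaries, test signal, and energy-based stopping rule below are simplified stand-ins for the paper's modulation dictionary and attenuation-coefficient termination condition.

    ```python
    # Minimal single-atom matching pursuit over a composite dictionary (here,
    # cosine atoms plus impulse atoms). The dictionary families, test signal and
    # stopping rule are simplified illustrations, not the CD-SaMP algorithm.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 256
    t = np.arange(n)

    # Composite dictionary: a cosine sub-dictionary and an impulse sub-dictionary.
    cos_atoms = np.stack([np.cos(2 * np.pi * f * t / n) for f in range(1, 40)], axis=1)
    cos_atoms /= np.linalg.norm(cos_atoms, axis=0)
    impulses = np.eye(n)
    D = np.hstack([cos_atoms, impulses])

    # Test signal: one harmonic plus two isolated impulses plus noise.
    x = 0.9 * D[:, 9] + 0.7 * impulses[:, 60] + 0.5 * impulses[:, 180]
    x = x + 0.02 * rng.standard_normal(n)

    def matching_pursuit(x, D, max_iter=10, tol=0.05):
        residual, coeffs = x.copy(), np.zeros(D.shape[1])
        for _ in range(max_iter):
            corr = D.T @ residual
            k = np.argmax(np.abs(corr))          # best single atom this iteration
            coeffs[k] += corr[k]
            residual -= corr[k] * D[:, k]
            if np.linalg.norm(residual) < tol * np.linalg.norm(x):
                break
        return coeffs, residual

    coeffs, residual = matching_pursuit(x, D)
    print("selected atoms:", np.nonzero(coeffs)[0],
          "residual norm:", np.linalg.norm(residual))
    ```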

  17. Sensor-based vibration signal feature extraction using an improved composite dictionary matching pursuit algorithm.

    PubMed

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-09-09

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm is feasible and effective.

  18. Dictionary of environmental quotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodes, B.K.; Odell, R.

    1997-12-31

    Here are more than 3,700 quotations in 143 categories -- from Acid Rain to Zoos -- that provide a comprehensive collection of the wise and witty observations about the natural environment. The dictionary will delight, provoke, and inform readers. It is at once stimulating, entertaining, and enlightening, with quotations that provide a complete range of human thought about nature and the environment. Quotations have been drawn from a variety of documented sources, including poems, proverbs, slogans, radio and television, congressional hearings, magazines, and newspapers. The authors of the quotes range from a philosopher in pre-Christian times to a contemporary economist, from a poet who speaks of forests to an engineer concerned with air pollutants.

  19. Patient-Specific Seizure Detection in Long-Term EEG Using Signal-Derived Empirical Mode Decomposition (EMD)-based Dictionary Approach.

    PubMed

    Kaleem, Muhammad; Gurve, Dharmendra; Guergachi, Aziz; Krishnan, Sridhar

    2018-06-25

    The objective of the work described in this paper is the development of a computationally efficient methodology for patient-specific automatic seizure detection in long-term multi-channel EEG recordings. Approach: A novel patient-specific seizure detection approach based on a signal-derived Empirical Mode Decomposition (EMD) dictionary is proposed. For this purpose, we use an empirical framework for EMD-based dictionary creation and learning, inspired by traditional dictionary learning methods, in which the EMD-based dictionary is learned from the multi-channel EEG data being analyzed for automatic seizure detection. We present the algorithm for dictionary creation and learning, whose purpose is to learn dictionaries with a small number of atoms. Using training signals belonging to seizure and non-seizure classes, an initial dictionary, termed the raw dictionary, is formed. The atoms of the raw dictionary are composed of intrinsic mode functions obtained after decomposition of the training signals using the empirical mode decomposition algorithm. The raw dictionary is then trained using a learning algorithm, resulting in a substantial decrease in the number of atoms in the trained dictionary. The trained dictionary is then used for automatic seizure detection, such that coefficients of orthogonal projections of test signals against the trained dictionary form the features used for classification of test signals into seizure and non-seizure classes. Thus, no hand-engineered features have to be extracted from the data as in traditional seizure detection approaches. Main results: The performance of the proposed approach is validated using the CHB-MIT benchmark database, and averaged accuracy, sensitivity and specificity values of 92.9%, 94.3% and 91.5%, respectively, are obtained using a support vector machine classifier and five-fold cross-validation. These results are compared with other approaches using the same database, and the suitability of the approach for seizure detection in long-term multi-channel EEG recordings is discussed. Significance: The proposed approach describes a computationally efficient method for automatic seizure detection in long-term multi-channel EEG recordings. The method does not rely on hand-engineered features, as are required in traditional approaches. Furthermore, the approach is suitable for scenarios where the dictionary, once formed and trained, can be used for automatic seizure detection of newly recorded data, making the approach suitable for long-term multi-channel EEG recordings. © 2018 IOP Publishing Ltd.
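
    A sketch of the classification stage only, assuming a dictionary is already available: segments are represented by their orthogonal-projection coefficients on the dictionary and fed to an SVM with cross-validation. The random orthonormal "dictionary" and synthetic segments below stand in for the paper's EMD-derived, learned dictionary and the CHB-MIT recordings.

    ```python
    # Sketch of dictionary-projection features followed by SVM classification.
    # The orthonormal random "dictionary" and synthetic labelled segments are
    # placeholders for a learned EMD-based dictionary and real EEG data.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, n_atoms, n_segments = 128, 10, 200

    D = np.linalg.qr(rng.standard_normal((n, n_atoms)))[0]   # orthonormal stand-in atoms
    labels = rng.integers(0, 2, n_segments)                  # 1 = seizure, 0 = non-seizure (toy)

    # Synthetic segments: the two classes load differently on the first two atoms.
    segments = 0.5 * rng.standard_normal((n_segments, n))
    segments += np.outer(labels, 3.0 * D[:, 0]) + np.outer(1 - labels, 3.0 * D[:, 1])

    # Features: orthogonal-projection coefficients of each segment on the dictionary.
    features = segments @ D                                   # shape (n_segments, n_atoms)

    scores = cross_val_score(SVC(kernel="rbf"), features, labels, cv=5)
    print("5-fold accuracy:", scores.mean().round(3))
    ```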

  20. Improved interior wall detection using designated dictionaries in compressive urban sensing problems

    NASA Astrophysics Data System (ADS)

    Lagunas, Eva; Amin, Moeness G.; Ahmad, Fauzia; Nájar, Montse

    2013-05-01

    In this paper, we address sparsity-based imaging of building interior structures for through-the-wall radar imaging and urban sensing applications. The proposed approach utilizes information about common building construction practices to form an appropriate sparse representation of the building layout. With a ground based SAR system, and considering that interior walls are either parallel or perpendicular to the exterior walls, the antenna at each position would receive reflections from the walls parallel to the radar's scan direction as well as from the corners between two meeting walls. We propose a two-step approach for wall detection and localization. In the first step, a dictionary of possible wall locations is used to recover the positions of both interior and exterior walls that are parallel to the scan direction. A follow-on step uses a dictionary of possible corner reflectors to locate wall-wall junctions along the detected wall segments, thereby determining the true wall extents and detecting walls perpendicular to the scan direction. The utility of the proposed approach is demonstrated using simulated data.

  1. Grammar Coding in the "Oxford Advanced Learner's Dictionary of Current English."

    ERIC Educational Resources Information Center

    Wekker, Herman

    1992-01-01

    Focuses on the revised system of grammar coding for verbs in the fourth edition of the "Oxford Advanced Learner's Dictionary of Current English" (OALD4), comparing it with two other similar dictionaries. It is shown that the OALD4 is found to be more favorable on many criteria than the other comparable dictionaries. (16 references) (VWL)

  2. A Study on the Use of Mobile Dictionaries in Vocabulary Teaching

    ERIC Educational Resources Information Center

    Aslan, Erdinç

    2016-01-01

    In recent years, rapid developments in technology have placed books and notebooks into mobile phones and tablets, and also dictionaries into these small boxes. Giant dictionaries, which we once barely managed to carry, have been replaced by mobile dictionaries through which we can reach any word we want with only a few touches. Mobile…

  3. Letters to a Dictionary: Competing Views of Language in the Reception of "Webster's Third New International Dictionary"

    ERIC Educational Resources Information Center

    Bello, Anne Pence

    2013-01-01

    The publication of "Webster's Third New International Dictionary" in September 1961 set off a national controversy about dictionaries and language that ultimately included issues related to linguistics and English education. The negative reviews published in the press about the "Third" have shaped beliefs about the nature of…

  4. Effects of Printed, Pocket Electronic, and Online Dictionaries on High School Students' English Vocabulary Retention

    ERIC Educational Resources Information Center

    Chiu, Li-Ling; Liu, Gi-Zen

    2013-01-01

    This study obtained empirical evidence regarding the effects of using printed dictionaries (PD), pocket electronic dictionaries (PED), and online type-in dictionaries (OTID) on English vocabulary retention at a junior high school. A mixed-methods research methodology was adopted in this study. Thirty-three seventh graders were asked to use all…

  5. The Efficacy of Dictionary Use while Reading for Learning New Words

    ERIC Educational Resources Information Center

    Hamilton, Harley

    2012-01-01

    The researcher investigated the use of three types of dictionaries while reading by high school students with severe to profound hearing loss. The objective of the study was to determine the effectiveness of each type of dictionary for acquiring the meanings of unknown vocabulary in text. The three types of dictionaries were (a) an online…

  6. Dictionaries Can Help Writing--If Students Know How To Use Them.

    ERIC Educational Resources Information Center

    Jacobs, George M.

    A study investigated whether instruction in how to use a dictionary led to improved second language performance and greater dictionary use among English majors (N=54) in a reading and writing course at a Thai university. One of three participating classes was instructed in the use of a monolingual learner's dictionary. A passage correction test…

  7. Dictionary-Based Tensor Canonical Polyadic Decomposition

    NASA Astrophysics Data System (ADS)

    Cohen, Jeremy Emile; Gillis, Nicolas

    2018-04-01

    To ensure interpretability of extracted sources in tensor decomposition, we introduce in this paper a dictionary-based tensor canonical polyadic decomposition which enforces one factor to belong exactly to a known dictionary. A new formulation of sparse coding is proposed which enables dictionary-based canonical polyadic decomposition of high-dimensional tensors. The benefits of using a dictionary in tensor decomposition models are explored both in terms of parameter identifiability and estimation accuracy. The performance of the proposed algorithms is evaluated on the decomposition of simulated data and the unmixing of hyperspectral images.
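
    The distinctive constraint, that one factor's columns must be atoms of a known dictionary, can be sketched as a projection step; the synthetic "estimated factor" below stands in for an unconstrained ALS update, and the full algorithm would alternate this step with updates of the remaining factors.

    ```python
    # Sketch of the dictionary constraint in dictionary-based CPD: each column of
    # one estimated factor is replaced by its best-matching dictionary atom. The
    # "estimated factor" here is synthetic, not an actual ALS output.
    import numpy as np

    rng = np.random.default_rng(0)
    I, R, n_atoms = 50, 4, 30

    dictionary = rng.standard_normal((I, n_atoms))
    dictionary /= np.linalg.norm(dictionary, axis=0)

    # Pretend this came from an unconstrained ALS update of the first CP factor.
    true_idx = rng.choice(n_atoms, size=R, replace=False)
    scales_true = rng.uniform(0.5, 2.0, R)
    A_est = dictionary[:, true_idx] * scales_true + 0.05 * rng.standard_normal((I, R))

    # Constraint step: pick, for each column, the dictionary atom with the largest
    # absolute correlation, and keep the corresponding scale.
    corr = dictionary.T @ A_est                        # shape (n_atoms, R)
    chosen = np.argmax(np.abs(corr), axis=0)
    scales = corr[chosen, np.arange(R)]
    A_constrained = dictionary[:, chosen] * scales

    print("true atoms:  ", sorted(true_idx))
    print("chosen atoms:", sorted(chosen))
    ```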

  8. Coding of adverse events of suicidality in clinical study reports of duloxetine for the treatment of major depressive disorder: descriptive study

    PubMed Central

    Tendal, Britta; Hróbjartsson, Asbjørn; Lundh, Andreas; Gøtzsche, Peter C

    2014-01-01

    Objective: To assess the effects of coding and coding conventions on summaries and tabulations of adverse events data on suicidality within clinical study reports. Design: Systematic electronic search for adverse events of suicidality in tables, narratives, and listings of adverse events in individual patients within clinical study reports. Where possible, for each event we extracted the original term reported by the investigator, the term as coded by the medical coding dictionary, the medical coding dictionary used, and the patient’s trial identification number. Using the patient’s trial identification number, we attempted to reconcile data on the same event between the different formats for presenting data on adverse events within the clinical study report. Setting: 9 randomised placebo controlled trials of duloxetine for major depressive disorder submitted to the European Medicines Agency for marketing approval. Data sources: Clinical study reports obtained from the EMA in 2011. Results: Six trials used the medical coding dictionary COSTART (Coding Symbols for a Thesaurus of Adverse Reaction Terms) and three used MedDRA (Medical Dictionary for Regulatory Activities). Suicides were clearly identifiable in all formats of adverse event data in clinical study reports. Suicide attempts presented in tables included both definitive and provisional diagnoses. Suicidal ideation and preparatory behaviour were obscured in some tables owing to the lack of specificity of the medical coding dictionary, especially COSTART. Furthermore, we found one event of suicidal ideation described in narrative text that was absent from tables and adverse event listings of individual patients. The reason for this is unclear, but may be due to the coding conventions used. Conclusion: Data on adverse events in tables in clinical study reports may not accurately represent the underlying patient data because of the medical dictionaries and coding conventions used. In clinical study reports, the listings of adverse events for individual patients and narratives of adverse events can provide additional information, including original investigator reported adverse event terms, which can enable a more accurate estimate of harms. PMID:24899651

  9. Vehicle Information Exchange Needs for Mobility Applications

    DOT National Transportation Integrated Search

    2012-02-13

    Connected Vehicle to Vehicle (V2V) safety applications heavily rely on the BSM, which is one of the messages defined in the Society of Automotive standard J2735, Dedicated Short Range Communications (DSRC) Message Set Dictionary, November 2009. The B...

  10. CUHK Papers in Linguistics, Number 4.

    ERIC Educational Resources Information Center

    Tang, Gladys, Ed.

    1993-01-01

    Papers in this issue include the following: "Code-Mixing in Hongkong Cantonese-English Bilinguals: Constraints and Processes" (Brian Chan Hok-shing); "Information on Quantifiers and Argument Structure in English Learner's Dictionaries" (Thomas Hun-tak Lee); "Systematic Variability: In Search of a Linguistic…

  11. Documentation in Education.

    ERIC Educational Resources Information Center

    Burke, Arvid J.; Burke, Mary A.

    After a summary of background knowledge useful in searching for information, the authors cover extensively the sources available to the researcher interested in locating educational data or conducting a search of bibliographic materials. They list reference books, dictionaries, almanacs, yearbooks, subject matter summaries; and sources for…

  12. Intelligent Transportation Systems (ITS) logical architecture : volume 3 : data dictionary

    DOT National Transportation Integrated Search

    1982-01-01

    A Guide to Reporting Highway Statistics is a principal part of Federal Highway Administration's comprehensive highway information collection effort. This Guide has two objectives: 1) To serve as a reference to the reporting system that the Federal Hi...

  13. Reference Collections and Standards.

    ERIC Educational Resources Information Center

    Winkel, Lois

    1999-01-01

    Reviews six reference materials for young people: "The New York Public Library Kid's Guide to Research"; "National Audubon Society First Field Guide. Mammals"; "Star Wars: The Visual Dictionary"; "Encarta Africana"; "World Fact Book, 1998"; and "Factastic Book of 1001 Lists". Includes ordering information.(AEF)

  14. The Art of Creating Enjoyable Career Counseling Classes.

    ERIC Educational Resources Information Center

    Miller, Mark J.; Soper, Barlow

    1982-01-01

    Provides counselor educators with some specific suggestions for making career counseling classes enjoyable and educational. Discusses lecture topics, small group activities, creative use of Dictionary of Occupational Titles, evaluation of occupational information, and imaginative ways of interpreting test results. (RC)

  15. [Old English plant names from the linguistic and lexicographic viewpoint].

    PubMed

    Sauer, Hans; Krischke, Ulrike

    2004-01-01

    Roughly 1350 Old English plant names have come down to us; this is a relatively large number considering that the attested Old English vocabulary comprises ca. 24 000 words. The plant names are not only interesting for botanists, historians of medicine and many others, but also for philologists and linguists; among other aspects they can investigate their etymology, their morphology (including word-formation) and their meaning and motivation. Practically all Old English texts where plant names occur have been edited (including glosses and glossaries), the names have been listed in the Old English dictionaries, and some specific studies have been devoted to them. Nevertheless no comprehensive systematic analysis of their linguistic structure has been made. Ulrike Krischke is preparing such an analysis. A proper dictionary of the Old English plant names is also a desideratum, especially since the Old English dictionaries available and in progress normally do not deal with morphological and semantic aspects, and many do not provide etymological information. A plant-name dictionary concentrating on this information is being prepared by Hans Sauer and Ulrike Krischke. In our article here, we sketch the state of the art (ch. 1), we deal with some problems of the analysis of Old English plant names (ch. 2), e.g. the delimitation of the word-field plant names, the identification of the plants, errors and problematic spellings in the manuscripts. In ch. 3 we sketch the etymological structure according to chronological layers (Indo-European, Germanic, West-Germanic, Old English) as well as according to the distinction between native words and loan-words; in the latter category, we also mention loan-formations based on Latin models. In ch. 4 we survey the morphological aspects (simplex vs. complex words); among the complex nouns, compounds are by far the largest group (and among those, the noun + noun compounds), but there are also a few suffix formations. We also briefly present some morphological peculiarities, e.g. formations with blocked (unique) morphemes, the question of homonyms, cases of obscuration and of popular etymology. In ch. 5 we outline semantic structures, and in ch. 6, we introduce the structure of the proposed dictionary of the Old English plant names, also providing several specimen entries.

  16. Orthogonal Procrustes Analysis for Dictionary Learning in Sparse Linear Representation

    PubMed Central

    Grossi, Giuliano; Lin, Jianyi

    2017-01-01

    In the sparse representation model, the design of overcomplete dictionaries plays a key role in the effectiveness and applicability in different domains. Recent research has produced several dictionary learning approaches, and it has been shown that dictionaries learnt from data examples significantly outperform structured ones, e.g. wavelet transforms. In this context, learning consists in adapting the dictionary atoms to a set of training signals in order to promote a sparse representation that minimizes the reconstruction error. Finding the best fitting dictionary remains a very difficult task, leaving the question still open. A well-established heuristic method for tackling this problem is an iterative alternating scheme, adopted for instance in the well-known K-SVD algorithm. Essentially, it consists in repeating two stages; the former promotes sparse coding of the training set and the latter adapts the dictionary to reduce the error. In this paper we present R-SVD, a new method that, while maintaining the alternating scheme, adopts the Orthogonal Procrustes analysis to update the dictionary atoms suitably arranged into groups. Comparative experiments on synthetic data prove the effectiveness of R-SVD with respect to well known dictionary learning algorithms such as K-SVD, ILS-DLA and the online method OSDL. Moreover, experiments on natural data such as ECG compression, EEG sparse representation, and image modeling confirm R-SVD’s robustness and wide applicability. PMID:28103283

  17. Personalized Age Progression with Bi-Level Aging Dictionary Learning.

    PubMed

    Shu, Xiangbo; Tang, Jinhui; Li, Zechao; Lai, Hanjiang; Zhang, Liyan; Yan, Shuicheng

    2018-04-01

    Age progression is defined as aesthetically re-rendering the aging face at any future age for an individual face. In this work, we aim to automatically render aging faces in a personalized way. Basically, for each age group, we learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles), where the dictionary bases corresponding to the same index yet from two neighboring aging dictionaries form a particular aging pattern across these two age groups, and a linear combination of all these patterns expresses a particular personalized aging process. Moreover, two factors are taken into consideration in the dictionary learning process. First, beyond the aging dictionaries, each person may have extra personalized facial characteristics, e.g., a mole, which are invariant in the aging process. Second, it is challenging or even impossible to collect faces of all age groups for a particular person, yet much easier and more practical to get face pairs from neighboring age groups. To this end, we propose a novel Bi-level Dictionary Learning based Personalized Age Progression (BDL-PAP) method. Here, bi-level dictionary learning is formulated to learn the aging dictionaries based on face pairs from neighboring age groups. Extensive experiments demonstrate the advantages of the proposed BDL-PAP over other state-of-the-art methods in terms of personalized age progression, as well as the performance gain for cross-age face verification by synthesizing aging faces.

  18. U.S.-MEXICO BORDER PROGRAM ARIZONA BORDER STUDY--STANDARD OPERATING PROCEDURE FOR THE GENERATION AND OPERATION OF DATA DICTIONARIES (UA-D-4.0)

    EPA Science Inventory

    The purpose of this SOP is to provide a standard method for the writing of data dictionaries. This procedure applies to the dictionaries used during the Arizona NHEXAS project and the Border study. Keywords: guidelines; data dictionaries.

    The U.S.-Mexico Border Program is spon...

  19. Multimodal Task-Driven Dictionary Learning for Image Classification

    DTIC Science & Technology

    2015-12-18

    Multimodal Task-Driven Dictionary Learning for Image Classification. Soheil Bahrampour, Student Member, IEEE, Nasser M. Nasrabadi, Fellow, IEEE...Asok Ray, Fellow, IEEE, and W. Kenneth Jenkins, Life Fellow, IEEE. Abstract—Dictionary learning algorithms have been successfully used for both...reconstructive and discriminative tasks, where an input signal is represented with a sparse linear combination of dictionary atoms. While these methods are

  20. Accelerating the reconstruction of magnetic resonance imaging by three-dimensional dual-dictionary learning using CUDA.

    PubMed

    Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao

    2014-01-01

    An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is to use under-sampled k-space data, and dictionary learning methods can be used to maintain reconstruction quality. A three-dimensional dictionary trains its atoms in the form of blocks, which can utilize the spatial correlation among slices. The dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large. Thus, the procedure is time-consuming. In this paper, we first utilize the NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design parallel algorithms on the graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations operate in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of CUDA code with algorithmic optimization. Experimental results show that a speedup of more than 324× is achieved compared with the CPU-only code when the number of MRI slices is 24.

  1. A Knowledge Dictionary System for Scheduling Support

    DTIC Science & Technology

    1988-10-01

    quantity of the resource is allocated; (g) two (or more) activities that conflict temporally can only proceed if one or more of the activities are re...does not allow any parameter substitution but merely processes the contents of the file as a series of keystrokes. To create a macro, simply type...bytes in the system. When the hardware actually uses such a structure (e.g., the Motorola 68000 series CPU) the OS will almost always present it that

  2. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learning and Sparse Regression Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruan, D; Yang, Y; Cao, M

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron density has placed it as an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards an MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. In contrast to manual, bulk, or registration-based approaches to identify bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with MR image patches, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided into patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently assigned to the class with the smaller residual. Results: The proposed method has been applied to a pilot brain-imaging site and has shown generally good performance, with a Dice similarity coefficient greater than 0.9 in a cross-validation study using 4 datasets. Importantly, it is robust to consistent foreign objects (e.g., a headset) and to artefacts related to Gibbs ringing and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enable a first translation to an MR-only routine. The scheme generalizes to multiple tissue classes.
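
    A sketch of the patch-classification rule described above: each patch is sparsely regressed against a bone dictionary and a non-bone dictionary and assigned to the class with the smaller reconstruction residual. The random dictionaries, toy patch size, and scikit-learn OMP solver are illustrative assumptions, not the learned UTE-MRI dictionaries.

    ```python
    # Sketch of two-dictionary residual classification of image patches.
    # Dictionaries and patches here are random stand-ins for dictionaries
    # learned from co-registered UTE-MRI and simulation-CT training data.
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    patch_dim, n_atoms = 64, 40            # e.g. flattened 8x8 patches (toy sizes)

    D_bone = rng.standard_normal((patch_dim, n_atoms))
    D_soft = rng.standard_normal((patch_dim, n_atoms))
    for D in (D_bone, D_soft):
        D /= np.linalg.norm(D, axis=0)

    def class_residual(patch, D, sparsity=5):
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity, fit_intercept=False)
        omp.fit(D, patch)
        return np.linalg.norm(patch - D @ omp.coef_)

    def classify_patch(patch):
        r_bone = class_residual(patch, D_bone)
        r_soft = class_residual(patch, D_soft)
        return "bone" if r_bone < r_soft else "non-bone"

    # A patch synthesized from a few bone-dictionary atoms should come back "bone".
    test_patch = D_bone[:, :3] @ np.array([1.0, -0.5, 0.8])
    test_patch += 0.05 * rng.standard_normal(patch_dim)
    print(classify_patch(test_patch))
    ```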

  3. Stemming Malay Text and Its Application in Automatic Text Categorization

    NASA Astrophysics Data System (ADS)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language there are no conjugations or declensions, and affixes have important grammatical functions. The same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use precise words in formal speech or written texts. In Malay, derivative words are used to make sentences clear, and derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of educated Malay speakers, so the composition of Malay words may be complicated. Although several types of stemming algorithms are available for text processing in English and some other languages, they cannot be used to overcome the difficulties of Malay word stemming. Stemming is the process of reducing words to their root forms in order to improve the effectiveness of text processing in information systems, and it is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The rule set is aimed at reducing under-stemming errors, while the dictionaries are intended to reduce over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software, using actual web pages collected from the World Wide Web as the text data. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
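
    As a rough illustration of the rule-plus-dictionary design described above (not the authors' rule set; the affix lists and word lists below are tiny, hypothetical samples), a Malay-style stemmer can first consult the dictionaries and only then strip affixes, accepting a candidate stem only if it appears in the root-word dictionary:

```python
# Simplified sketch of a rule-plus-dictionary stemmer. The affix lists and
# word lists are tiny, hypothetical samples, not the authors' resources.

ROOT_WORDS = {"ajar", "main", "baca"}          # root-word dictionary (sample)
DERIVATIVES = {"pelajaran": "ajar"}            # derivative-word dictionary (sample)
PREFIXES = ["mem", "pem", "ber", "ter", "per", "pe", "me", "di", "ke", "se"]
SUFFIXES = ["kan", "an", "i", "lah", "nya"]

def stem(word: str) -> str:
    # 1. Dictionary hits are returned directly, which avoids over-stemming.
    if word in ROOT_WORDS:
        return word
    if word in DERIVATIVES:
        return DERIVATIVES[word]
    # 2. Apply affix rules; accept a candidate only if it is a known root.
    candidates = [word]
    for p in PREFIXES:
        if word.startswith(p):
            candidates.append(word[len(p):])
    with_suffix_removed = []
    for c in candidates:
        for s in SUFFIXES:
            if c.endswith(s):
                with_suffix_removed.append(c[: -len(s)])
    for c in candidates + with_suffix_removed:
        if c in ROOT_WORDS:
            return c
    return word  # unknown words are left untouched

print(stem("pelajaran"), stem("membaca"), stem("permainan"))  # ajar baca main
```

    Checking each candidate against the root-word dictionary is what guards against over-stemming, while the affix rules recover roots that the dictionaries alone would miss.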

  4. Inuit interpreters engaged in end-of-life care in Nunavik, Northern Quebec.

    PubMed

    Hordyk, Shawn Renee; Macdonald, Mary Ellen; Brassard, Paul

    2017-01-01

    Inuit interpreters are key players in end-of-life (EOL) care for Nunavik patients and families. This emotionally intensive work requires expertise in French, English and Inuit dialects to negotiate linguistic and cultural challenges. Cultural differences among medical institutions and Inuit communities can lead to value conflicts and moral dilemmas as interpreters navigate how best to transmit messages of care at EOL. Our goal was to understand the experience of Inuit interpreters in the context of EOL care in Nunavik in order to identify training needs. In the context of a larger ethnographic project on EOL care in Nunavik, we met with 24 current and former interpreters from local health centres and Montreal tertiary care contexts. Data included informal and formal interviews focusing on linguistic resources, experiences concerning EOL care, and suggestions for the development of interpretation training. Inuit working as interpreters in Nunavik are hired to provide multiple services of which interpretation plays only a part. Many have no formal training and have few resources (e.g. visual aids, dictionaries) to draw upon during medical consultations. Given the small size of communities, many interpreters personally know their clients and often feel overwhelmed by moral dilemmas when translating EOL information for patients and families. The concept of moral distress is a helpful lens to make sense of their experience, including personal and professional repercussions. Inuit interpreters in Nunavik are working with little training, yet in a context with multiple linguistic and cultural challenges. Linguistic and cultural resources and focused training on moral dilemmas unique to circumpolar contexts could contribute to improved work conditions and ultimately to patient care.

  5. Information Storage and Retrieval...Reports on Text Analysis, Dynamic Indexing, Feedback Searches, Dictionary Construction and File Organization.

    ERIC Educational Resources Information Center

    Salton, Gerald; And Others

    The present report is the twenty-first in a series describing research in information storage and retrieval conducted by the Department of Computer Science at Cornell University. The report covering work carried out by the SMART project for approximately two years (summer 1970 to summer 1972) is separated into five parts: automatic content…

  6. A Review of Enterprise Architecture Use in Defence

    DTIC Science & Technology

    2014-09-01

    dictionary of terms; • architecture description language; • architectural information (pertaining both to specific projects and higher level...Z39.19 2005 Monolingual Controlled Vocabularies, National Information Standards Organisation, Bethesda: NISO Press, 2005. BABOK 2009...togaf/ Z39.19 2005 ANSI/NISO Z39.19 – Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, Bethesda: NISO

  7. Novel Spectral Representations and Sparsity-Driven Algorithms for Shape Modeling and Analysis

    NASA Astrophysics Data System (ADS)

    Zhong, Ming

    In this dissertation, we focus on extending classical spectral shape analysis by incorporating spectral graph wavelets and sparsity-seeking algorithms. Defined with the graph Laplacian eigenbasis, the spectral graph wavelets are localized both in the vertex domain and the graph spectral domain, and thus are very effective in describing local geometry. With a rich dictionary of elementary vectors and forcing certain sparsity constraints, a real-life signal can often be well approximated by a very sparse coefficient representation. The many successful applications of sparse signal representation in computer vision and image processing inspire us to explore the idea of employing sparse modeling techniques with dictionaries of spectral bases to solve various shape modeling problems. Conventional spectral mesh compression uses the eigenfunctions of the mesh Laplacian as shape bases, which are highly inefficient in representing local geometry. To ameliorate this, we advocate an innovative approach to 3D mesh compression using spectral graph wavelets as the dictionary to encode mesh geometry. The spectral graph wavelets are locally defined at individual vertices and can better capture local shape information than the Laplacian eigenbasis. The multi-scale SGWs form a redundant dictionary as a shape basis, so we formulate the compression of a 3D shape as a sparse approximation problem that can be readily handled by greedy pursuit algorithms. Surface inpainting refers to the completion or recovery of missing shape geometry based on the shape information that is currently available. We devise a new surface inpainting algorithm founded upon the theory and techniques of sparse signal recovery. Instead of estimating the missing geometry directly, our method instead finds a low-dimensional representation that describes the entire original shape. More specifically, we find that, for many shapes, the vertex coordinate function can be well approximated by a very sparse coefficient representation with respect to the dictionary comprising its Laplacian eigenbasis, and it is then possible to recover this sparse representation from partial measurements of the original shape. Taking advantage of the sparsity cue, we advocate a novel variational approach for surface inpainting, integrating data fidelity constraints on the shape domain with coefficient sparsity constraints on the transformed domain. Because of the powerful properties of the Laplacian eigenbasis, the inpainting results of our method tend to be globally coherent with the remaining shape. Informative and discriminative feature descriptors are vital in qualitative and quantitative shape analysis for a large variety of graphics applications. We advocate novel strategies to define generalized, user-specified features on shapes. Our new region descriptors are primarily built upon the coefficients of spectral graph wavelets that are both multi-scale and multi-level in nature, consisting of both local and global information. Based on our novel spectral feature descriptor, we developed a user-specified feature detection framework and a tensor-based shape matching algorithm. Through various experiments, we demonstrate the competitive performance of our proposed methods and the great potential of spectral basis and sparsity-driven methods for shape modeling.
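
    The inpainting idea above, a coordinate function that is sparse in the Laplacian eigenbasis and recovered from the vertices that are still present, is a standard sparse-recovery problem. A toy numpy/scikit-learn sketch follows (a random orthonormal basis stands in for a mesh Laplacian eigenbasis; this is not the dissertation's variational formulation):

```python
import numpy as np
from sklearn.linear_model import Lasso

def inpaint(basis, known_idx, known_vals, alpha=1e-3):
    """Recover a signal that is sparse in `basis` (e.g., a Laplacian
    eigenbasis) from values observed at `known_idx`, via l1-regularized
    regression on the observed rows, then synthesize it everywhere."""
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    model.fit(basis[known_idx, :], known_vals)
    return basis @ model.coef_

# Synthetic usage: random orthonormal basis, 5-sparse coefficients,
# 60% of the "vertices" observed.
rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((200, 200)))
c = np.zeros(200)
c[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
signal = Q @ c
known = rng.choice(200, 120, replace=False)
recovered = inpaint(Q, known, signal[known])
print(np.linalg.norm(recovered - signal) / np.linalg.norm(signal))  # small if recovered
```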

  8. Learning a common dictionary for subject-transfer decoding with resting calibration.

    PubMed

    Morioka, Hiroshi; Kanemura, Atsunori; Hirayama, Jun-ichiro; Shikauchi, Manabu; Ogawa, Takeshi; Ikeda, Shigeyuki; Kawanabe, Motoaki; Ishii, Shin

    2015-05-01

    Brain signals measured over a series of experiments have inherent variability because of different physical and mental conditions among multiple subjects and sessions. Such variability complicates the analysis of data from multiple subjects and sessions in a consistent way, and degrades the performance of subject-transfer decoding in a brain-machine interface (BMI). To accommodate the variability in brain signals, we propose 1) a method for extracting spatial bases (or a dictionary) shared by multiple subjects, by employing a signal-processing technique of dictionary learning modified to compensate for variations between subjects and sessions, and 2) an approach to subject-transfer decoding that uses the resting-state activity of a previously unseen target subject as calibration data for compensating for variations, eliminating the need for a standard calibration based on task sessions. Applying our methodology to a dataset of electroencephalography (EEG) recordings during a selective visual-spatial attention task from multiple subjects and sessions, where the variability compensation was essential for reducing the redundancy of the dictionary, we found that the extracted common brain activities were reasonable in the light of neuroscience knowledge. The applicability to subject-transfer decoding was confirmed by improved performance over existing decoding methods. These results suggest that analyzing multisubject brain activities on common bases by the proposed method enables information sharing across subjects with low-burden resting calibration, and is effective for practical use of BMI in variable environments. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Beyond Readability: Investigating Coherence of Clinical Text for Consumers

    PubMed Central

    Hetzel, Scott; Dalrymple, Prudence; Keselman, Alla

    2011-01-01

    Background A basic tenet of consumer health informatics is that understandable health resources empower the public. Text comprehension holds great promise for helping to characterize consumer problems in understanding health texts. The need for efficient ways to assess consumer-oriented health texts and the availability of computationally supported tools led us to explore the effect of various text characteristics on readers’ understanding of health texts, as well as to develop novel approaches to assessing these characteristics. Objective The goal of this study was to compare the impact of two different approaches to enhancing readability, and three interventions, on individuals’ comprehension of short, complex passages of health text. Methods Participants were 80 university staff, faculty, or students. Each participant was asked to “retell” the content of two health texts: one a clinical trial in the domain of diabetes mellitus, and the other typical Visit Notes. These texts were transformed for the intervention arms of the study. Two interventions provided terminology support via (1) standard dictionary or (2) contextualized vocabulary definitions. The third intervention provided coherence improvement. We assessed participants’ comprehension of the clinical texts through propositional analysis, an open-ended questionnaire, and analysis of the number of errors made. Results For the clinical trial text, the effect of text condition was not significant in any of the comparisons, suggesting no differences in recall, despite the varying levels of support (P = .84). For the Visit Note, however, the difference in the median total propositions recalled between the Coherent and the (Original + Dictionary) conditions was significant (P = .04). This suggests that participants in the Coherent condition recalled more of the original Visit Notes content than did participants in the Original and the Dictionary conditions combined. However, no difference was seen between (Original + Dictionary) and Vocabulary (P = .36) nor Coherent and Vocabulary (P = .62). No statistically significant effect of any document transformation was found either in the open-ended questionnaire (clinical trial: P = .86, Visit Note: P = .20) or in the error rate (clinical trial: P = .47, Visit Note: P = .25). However, post hoc power analysis suggested that increasing the sample size by approximately 6 participants per condition would result in a significant difference for the Visit Note, but not for the clinical trial text. Conclusions Statistically, the results of this study attest that improving coherence has a small effect on consumer comprehension of clinical text, but the task is extremely labor intensive and not scalable. Further research is needed using texts from more diverse clinical domains and more heterogeneous participants, including actual patients. Since comprehensibility of clinical text appears difficult to automate, informatics support tools may most productively support the health care professionals tasked with making clinical information understandable to patients. PMID:22138127

  10. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining.

    PubMed

    Hettne, Kristina M; Williams, Antony J; van Mulligen, Erik M; Kleinjans, Jos; Tkachenko, Valery; Kors, Jan A

    2010-03-23

    Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter of the size of Chemlist, which contains around 300 k names. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist.
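
    The comparison above turns on precision, recall, and F-score. For reference, the standard F-score computation applied to the figures quoted after filtering and disambiguation reproduces the paper's ranking (Chemlist best by F-score):

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Figures quoted in the abstract, after filtering and disambiguation.
print(f_score(0.87, 0.19))   # ChemSpider dictionary: ~0.31
print(f_score(0.67, 0.40))   # Chemlist dictionary:   ~0.50
```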

  11. Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining

    PubMed Central

    2010-01-01

    Background Previously, we developed a combined dictionary dubbed Chemlist for the identification of small molecules and drugs in text based on a number of publicly available databases and tested it on an annotated corpus. To achieve an acceptable recall and precision we used a number of automatic and semi-automatic processing steps together with disambiguation rules. However, it remained to be investigated what impact an extensive manual curation of a multi-source chemical dictionary would have on chemical term identification in text. ChemSpider is a chemical database that has undergone extensive manual curation aimed at establishing valid chemical name-to-structure relationships. Results We acquired the component of ChemSpider containing only manually curated names and synonyms. Rule-based term filtering, semi-automatic manual curation, and disambiguation rules were applied. We tested the dictionary from ChemSpider on an annotated corpus and compared the results with those for the Chemlist dictionary. The ChemSpider dictionary of ca. 80 k names was only a third to a quarter of the size of Chemlist, which contains around 300 k names. The ChemSpider dictionary had a precision of 0.43 and a recall of 0.19 before the application of filtering and disambiguation and a precision of 0.87 and a recall of 0.19 after filtering and disambiguation. The Chemlist dictionary had a precision of 0.20 and a recall of 0.47 before the application of filtering and disambiguation and a precision of 0.67 and a recall of 0.40 after filtering and disambiguation. Conclusions We conclude the following: (1) The ChemSpider dictionary achieved the best precision but the Chemlist dictionary had a higher recall and the best F-score; (2) Rule-based filtering and disambiguation is necessary to achieve a high precision for both the automatically generated and the manually curated dictionary. ChemSpider is available as a web service at http://www.chemspider.com/ and the Chemlist dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web at http://www.biosemantics.org/chemlist. PMID:20331846

  12. Vehicle Information Exchange Needs for Mobility Applications : Version 2.0

    DOT National Transportation Integrated Search

    2012-08-01

    Connected Vehicle to Vehicle (V2V) safety applications heavily rely on the basic safety message (BSM), which is one of the messages defined in the Society of Automotive Engineers (SAE) standard J2735, Dedicated Short Range Communications (DSRC) Message Set Dictionary, November 2009. The B...

  13. Developing a distributed data dictionary service

    NASA Technical Reports Server (NTRS)

    U'Ren, J.

    2000-01-01

    This paper will explore the use of the Lightweight Directory Access Protocol (LDAP) using the ISO 11179 Data Dictionary Schema as a mechanism for standardizing the structure and communication links between data dictionaries.

  14. Multiple Sparse Representations Classification

    PubMed Central

    Plenge, Esben; Klein, Stefan S.; Niessen, Wiro J.; Meijering, Erik

    2015-01-01

    Sparse representations classification (SRC) is a powerful technique for pixelwise classification of images and it is increasingly being used for a wide variety of image analysis tasks. The method uses sparse representation and learned redundant dictionaries to classify image pixels. In this empirical study we propose to further leverage the redundancy of the learned dictionaries to achieve a more accurate classifier. In conventional SRC, each image pixel is associated with a small patch surrounding it. Using these patches, a dictionary is trained for each class in a supervised fashion. Commonly, redundant/overcomplete dictionaries are trained and image patches are sparsely represented by a linear combination of only a few of the dictionary elements. Given a set of trained dictionaries, a new patch is sparse coded using each of them, and subsequently assigned to the class whose dictionary yields the minimum residual energy. We propose a generalization of this scheme. The method, which we call multiple sparse representations classification (mSRC), is based on the observation that an overcomplete, class-specific dictionary is capable of generating multiple accurate and independent estimates of a patch belonging to the class. So instead of finding a single sparse representation of a patch for each dictionary, we find multiple, and the corresponding residual energies provide an enhanced statistic that is used to improve classification. We demonstrate the efficacy of mSRC for three example applications: pixelwise classification of texture images, lumen segmentation in carotid artery magnetic resonance imaging (MRI), and bifurcation point detection in carotid artery MRI. We compare our method with conventional SRC, K-nearest neighbor, and support vector machine classifiers. The results show that mSRC outperforms SRC and the other reference methods. In addition, we present an extensive evaluation of the effect of the main mSRC parameters: patch size, dictionary size, and sparsity level.
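
    A compact numpy sketch of the conventional SRC decision rule described above (a simplified illustration, not the authors' implementation): each patch is sparse-coded against every class dictionary and assigned to the class with the smallest residual.

```python
import numpy as np

def sparse_code(D, y, k):
    """Greedy (OMP-style) coding of y with at most k atoms of D; the same
    pursuit sketched earlier in this listing, repeated for self-containment."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        xs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ xs
    x[support] = xs
    return x

def src_classify(patch, class_dictionaries, k=5):
    """Conventional SRC rule: sparse-code the patch against each class
    dictionary and return the class with the smallest residual energy."""
    residuals = [np.linalg.norm(patch - D @ sparse_code(D, patch, k))
                 for D in class_dictionaries]
    return int(np.argmin(residuals))
```

    mSRC replaces the single representation per dictionary with several independent ones and pools their residual energies into the decision statistic.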

  15. Sparsity and Nullity: Paradigm for Analysis Dictionary Learning

    DTIC Science & Technology

    2016-08-09

    Sparse models in dictionary learning have been successfully applied in a wide variety of machine learning and...we investigate the relation between the SNS problem and the analysis dictionary learning problem, and show that the SNS problem plays a central role...and may be utilized to solve dictionary learning problems.

  16. Readers' opinions of romantic poetry are consistent with emotional measures based on the Dictionary of Affect in Language.

    PubMed

    Whissell, Cynthia

    2003-06-01

    A principal components analysis of 68 volunteers' subjective ratings of 20 excerpts of Romantic poetry and of Dictionary of Affect scores for the same excerpts produced four components representing Pleasantness, Activation, Romanticism, and Nature. Dictionary measures and subjective ratings of the same constructs loaded on the same factor. Results are interpreted as providing construct validity for the Dictionary of Affect.

  17. Fast dictionary-based reconstruction for diffusion spectrum imaging.

    PubMed

    Bilgic, Berkin; Chatnuntawech, Itthi; Setsompop, Kawin; Cauley, Stephen F; Yendiki, Anastasia; Wald, Lawrence L; Adalsteinsson, Elfar

    2013-11-01

    Diffusion spectrum imaging reveals detailed local diffusion properties at the expense of substantially long imaging times. It is possible to accelerate acquisition by undersampling in q-space, followed by image reconstruction that exploits prior knowledge on the diffusion probability density functions (pdfs). Previously proposed methods impose this prior in the form of sparsity under wavelet and total variation transforms, or under adaptive dictionaries that are trained on example datasets to maximize the sparsity of the representation. These compressed sensing (CS) methods require full-brain processing times on the order of hours using MATLAB running on a workstation. This work presents two dictionary-based reconstruction techniques that use analytical solutions, and are two orders of magnitude faster than the previously proposed dictionary-based CS approach. The first method generates a dictionary from the training data using principal component analysis (PCA), and performs the reconstruction in the PCA space. The second proposed method applies reconstruction using pseudoinverse with Tikhonov regularization with respect to a dictionary. This dictionary can either be obtained using the K-SVD algorithm, or it can simply be the training dataset of pdfs without any training. All of the proposed methods achieve reconstruction times on the order of seconds per imaging slice, and have reconstruction quality comparable to that of the dictionary-based CS algorithm.
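
    The second method reduces to a closed-form, Tikhonov-regularized fit of coefficients over the dictionary, followed by synthesis of the fully sampled pdf. A minimal numpy sketch of that step (synthetic sizes and data, not the authors' code):

```python
import numpy as np

def tikhonov_reconstruct(D, sampled_rows, y, lam=0.1):
    """Closed-form reconstruction: fit coefficients a to the undersampled
    measurements y (taken at `sampled_rows` of the dictionary D) with
    Tikhonov regularization, a = (Du^T Du + lam*I)^-1 Du^T y, then return
    the fully sampled estimate D @ a."""
    Du = D[sampled_rows, :]
    a = np.linalg.solve(Du.T @ Du + lam * np.eye(D.shape[1]), Du.T @ y)
    return D @ a

# Synthetic usage: 96 q-space points, a "dictionary" of 40 training pdfs,
# and a signal acquired at only 32 of the 96 points.
rng = np.random.default_rng(1)
D = rng.standard_normal((96, 40))
truth = D @ rng.standard_normal(40)
acquired = rng.choice(96, size=32, replace=False)
estimate = tikhonov_reconstruct(D, acquired, truth[acquired])
print(np.linalg.norm(estimate - truth) / np.linalg.norm(truth))  # relative error
```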

  18. Password-only authenticated three-party key exchange proven secure against insider dictionary attacks.

    PubMed

    Nam, Junghyun; Choo, Kim-Kwang Raymond; Paik, Juryon; Won, Dongho

    2014-01-01

    While a number of protocols for password-only authenticated key exchange (PAKE) in the 3-party setting have been proposed, it still remains a challenging task to prove the security of a 3-party PAKE protocol against insider dictionary attacks. To the best of our knowledge, there is no 3-party PAKE protocol that carries a formal proof, or even definition, of security against insider dictionary attacks. In this paper, we present the first 3-party PAKE protocol proven secure against both online and offline dictionary attacks as well as insider and outsider dictionary attacks. Our construct can be viewed as a protocol compiler that transforms any 2-party PAKE protocol into a 3-party PAKE protocol with 2 additional rounds of communication. We also present a simple and intuitive approach of formally modelling dictionary attacks in the password-only 3-party setting, which significantly reduces the complexity of proving the security of 3-party PAKE protocols against dictionary attacks. In addition, we investigate the security of the well-known 3-party PAKE protocol, called GPAKE, due to Abdalla et al. (2005, 2006), and demonstrate that the security of GPAKE against online dictionary attacks depends heavily on the composition of its two building blocks, namely a 2-party PAKE protocol and a 3-party key distribution protocol.

  19. Adaptive Greedy Dictionary Selection for Web Media Summarization.

    PubMed

    Cong, Yang; Liu, Ji; Sun, Gan; You, Quanzeng; Li, Yuncheng; Luo, Jiebo

    2017-01-01

    Initializing an effective dictionary is an indispensable step for sparse representation. In this paper, we focus on the dictionary selection problem with the objective of selecting a compact subset of basis elements from the original training data instead of learning a new dictionary matrix as dictionary learning models do. We first design a new dictionary selection model via the ℓ2,0 norm. For model optimization, we propose two methods: one is the standard forward-backward greedy algorithm, which is not suitable for large-scale problems; the other is based on the gradient cues at each forward iteration and speeds up the process dramatically. In comparison with the state-of-the-art dictionary selection models, our model is not only more effective and efficient, but also can control the sparsity. To evaluate the performance of our new model, we select two practical web media summarization problems: 1) we build a new data set consisting of around 500 users, 3000 albums, and 1 million images, and achieve effective assisted albuming based on our model; and 2) by formulating the video summarization problem as a dictionary selection issue, we employ our model to extract keyframes from a video sequence in a more flexible way. Generally, our model outperforms the state-of-the-art methods in both of these tasks.
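
    For orientation, a plain forward greedy selector, used here as a simplified stand-in for the ℓ2,0-constrained model and its gradient-accelerated optimizer, picks one data column at a time, choosing the column whose addition most reduces the least-squares reconstruction error of the whole data matrix:

```python
import numpy as np

def greedy_select(Y, m):
    """Forward greedy dictionary selection: pick m columns of the data
    matrix Y that best reconstruct all of Y in the least-squares sense."""
    selected = []
    for _ in range(m):
        best_j, best_err = None, np.inf
        for j in range(Y.shape[1]):
            if j in selected:
                continue
            B = Y[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(B, Y, rcond=None)   # project Y onto span(B)
            err = np.linalg.norm(Y - B @ coef)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected

rng = np.random.default_rng(2)
Y = rng.standard_normal((20, 50))        # 50 candidate basis signals
print(greedy_select(Y, 3))               # indices of the 3 selected columns
```

    The inner refit is recomputed from scratch for every candidate, which is presumably the cost the paper's gradient-cue variant avoids in order to scale to large collections.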

  20. Fast Dictionary-Based Reconstruction for Diffusion Spectrum Imaging

    PubMed Central

    Bilgic, Berkin; Chatnuntawech, Itthi; Setsompop, Kawin; Cauley, Stephen F.; Yendiki, Anastasia; Wald, Lawrence L.; Adalsteinsson, Elfar

    2015-01-01

    Diffusion Spectrum Imaging (DSI) reveals detailed local diffusion properties at the expense of substantially long imaging times. It is possible to accelerate acquisition by undersampling in q-space, followed by image reconstruction that exploits prior knowledge on the diffusion probability density functions (pdfs). Previously proposed methods impose this prior in the form of sparsity under wavelet and total variation (TV) transforms, or under adaptive dictionaries that are trained on example datasets to maximize the sparsity of the representation. These compressed sensing (CS) methods require full-brain processing times on the order of hours using Matlab running on a workstation. This work presents two dictionary-based reconstruction techniques that use analytical solutions, and are two orders of magnitude faster than the previously proposed dictionary-based CS approach. The first method generates a dictionary from the training data using Principal Component Analysis (PCA), and performs the reconstruction in the PCA space. The second proposed method applies reconstruction using pseudoinverse with Tikhonov regularization with respect to a dictionary. This dictionary can either be obtained using the K-SVD algorithm, or it can simply be the training dataset of pdfs without any training. All of the proposed methods achieve reconstruction times on the order of seconds per imaging slice, and have reconstruction quality comparable to that of the dictionary-based CS algorithm. PMID:23846466

  1. KARL: A Knowledge-Assisted Retrieval Language. M.S. Thesis Final Report, 1 Jul. 1985 - 31 Dec. 1987

    NASA Technical Reports Server (NTRS)

    Dominick, Wayne D. (Editor); Triantafyllopoulos, Spiros

    1985-01-01

    Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems.

  2. The Pocket Dictionary: A Textbook for Spelling.

    ERIC Educational Resources Information Center

    Doggett, Maran

    1982-01-01

    Reports on a productive approach to secondary-school spelling instruction--one that emphasizes how and when to use the dictionary. Describes two of the many class activities that cultivate student use of the dictionary. (RL)

  3. Cheap Words: A Paperback Dictionary Roundup.

    ERIC Educational Resources Information Center

    Kister, Ken

    1979-01-01

    Surveys currently available paperback editions in three classes of dictionaries: collegiate, abridged, and pocket. A general discussion distinguishes among the classes and offers seven consumer tips, followed by an annotated listing of dictionaries now available. (SW)

  4. On A Nonlinear Generalization of Sparse Coding and Dictionary Learning.

    PubMed

    Xie, Yuchen; Ho, Jeffrey; Vemuri, Baba

    2013-01-01

    Existing dictionary learning algorithms are based on the assumption that the data are vectors in a Euclidean vector space ℝ^d, and the dictionary is learned from the training data using the vector space structure of ℝ^d and its Euclidean L2-metric. However, in many applications, features and data often originate from a Riemannian manifold that does not support a global linear (vector space) structure. Furthermore, the extrinsic viewpoint of existing dictionary learning algorithms becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to the application. This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here. We show that both the dictionary and sparse coding can be effectively computed for several important classes of Riemannian manifolds, and we validate the proposed method using two well-known classification problems in computer vision and medical imaging analysis.

  5. On A Nonlinear Generalization of Sparse Coding and Dictionary Learning

    PubMed Central

    Xie, Yuchen; Ho, Jeffrey; Vemuri, Baba

    2013-01-01

    Existing dictionary learning algorithms are based on the assumption that the data are vectors in a Euclidean vector space ℝ^d, and the dictionary is learned from the training data using the vector space structure of ℝ^d and its Euclidean L2-metric. However, in many applications, features and data often originate from a Riemannian manifold that does not support a global linear (vector space) structure. Furthermore, the extrinsic viewpoint of existing dictionary learning algorithms becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to the application. This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here. We show that both the dictionary and sparse coding can be effectively computed for several important classes of Riemannian manifolds, and we validate the proposed method using two well-known classification problems in computer vision and medical imaging analysis. PMID:24129583

  6. An analysis dictionary learning algorithm under a noisy data model with orthogonality constraint.

    PubMed

    Zhang, Ye; Yu, Tenglong; Wang, Wenwu

    2014-01-01

    Two common problems are often encountered in analysis dictionary learning (ADL) algorithms. The first is that the original clean signals for learning the dictionary are assumed to be known, whereas in practice they need to be estimated from noisy measurements. This, however, makes the optimization process computationally slow and the estimation potentially unreliable (if the noise level is high), as exemplified by the Analysis K-SVD (AK-SVD) algorithm. The other problem is the trivial dictionary solution, for example the null dictionary matrix that may be given by a dictionary learning algorithm, as discussed in the learning overcomplete sparsifying transform (LOST) algorithm. Here we propose a novel optimization model and an iterative algorithm to learn the analysis dictionary, where we directly employ the observed data to compute the approximate analysis sparse representation of the original signals (leading to a fast optimization procedure) and enforce an orthogonality constraint on the optimization criterion to avoid the trivial solutions. Experiments demonstrate the competitive performance of the proposed algorithm as compared with three baselines, namely, the AK-SVD, LOST, and NAAOLA algorithms.

  7. The Universal Book of Astronomy: From the Andromeda Galaxy to the Zone of Avoidance

    NASA Astrophysics Data System (ADS)

    Darling, David

    2003-10-01

    The ultimate guide to the final frontier This alphabetical tour of the universe provides all the history, science, and up-to-the-minute facts needed to explore the skies with authority. Packed with more than 3,000 entries that cover everything from major observatories and space telescopes to biographies of astronomers throughout the ages, it showcases an extraordinary array of newfound wonders, including microquasars, brown dwarfs, and dark energy, as well as a host of individual comets, asteroids, moons, planets, stars, nebulas, and galaxies. Featuring nearly 200 illustrations and eight pages of color photographs, this comprehensive guide provides easy lookup of topics and offers more in-depth information than can be found in existing star guides or astronomy dictionaries. It's an ideal resource for the amateur astronomer or anyone with an interest in the mysteries of the cosmos. David Darling, PhD (Brainerd, MN), is the author of The Complete Book of Spaceflight (0-471-05649-9) and Equations of Eternity, a New York Times Notable Book.

  8. Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

    PubMed

    Khalifa, Abdulrahman; Meystre, Stéphane

    2015-12-01

    The 2014 i2b2 natural language processing shared task focused on identifying cardiovascular risk factors such as high blood pressure, high cholesterol levels, obesity and smoking status among other factors found in health records of diabetic patients. In addition, the task involved detecting medications and time information associated with the extracted data. This paper presents the development and evaluation of a natural language processing (NLP) application conceived for this i2b2 shared task. For increased efficiency, the application's main components were adapted from two existing NLP tools implemented in the Apache UIMA framework: Textractor (for dictionary-based lookup) and cTAKES (for preprocessing and smoking status detection). The application achieved a final (micro-averaged) F1-measure of 87.5% on the final evaluation test set. Our attempt was mostly based on existing tools adapted with minimal changes and allowed for satisfactory performance with limited development effort. Copyright © 2015 Elsevier Inc. All rights reserved.
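
    Dictionary-based lookup of the kind delegated to Textractor can be pictured as simple term matching against a curated vocabulary. The sketch below is only illustrative (the terms and mappings are hypothetical, and this is not Textractor or cTAKES code):

```python
import re

# Illustrative only: a tiny, hypothetical risk-factor dictionary and a plain
# string matcher. This is not Textractor or cTAKES code.
RISK_FACTOR_TERMS = {
    "hypertension": "high blood pressure",
    "high blood pressure": "high blood pressure",
    "hyperlipidemia": "high cholesterol",
    "obesity": "obesity",
    "smoker": "smoking status",
}

def find_risk_factors(note: str):
    """Return (risk factor, matched term, offset) tuples found in the note."""
    hits = []
    lowered = note.lower()
    for term, factor in RISK_FACTOR_TERMS.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            hits.append((factor, term, m.start()))
    return hits

note = "Patient with obesity, current smoker, history of hypertension."
print(find_risk_factors(note))
```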

  9. Legal Information Sources: An Annotated Bibliography.

    ERIC Educational Resources Information Center

    Conner, Ronald C.

    This 25-page annotated bibliography describes the legal reference materials in the special collection of a medium-sized public library. Sources are listed in 12 categories: cases, dictionaries, directories, encyclopedias, forms, references for the lay person, general, indexes, laws and legislation, legal research aids, periodicals, and specialized…

  10. Applications for the environment : real-time information synthesis (AERIS) eco-signal operations : operational concept.

    DOT National Transportation Integrated Search

    2002-04-01

    The Logical Architecture is based on a Computer Aided Systems Engineering (CASE) model of the requirements for the flow of data and control through the various functions included in Intelligent Transportation Systems (ITS). Data Dictionary is the com...

  11. Business, Economics, Management Information.

    ERIC Educational Resources Information Center

    Kellogg, Edward Zip

    This annotated bibliography includes reference sources pertaining to business, economics, and management that are located in the libraries of the Portland and Gorham campuses of the University of Southern Maine. Specific reference sources are listed under the categories of: (1) indexes and abstracts; (2) dictionaries and encyclopedias, including…

  12. Titles in highly ranked multidisciplinary psychology journals 1966-2011: more words and punctuation marks allow for the communication of more information.

    PubMed

    Whissell, Cynthia

    2013-12-01

    Titles of articles in seven highly ranked multidisciplinary psychology journals for every fifth year between 1966 and 2011 (inclusive) were studied in terms of title length, word length, punctuation density, and word pleasantness, activation, and concreteness (assessed by the Dictionary of Affect in Language). Titles grew longer (by three words) and were more frequently punctuated (by one colon or comma for every other article) between 1966 and 2011. This may reflect the increasing complexity of psychology and satisfy readers' requirements for more specific information. There were significant differences among journals (e.g., titles in the Annual Review of Psychology were scored by the Dictionary of Affect as the most pleasant, and those in Psychological Bulletin as the least pleasant) and among categories of journals (e.g., titles in review journals employed longer words than those in research or association journals). Differences were stable across time and were employed to successfully categorize titles from a validation sample.

  13. A selective annotated bibliography for clinical audiology (1988-2008): reference works.

    PubMed

    Ferrer-Vinent, Susan T; Ferrer-Vinent, Ignacio J

    2009-06-01

    This is the 1st in a series of 3 planned companion articles that present a selected, annotated, and indexed bibliography of clinical audiology publications from 1988 to 2008. Research and preparation of the bibliography were based on published guidelines, professional audiology experience, and professional librarian experience. This article presents reference works (dictionaries, encyclopedias, handbooks, and manuals). The future planned articles will cover other monographs, periodicals, and online resources. Audiologists and librarians can use these lists as a guide when seeking clinical audiology literature.

  14. Sexting--it's in the dictionary.

    PubMed

    Mattey, Beth; Diliberto, Gail Mattey

    2013-03-01

    Sexting has become commonplace in our vocabulary, as commonplace as technology use is to our youth. The role of the school nurse necessitates awareness of issues surrounding sexting along with the capability to proactively educate students, staff and parents on the dangers of sexting. Students are empowered when provided the knowledge that only they control their own image. This article explores current terminology, incidence of sexting among today's youth, legal implications, as well as strategies and resources for schools to assist in dealing with sexting.

  15. University of Glasgow at TREC 2008: Experiments in Blog, Enterprise, and Relevance Feedback Tracks with Terrier

    DTIC Science & Technology

    2008-11-01

    detecting opinionated documents. The first approach improves our TREC 2007 dictionary-based approach by automatically building an internal opinion dictionary from the collection itself. The second approach is based on the OpinionFinder tool, which identifies subjective sentences in text. In particular

  16. Robust Multimodal Dictionary Learning

    PubMed Central

    Cao, Tian; Jojic, Vladimir; Modla, Shannon; Powell, Debbie; Czymmek, Kirk; Niethammer, Marc

    2014-01-01

    We propose a robust multimodal dictionary learning method for multimodal images. Joint dictionary learning for both modalities may be impaired by lack of correspondence between image modalities in training data, for example due to areas of low quality in one of the modalities. Dictionaries learned with such non-corresponding data will induce uncertainty about image representation. In this paper, we propose a probabilistic model that accounts for image areas that are poorly corresponding between the image modalities. We cast the problem of learning a dictionary in the presence of problematic image patches as a likelihood maximization problem and solve it with a variant of the EM algorithm. Our algorithm alternates between identifying poorly corresponding patches and refining the dictionary. We tested our method on synthetic and real data. We show improvements in image prediction quality and alignment accuracy when using the method for multimodal image registration. PMID:24505674

  17. Dictionary Learning on the Manifold of Square Root Densities and Application to Reconstruction of Diffusion Propagator Fields*

    PubMed Central

    Sun, Jiaqi; Xie, Yuchen; Ye, Wenxing; Ho, Jeffrey; Entezari, Alireza; Blackband, Stephen J.

    2013-01-01

    In this paper, we present a novel dictionary learning framework for data lying on the manifold of square root densities and apply it to the reconstruction of diffusion propagator (DP) fields given a multi-shell diffusion MRI data set. Unlike most of the existing dictionary learning algorithms which rely on the assumption that the data points are vectors in some Euclidean space, our dictionary learning algorithm is designed to incorporate the intrinsic geometric structure of manifolds and performs better than traditional dictionary learning approaches when applied to data lying on the manifold of square root densities. Non-negativity as well as smoothness across the whole field of the reconstructed DPs is guaranteed in our approach. We demonstrate the advantage of our approach by comparing it with an existing dictionary based reconstruction method on synthetic and real multi-shell MRI data. PMID:24684004

  18. The Research on Denoising of SAR Image Based on Improved K-SVD Algorithm

    NASA Astrophysics Data System (ADS)

    Tan, Linglong; Li, Changkai; Wang, Yueqin

    2018-04-01

    SAR images are often corrupted by noise during acquisition and transmission, which can greatly reduce image quality and cause great difficulties for image processing. The existing complete DCT dictionary algorithm is fast, but its denoising effect is poor. In this paper, to address this poor denoising performance, the K-SVD (K-means and singular value decomposition) algorithm is applied to image noise suppression. Firstly, the sparse dictionary structure is introduced in detail; the dictionary has a compact representation and can be trained effectively on the image signal. Then, the sparse dictionary is trained with the K-SVD algorithm according to the sparse representation over the dictionary. The algorithm has advantages in high-dimensional data processing. Experimental results show that the proposed algorithm removes speckle noise more effectively than the complete DCT dictionary and better retains edge details.
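
    For reference, a bare-bones numpy sketch of the K-SVD loop that the paper builds on (not the improved variant proposed for SAR images): alternate greedy sparse coding of the training patches with per-atom rank-1 SVD updates of the dictionary.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse coding (same pursuit sketched earlier in this listing)."""
    residual, support = y.copy(), []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        xs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ xs
    x[support] = xs
    return x

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Bare-bones K-SVD: alternate sparse coding of the columns of Y with
    per-atom rank-1 SVD updates of the dictionary."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = np.column_stack([omp(D, Y[:, i], sparsity) for i in range(Y.shape[1])])
        for j in range(n_atoms):
            users = np.nonzero(X[j, :])[0]      # signals that use atom j
            if users.size == 0:
                continue
            # Residual with atom j's contribution removed, restricted to users.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0, :]
    return D, X
```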

  19. Academic Text Features and Reading in English as a Second Language

    DTIC Science & Technology

    1989-08-29

    students rely on a number of compensatory reading strategies. Commonly cited strategies include reading assisted by an English dictionary, asking peers for...Armbruster (1985), while concerned with monolingual students, offer informative suggestions that can be extended to bridge this gap. They outline...attention to text information is positively affected by signaling devices among monolingual subjects. Goldman (1988) has reported that ESL college students

  20. Learning overcomplete representations from distributed data: a brief review

    NASA Astrophysics Data System (ADS)

    Raja, Haroon; Bajwa, Waheed U.

    2016-05-01

    Most of the research on dictionary learning has focused on developing algorithms under the assumption that data is available at a centralized location. Often, however, the data is not available at a centralized location because of practical constraints such as data aggregation costs and privacy concerns. Using centralized dictionary learning algorithms may not be the optimal choice in such settings. This motivates the design of dictionary learning algorithms that consider the distributed nature of the data as one of the problem variables. Just as in centralized settings, the distributed dictionary learning problem can be posed in more than one way depending on the problem setup. The most notable distinguishing features are the online versus batch nature of the data and the representative versus discriminative nature of the dictionaries. In this paper, several distributed dictionary learning algorithms that are designed to tackle different problem setups are reviewed. One of these algorithms is cloud K-SVD, which solves the dictionary learning problem for batch data in distributed settings. One distinguishing feature of cloud K-SVD is that it has been shown to converge to its centralized counterpart, namely, the K-SVD solution. On the other hand, no such guarantees are provided for other distributed dictionary learning algorithms. Convergence of cloud K-SVD to the centralized K-SVD solution means problems that are solvable by K-SVD in centralized settings can now be solved in distributed settings with similar performance. Finally, cloud K-SVD is used as an example to show the advantages that are attainable by deploying distributed dictionary algorithms for real-world distributed datasets.
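
    The key ingredient that lets cloud K-SVD match its centralized counterpart is a consensus-based computation of each atom's dominant singular direction without pooling the raw data. The toy sketch below illustrates only that idea, with the network consensus step idealized as an exact average; it is not cloud K-SVD itself.

```python
import numpy as np

def consensus_dominant_direction(local_residuals, n_iter=50, seed=0):
    """Each node holds a local residual matrix E_i. The dominant left
    singular vector of the pooled [E_1 ... E_N] is the dominant eigenvector
    of sum_i E_i E_i^T, which the nodes can find by power iteration if they
    average their local products E_i (E_i^T q) at every step. The averaging
    is idealized here; a real network would run a consensus protocol."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(local_residuals[0].shape[0])
    q /= np.linalg.norm(q)
    for _ in range(n_iter):
        contributions = [E @ (E.T @ q) for E in local_residuals]
        q = np.mean(contributions, axis=0)       # idealized consensus step
        q /= np.linalg.norm(q)
    return q

# Compare with a centralized SVD of the pooled residuals.
rng = np.random.default_rng(3)
Es = [rng.standard_normal((16, 30)) for _ in range(4)]
q = consensus_dominant_direction(Es)
U, *_ = np.linalg.svd(np.hstack(Es))
print(abs(q @ U[:, 0]))   # should be close to 1: same direction up to sign
```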
