folklore text corpora: Topics by Science.gov

Sample records for folklore text corpora

Stereotype and Tradition: White Folklore About Blacks (Volumes 1 and 2).

ERIC Educational Resources Information Center

Rosenberg, Neil Vandraegen

The forms of white folklore about blacks in the United States are described. This folklore appears in most folklore genres, including folk speech, proverbs, riddles, beliefs, songs and narratives. Using texts submitted to various folklore archives over a 20-year period, this study analyzes the content and context of a large group of jokes and…
The Food Code in the Yakut Culture: Semantics and Functions

ERIC Educational Resources Information Center

Gabysheva, Luiza Lvovna

2016-01-01

Research into the culture-specific meaning of a word in a folklore text is relevant because the topic has been insufficiently studied and because it is important for describing the semantic features of folklore language. Yakut nominations for dairy products, which are the key words in the language of the Sakha people's folklore,…
Automatic Dictionary Expansion Using Non-parallel Corpora

NASA Astrophysics Data System (ADS)

Rapp, Reinhard; Zock, Michael

Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns between different languages are related, it expands a small base lexicon. For improved performance, it also realizes a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
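The pivot-language idea in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm, which works from co-occurrence patterns in comparable corpora. It only shows, assuming two small hypothetical lexicons (`de_en`, `en_fr`) and a hypothetical helper `translate_via_pivot`, how indirect translations through a pivot language might be collected and ranked by the number of pivot words that support them.

```python
from collections import Counter

def translate_via_pivot(word, src_to_pivot, pivot_to_tgt):
    """Rank target-language candidates by how many pivot words lead to them."""
    votes = Counter()
    for pivot_word in src_to_pivot.get(word, []):
        for candidate in pivot_to_tgt.get(pivot_word, []):
            votes[candidate] += 1
    # most_common() sorts by descending vote count.
    return [candidate for candidate, _ in votes.most_common()]

# Hypothetical toy lexicons: German -> English (pivot) -> French.
de_en = {"Haus": ["house", "home"]}
en_fr = {"house": ["maison"], "home": ["maison", "foyer"]}

candidates = translate_via_pivot("Haus", de_en, en_fr)
print(candidates)  # "maison" is supported by two pivot words, "foyer" by one
```

A candidate reached through several pivot words is ranked higher, which is one simple way an indirect route could add evidence to a direct one.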
Mining Quality Phrases from Massive Text Corpora

PubMed Central

Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei

2015-01-01

Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method. PMID:26705375
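As a rough illustration of where phrase mining can start, the sketch below counts word n-grams and keeps those that reach a minimum frequency. It is a simplification made for illustration, not the framework proposed in the paper: the function name `frequent_ngrams`, the thresholds, and the toy corpus are all assumptions, and the paper's quality estimation and phrasal segmentation steps are not shown.

```python
from collections import Counter

def frequent_ngrams(sentences, max_len=3, min_count=2):
    """Count word n-grams (2..max_len) and keep those seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}

# Hypothetical toy corpus.
corpus = [
    "support vector machine is a classifier",
    "we train a support vector machine",
    "a vector space model",
]
phrases = frequent_ngrams(corpus)
print(phrases)
```

For a fixed `max_len`, the number of n-grams visited grows linearly with the number of tokens, which is in the spirit of the linear scaling the abstract describes.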
Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

PubMed

Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

2013-01-16

The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entities mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. 
For text mining, preprocessing the EHR corpus with fingerprinting yields significantly better results. Before applying text-mining techniques, one must pay careful attention to the structure of the analyzed corpora. While the importance of data cleaning has been known for low-level text characteristics (e.g., encoding and spelling), high-level and difficult-to-quantify corpus characteristics, such as naturally occurring redundancy, can also hurt text mining. Fingerprinting enables text-mining techniques to leverage available data in the EHR corpus, while avoiding the bias introduced by redundancy.
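The following is a minimal sketch of fingerprint-based redundancy filtering, under stated assumptions. It is not the authors' algorithm: the shingle length `k`, the overlap threshold `max_overlap`, the use of MD5 hashes as fingerprints, and the example notes are all choices made here for illustration.

```python
import hashlib

def fingerprints(text, k=4):
    """Hash every run of k consecutive words (a 'shingle') in the text."""
    tokens = text.lower().split()
    n_shingles = max(len(tokens) - k + 1, 1)  # short texts yield one shingle
    shingles = (" ".join(tokens[i:i + k]) for i in range(n_shingles))
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles}

def drop_redundant(notes, k=4, max_overlap=0.5):
    """Keep a note only if at most max_overlap of its fingerprints were seen before."""
    seen, kept = set(), []
    for note in notes:
        prints = fingerprints(note, k)
        overlap = len(prints & seen) / len(prints)
        if overlap <= max_overlap:
            kept.append(note)
            seen |= prints
    return kept

# Hypothetical notes; the second mostly repeats the first (copy and paste).
notes = [
    "patient reports chest pain radiating to left arm since morning",
    "patient reports chest pain radiating to left arm since morning now resolved",
    "follow up visit for diabetes management with stable glucose",
]
kept = drop_redundant(notes)
print(len(kept))
```

Here the second note shares 7 of its 9 shingles with the first and is dropped, while the unrelated third note is kept.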
Use of English Corpora as a Primary Resource to Teach English to the Bengali Learners

ERIC Educational Resources Information Center

Dash, Niladri Sekhar

2011-01-01

In this paper we argue in favour of teaching English as a second language to Bengali learners through direct use of English corpora. The proposed strategy is meant to be computer-assisted and is based on data, information, and examples retrieved from the present-day English corpora developed with various text samples composed by…
A token centric part-of-speech tagger for biomedical text.

PubMed

Barrett, Neil; Weber-Jahnke, Jens

2014-05-01

A difficulty with part-of-speech (POS) tagging of biomedical text is accessing and annotating appropriate training corpora. This difficulty may result in POS taggers trained on corpora that differ from the tagger's target biomedical text (cross-domain tagging). In such cases, where training and target corpora differ, tagging accuracy decreases. This paper presents a POS tagger for cross-domain tagging called TcT. TcT estimates a tag's likelihood for a given token by combining token collocation probabilities and the token's tag probabilities calculated using a Naive Bayes classifier. We compared TcT to three POS taggers used in the biomedical domain (mxpost, Brill and TnT). We trained each tagger on a non-biomedical corpus and evaluated it on biomedical corpora. TcT was more accurate in cross-domain tagging than mxpost, Brill and TnT (respective averages 83.9, 81.0, 79.5 and 78.8). Our analysis of tagger performance suggests that lexical differences between corpora have more effect on tagging accuracy than originally considered by previous research work. Biomedical POS tagging algorithms may be modified to improve their cross-domain tagging accuracy without requiring extra training or large training data sets. Future work should reexamine POS tagging methods for biomedical text. This differs from the work to date that has focused on retraining existing POS taggers. Copyright © 2014 Elsevier B.V. All rights reserved.
Proficiency Level--A Fuzzy Variable in Computer Learner Corpora

ERIC Educational Resources Information Center

Carlsen, Cecilie

2012-01-01

This article focuses on the proficiency level of texts in Computer Learner Corpora (CLCs). A claim is made that proficiency levels are often poorly defined in CLC design, and that the methods used for level assignment of corpus texts are not always adequate. Proficiency level can therefore best be described as a fuzzy variable in CLCs,…
Using the Longman Mini-concordancer on Tagged and Parsed Corpora, with Special Reference to Their Use as an Aid to Grammar Learning.

ERIC Educational Resources Information Center

Qiao, Hong Liang; Sussex, Roland

1996-01-01

Presents methods for using the Longman Mini-Concordancer on tagged and parsed corpora rather than plain text corpora. The article discusses several aspects with models to be applied in the classroom as an aid to grammar learning. This paper suggests exercises suitable for teaching English to both native and nonnative speakers. (13 references)…
Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

PubMed Central

2013-01-01

Background: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results: We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions: We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts. PMID:23631733
Extracting Useful Semantic Information from Large Scale Corpora of Text

ERIC Educational Resources Information Center

Mendoza, Ray Padilla, Jr.

2012-01-01

Extracting and representing semantic information from large scale corpora is at the crux of computer-assisted knowledge generation. Semantic information depends on collocation extraction methods, mathematical models used to represent distributional information, and weighting functions which transform the space. This dissertation provides a…
Sentence alignment using feed forward neural network.

PubMed

Fattah, Mohamed Abdel; Ren, Fuji; Kuroiwa, Shingo

2006-12-01

Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed-forward neural network. Another set of data was used for testing. Using this new approach, we achieved an error reduction of 60% over the length-based approach when applied to English-Arabic parallel documents. Moreover, this new approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those used in our system, such as a lexical match feature.
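To make the feature vector concrete, the sketch below computes three features for a candidate sentence pair. The abstract names the features but does not define them, so the formulas here (character-length ratio, shared punctuation marks, shared identical tokens standing in for cognates) are assumptions, as is the function name `pair_features`. The classifier that would consume this vector is not shown.

```python
from collections import Counter
import string

def pair_features(src, tgt):
    """Return [length ratio, punctuation score, cognate score] for a sentence pair."""
    # Length feature: shorter length divided by longer length, in characters.
    length_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt), 1)

    # Punctuation feature: punctuation marks shared by both sides (as multisets).
    src_punct = Counter(ch for ch in src if ch in string.punctuation)
    tgt_punct = Counter(ch for ch in tgt if ch in string.punctuation)
    shared = sum((src_punct & tgt_punct).values())
    punct_score = shared / max(sum(src_punct.values()), sum(tgt_punct.values()), 1)

    # Cognate feature (crude proxy): tokens that are identical on both sides,
    # such as numbers and proper names.
    src_tokens = set(src.lower().split())
    tgt_tokens = set(tgt.lower().split())
    cognate_score = len(src_tokens & tgt_tokens) / max(len(src_tokens), len(tgt_tokens), 1)

    return [length_ratio, punct_score, cognate_score]

# Hypothetical English-French pair.
features = pair_features("The meeting is in Paris on 12 May.",
                         "La réunion est à Paris le 12 mai.")
print(features)
```

In this example only "paris" and "12" are shared out of eight tokens per side, so the cognate score is 0.25, and both sentences end with a single period, so the punctuation score is 1.0.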
A linear-RBF multikernel SVM to classify big text corpora.

PubMed

Romero, R; Iglesias, E L; Borrajo, L

2015-01-01

Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classification of large datasets or text corpora, because the training complexity of SVMs is highly dependent on the input size. Recent developments in the literature on the SVM and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper shows a multikernel SVM to manage highly dimensional data, providing an automatic parameterization with low computational cost and improving results against SVMs parameterized under a brute-force search. The model consists in spreading the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while the training is significantly faster than several other SVM classifiers.
Lexical bundles in an advanced INTOCSU writing class and engineering texts: A functional analysis

NASA Astrophysics Data System (ADS)

Alquraishi, Mohammed Abdulrahman

The purpose of this study is to investigate the functions of lexical bundles in two corpora: a corpus of engineering academic texts and a corpus of IEP advanced writing class texts. The study is concerned with the nature of formulaic language in Pathway IEPs and engineering texts, and whether those types of texts show similar or distinctive formulaic functions. Using a concordancing program, the study examined lexical bundles in a 1.26 million-word engineering corpus and a 65000-word ESL corpus. It then analyzed the functions of those lexical bundles and compared them statistically using chi-square tests. The investigation found 236 unique frequent lexical bundles in the engineering corpus and 37 bundles in the pathway corpus. The study also identified several differences between the density and functions of lexical bundles in the two corpora. These differences were evident in the distribution of functions of lexical bundles and the minimal overlap of lexical bundles found in the two corpora. The results of this study call for more attention to formulaic language in ESP and EAP programs.
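The chi-square comparison mentioned above can be sketched as follows. The counts in `table` are invented for illustration and are not the study's data, and the three function categories assumed for the columns are not named in the abstract. The helper `chi_square` computes the Pearson statistic for a contingency table of bundle functions by corpus.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are the two corpora, columns are three
# assumed functional categories of lexical bundles.
table = [
    [30, 10, 20],  # corpus A
    [10, 20, 10],  # corpus B
]
stat, dof = chi_square(table)
print(round(stat, 3), dof)
```

With 2 degrees of freedom the 0.05 critical value is 5.991, so a statistic above that would indicate that the distribution of functions differs between the two hypothetical corpora.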
Building gold standard corpora for medical natural language processing tasks.

PubMed

Deleger, Louise; Li, Qi; Lingren, Todd; Kaiser, Megan; Molnar, Katalin; Stoutenborough, Laura; Kouril, Michal; Marsolo, Keith; Solti, Imre

2012-01-01

We present the construction of three annotated corpora to serve as gold standards for medical natural language processing (NLP) tasks. Clinical notes from the medical record, clinical trial announcements, and FDA drug labels are annotated. We report high inter-annotator agreements (overall F-measures between 0.8467 and 0.9176) for the annotation of Personal Health Information (PHI) elements for a de-identification task and of medications, diseases/disorders, and signs/symptoms for an information extraction (IE) task. The annotated corpora of clinical trials and FDA labels will be publicly released. To facilitate translational NLP tasks that require cross-corpora interoperability (e.g. clinical trial eligibility screening), their annotation schemas are aligned with a large scale, NIH-funded clinical text annotation project.
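Inter-annotator agreement expressed as an F-measure can be computed by treating one annotator as the reference and the other as the response. The sketch below assumes exact-match comparison of (start, end, label) spans. The matching criterion, the label names, and the example spans are assumptions for illustration and may differ from the paper's procedure.

```python
def agreement_f_measure(annotator_a, annotator_b):
    """F-measure between two annotators' span sets, using exact span matching."""
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    matches = len(a & b)
    precision = matches / len(b) if b else 0.0  # b scored against a
    recall = matches / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (start offset, end offset, label).
ann_a = {(0, 4, "MEDICATION"), (10, 18, "DISEASE"), (25, 30, "SYMPTOM")}
ann_b = {(0, 4, "MEDICATION"), (10, 18, "DISEASE"),
         (40, 45, "SYMPTOM"), (50, 55, "DISEASE")}

f_score = agreement_f_measure(ann_a, ann_b)
print(round(f_score, 4))
```

Because the F-measure is symmetric in precision and recall, swapping the two annotators gives the same value.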
Using Teacher-Developed Corpora in the CBI Classroom

ERIC Educational Resources Information Center

Salsbury, Tom; Crummer, Crista

2008-01-01

This article argues for the use of teacher-generated corpora in content-based courses. Using a content course for engineering and architecture students as an example, the article explains how a corpus consisting of texts from textbooks and journal articles helped students learn grammar, vocabulary, and writing. The article explains how the corpus…
The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions

ERIC Educational Resources Information Center

Mu, Jin; Stegmann, Karsten; Mayfield, Elijah; Rose, Carolyn; Fischer, Frank

2012-01-01

Research related to online discussions frequently faces the problem of analyzing huge corpora. Natural Language Processing (NLP) technologies may allow automating this analysis. However, the state-of-the-art in machine learning and text mining approaches yields models that do not transfer well between corpora related to different topics. Also,…
Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora.

PubMed

Islamaj Doğan, Rezarta; Comeau, Donald C; Yeganova, Lana; Wilbur, W John

2014-01-01

BioC is a recently created XML format to share text data and annotations, and an accompanying input/output library to promote interoperability of data and tools for natural language processing of biomedical text. This article reports the use of BioC to address a common challenge in processing biomedical text information: that of frequent entity name abbreviation. We selected three different abbreviation definition identification modules, and used the publicly available BioC code to convert these independent modules into BioC-compatible components that interact seamlessly with BioC-formatted data, and other BioC-compatible modules. In addition, we consider four manually annotated corpora of abbreviations in biomedical text: the Ab3P corpus of 1250 PubMed abstracts, the BIOADI corpus of 1201 PubMed abstracts, the old MEDSTRACT corpus of 199 PubMed citations and the Schwartz and Hearst corpus of 1000 PubMed abstracts. Annotations in these corpora have been re-evaluated by four annotators and their consistency and quality levels have been improved. We converted them to BioC-format and described the representation of the annotations. These corpora are used to measure the three abbreviation-finding algorithms and the results are given. The BioC-compatible modules, when compared with their original form, have no difference in their efficiency, running time or any other comparable aspects. They can be conveniently used as a common pre-processing step for larger multi-layered text-mining endeavors. Database URL: Code and data are available for download at the BioC site: http://bioc.sourceforge.net. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Use of Co-occurrences for Temporal Expressions Annotation

NASA Astrophysics Data System (ADS)

Craveiro, Olga; Macedo, Joaquim; Madeira, Henrique

The annotation or extraction of temporal information from text documents is becoming increasingly important in many natural language processing applications such as text summarization, information retrieval, question answering, etc. This paper presents an original method for easy recognition of temporal expressions in text documents. The method creates semantically classified temporal patterns, using word co-occurrences obtained from training corpora and a pre-defined seed keywords set, derived from the temporal references of the language used. Participation in a Portuguese named entity evaluation contest showed promising effectiveness and efficiency results. This approach can be adapted to recognize other types of expressions or languages, within other contexts, by defining suitable word sets and training corpora.
Corpora of Vietnamese texts: lexical effects of intended audience and publication place.

PubMed

Pham, Giang; Kohnert, Kathryn; Carney, Edward

2008-02-01

This article has two primary aims. The first is to introduce a new Vietnamese text-based corpus. The Corpora of Vietnamese Texts (CVT; Tang, 2006a) consists of approximately 1 million words drawn from newspapers and children's literature, and is available online at www.vnspeechtherapy.com/vi/CVT. The second aim is to investigate potential differences in lexical frequency and distributional characteristics in the CVT on the basis of place of publication (Vietnam or Western countries) and intended audience: adult-directed texts (newspapers) or child-directed texts (children's literature). We found clear differences between adult- and child-directed texts, particularly in the distributional frequencies of pronouns or kinship terms, which were more frequent in children's literature. Within child- and adult-directed texts, lexical characteristics did not differ on the basis of place of publication. Implications of these findings for future research are discussed.

Focus on Folklore.

ERIC Educational Resources Information Center

Mullican, James S., Ed.

1977-01-01

This issue of the "Indiana English Journal" is devoted to various facets of folklore. Topics of articles are folklore museums as resource sites for teaching; American folklore and the English classroom; writing about folklore in the freshman English class; some folklore and related materials for composition classes; developing teaching materials…
Metaphor Identification in Large Texts Corpora

PubMed Central

Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir

2013-01-01

Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus. PMID:23658625
The Case of Perrin and Thomson: An Example of the Use of a Mini-Corpus

ERIC Educational Resources Information Center

Banks, David

2005-01-01

Although recent trends have been towards large corpora, there is a valid place for the study of small corpora. This article is an example of one such study using a corpus of late 19th century texts, consisting of 1783 words in French by Perrin, and 2824 words in English by Thomson. Perrin uses more first person pronouns in a wider range of…
Folklore: A Bridge over Troubled Waters.

ERIC Educational Resources Information Center

Solomon, Carol

1971-01-01

The use of a folklore unit in a high school English class is described. The major activity of the unit was the student's individual folklore project. For two weeks prior to the unit and throughout a week of introduction on aspects of folklore, each student worked at home on an individual folklore project. Among the aspects of folklore discussed…
Between the Cracks of History: Essays on Teaching and Illustrating Folklore. Publications of the Texas Folklore Society: 55.

ERIC Educational Resources Information Center

Abernethy, Francis Edward, Ed.; Satterwhite, Carolyn Fiedler, Ed.

This book is composed of 21 essays that define and illustrate the folklore of Texas. Following the introduction, the six essays concerned with defining are: "Classroom Definitions of Folklore" (F. E. Abernethy); "Defining Folklore for My Students" (Joyce Roach); "Folklore and Cinema" (Jim Harris); "Toward a…
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora

DTIC Science & Technology

2001-01-01

monolingual dictionary-derived list of canonical roots would resolve ambiguity regarding which is the appropriate target. Many of the errors are...system and set of algorithms for automatically inducing stand-alone monolingual part-of-speech taggers, base noun-phrase bracketers, named-entity...corpora has tended to focus on their use in translation model training for MT rather than on monolingual applications. One exception is bilingual parsing
Computational methods to extract meaning from text and advance theories of human cognition.

PubMed

McNamara, Danielle S

2011-01-01

Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.
Taming Big Data: An Information Extraction Strategy for Large Clinical Text Corpora.

PubMed

Gundlapalli, Adi V; Divita, Guy; Carter, Marjorie E; Redd, Andrew; Samore, Matthew H; Gupta, Kalpana; Trautner, Barbara

2015-01-01

Concepts of interest for clinical and research purposes are not uniformly distributed in clinical text available in electronic medical records. The purpose of our study was to identify filtering techniques to select 'high yield' documents for increased efficacy and throughput. Using two large corpora of clinical text, we demonstrate the identification of 'high yield' document sets in two unrelated domains: homelessness and indwelling urinary catheters. For homelessness, the high yield set includes homeless program and social work notes. For urinary catheters, concepts were more prevalent in notes from hospitalized patients; nursing notes accounted for a majority of the high yield set. This filtering will enable customization and refining of information extraction pipelines to facilitate extraction of relevant concepts for clinical decision support and other uses.
Lessons in Living: Incorporating Folklore into Young Children's Lives.

ERIC Educational Resources Information Center

McLean, Deborah L.

One avenue for authentic exploration of different cultures is to incorporate folktales and folklore into early childhood curriculum. Universal themes are found as common threads in the folklore of many cultures, and folktales and folklore contribute to learning about each culture's rich heritage. Folklore and folktales teach young children about…
Stories That Must Not Die. Volume Four.

ERIC Educational Resources Information Center

Sauvageau, Juan

Fourth in a series of bilingual (Spanish and English) texts intended to promote interest in bilingual/bicultural programs and to preserve the colorful folklore of the Southwest, this text contains 10 traditional tales from the area. Accompanied by black and white illustrations, the tales concern local legends and personalities ("Thunder…
Portable automatic text classification for adverse drug reaction detection via multi-corpus training.

PubMed

Sarker, Abeed; Gonzalez, Graciela

2015-02-01

Automatic detection of adverse drug reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media, where enormous amounts of user-posted data are available that have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing (NLP) approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks.
Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
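The reported improvements can be checked with a few lines of arithmetic, reading a "unit" as one hundredth of an F-score. The dictionary keys below are hypothetical labels for the two in-house data sets. The pairing of baseline and combined scores follows the order given in the abstract.

```python
# F-scores quoted in the abstract for the two in-house data sets.
baseline = {"in_house_1": 0.538, "in_house_2": 0.678}
combined = {"in_house_1": 0.597, "in_house_2": 0.704}

# Improvement in hundredths of an F-score, rounded to one decimal place.
gains = {
    name: round((combined[name] - baseline[name]) * 100, 1)
    for name in baseline
}
print(gains)
```

The differences come out to 5.9 and 2.6, matching the figures stated in the abstract.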
Portable Automatic Text Classification for Adverse Drug Reaction Detection via Multi-corpus Training

PubMed Central

Gonzalez, Graciela

2014-01-01

Objective Automatic detection of Adverse Drug Reaction (ADR) mentions from text has recently received significant interest in pharmacovigilance research. Current research focuses on various sources of text-based information, including social media — where enormous amounts of user posted data is available, which have the potential for use in pharmacovigilance if collected and filtered accurately. The aims of this study are: (i) to explore natural language processing approaches for generating useful features from text, and utilizing them in optimized machine learning algorithms for automatic classification of ADR assertive text segments; (ii) to present two data sets that we prepared for the task of ADR detection from user posted internet data; and (iii) to investigate if combining training data from distinct corpora can improve automatic classification accuracies. Methods One of our three data sets contains annotated sentences from clinical reports, and the two other data sets, built in-house, consist of annotated posts from social media. Our text classification approach relies on generating a large set of features, representing semantic properties (e.g., sentiment, polarity, and topic), from short text nuggets. Importantly, using our expanded feature sets, we combine training data from different corpora in attempts to boost classification accuracies. Results Our feature-rich classification approach performs significantly better than previously published approaches with ADR class F-scores of 0.812 (previously reported best: 0.770), 0.538 and 0.678 for the three data sets. Combining training data from multiple compatible corpora further improves the ADR F-scores for the in-house data sets to 0.597 (improvement of 5.9 units) and 0.704 (improvement of 2.6 units) respectively. 
Conclusions Our research results indicate that using advanced NLP techniques for generating information rich features from text can significantly improve classification accuracies over existing benchmarks. Our experiments illustrate the benefits of incorporating various semantic features such as topics, concepts, sentiments, and polarities. Finally, we show that integration of information from compatible corpora can significantly improve classification performance. This form of multi-corpus training may be particularly useful in cases where data sets are heavily imbalanced (e.g., social media data), and may reduce the time and costs associated with the annotation of data in the future. PMID:25451103
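The F-score figures in the record above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names `f_score` and `improvement_units` are hypothetical, and it assumes that "units" in the abstract means hundredths of the F-score.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def improvement_units(before: float, after: float) -> float:
    """Difference between two F-scores, in hundredths (assumed meaning of 'units')."""
    return round((after - before) * 100, 1)


# F-scores quoted in the abstract for the two in-house data sets,
# before and after multi-corpus training.
print(improvement_units(0.538, 0.597))  # 5.9
print(improvement_units(0.678, 0.704))  # 2.6
```

Both results match the improvements stated in the abstract, which supports reading "units" as percentage points of F-score.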
Concept annotation in the CRAFT corpus.

PubMed

Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

2012-07-09

Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Concept annotation in the CRAFT corpus

PubMed Central

2012-01-01

Background Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. Results This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. Conclusions As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml. PMID:22776079
Folklore as Culture: Linking Life to Language. Dimension: Languages 79. Proceedings of the Pre-Conference Workshop at the Southern Conference on Language Teaching (15th, Atlanta, Georgia, 1979).

ERIC Educational Resources Information Center

Morain, Genelle, Ed.

Papers from a workshop on folklore in second language teaching include: "Folklore in the Classroom: Lifting the Lid on the Magic Coffer" (Genelle Morain); "An American Folklore Quiz" (Genelle Morain); "Who's Afraid of Dr. Faust?" (Gerhard H. Weiss); "German Folklore Bibliography for Teachers" (Gerhard H.…
Encoding Standards for Linguistic Corpora.

ERIC Educational Resources Information Center

Ide, Nancy

The demand for extensive reusability of large language text collections for natural language processing research requires development of standardized encoding formats. Such formats must be capable of representing different kinds of information across the spectrum of text types and languages, capable of representing different levels of…
An annotated corpus with nanomedicine and pharmacokinetic parameters

PubMed Central

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. PMID:29066897
Automatic extraction of property norm-like data from large text corpora.

PubMed

Kelly, Colin; Devereux, Barry; Korhonen, Anna

2014-01-01

Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
Text mixing shapes the anatomy of rank-frequency distributions

NASA Astrophysics Data System (ADS)

Williams, Jake Ryland; Bagrow, James P.; Danforth, Christopher M.; Dodds, Peter Sheridan

2015-05-01

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.
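Zipf's law, as stated in the record above, says that word frequency is roughly proportional to 1/rank, so a plot of log(frequency) against log(rank) should have a slope near -1. The sketch below shows one simple way to build a rank-frequency table and estimate that slope. It is an illustration only: the tokenizer and the least-squares fit are assumptions, not the method used in the paper, and it does not attempt to detect the two scaling regimes the authors analyze.

```python
from collections import Counter
import math
import re


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())  # naive tokenizer (assumption)
    counts = Counter(words)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_slope(table: list[tuple[int, str, int]]) -> float:
    """Least-squares slope of log(count) against log(rank); needs >= 2 rows."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy text whose counts (6, 3, 2) follow count = 6 / rank exactly.
toy = "a a a a a a b b b c c"
table = rank_frequency(toy)
print(table)
print(zipf_slope(table))
```

On a real corpus the same table can be computed per text and again for the concatenation of several texts, which is the kind of aggregation the abstract argues changes the shape of the distribution.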
Text mixing shapes the anatomy of rank-frequency distributions.

PubMed

Williams, Jake Ryland; Bagrow, James P; Danforth, Christopher M; Dodds, Peter Sheridan

2015-05-01

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.

Women and the Study of Folklore.

ERIC Educational Resources Information Center

Jordan, Rosan A.; De Caro, F. A.

1986-01-01

Presents a critical overview of academic writing on women and folklore, organized in three categories: (1) literature on images of women in verbal folklore, and the role of negative images in shaping attitudes; (2) research on women's oral genres and performance and female use of folklore; and (3) studies of women as folk performers and artists.…
Folklore: A Foundation for Literary Study.

ERIC Educational Resources Information Center

Galda, S. L.; Pellegrini, A. D.

1981-01-01

Discusses folklore as the basis for literary study. Discusses two major theoretical positions on folklore universals--behaviorism and structuralism--and applies the two theories to literary analysis. (FL)
Hong Kong Papers in Linguistics and Language Teaching, 16.

ERIC Educational Resources Information Center

Nakhoul, Liz, Ed.; And Others

1993-01-01

Articles and reports in this issue include the following: "Co-text or No Text: A Study of an Adapted Cloze Technique" (Dave Coniam); "Small-Corpora Concordancing in ESL Teaching and Learning" (Bruce K.C. Ma); "Interdisciplinary Dimensions of Debate" (S. Byron, L. Goldstein, D. Murphy, E. Roberts); "Can English…
TELLTALE: Experiments in a Dynamic Hypertext Environment for Degraded and Multilingual Data.

ERIC Educational Resources Information Center

Pearce, Claudia; Nicholas, Charles

1996-01-01

Presents experimentation results for the TELLTALE system, a dynamic hypertext environment that provides full-text search from a hypertext-style user interface for text corpora that may be garbled by OCR (optical character recognition) or transmission errors, and that may contain languages other than English. (Author/LRW)
Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities

NASA Astrophysics Data System (ADS)

Müller, Bernd; Klinger, Roman; Gurulingappa, Harsha; Mevissen, Heinz-Theodor; Hofmann-Apitius, Martin; Fluck, Juliane; Friedrich, Christoph M.

In information retrieval, named entity recognition gives the opportunity to apply semantic search in domain specific corpora. Recently, more full text patents and journal articles became freely available. As the information distribution amongst the different sections is unknown, an analysis of the diversity is of interest.
Visualizing the semantic content of large text databases using text maps

NASA Technical Reports Server (NTRS)

Combs, Nathan

1993-01-01

A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
The Application of Hermeneutical Analysis to Research on the Cold War in Soviet Animation Media Texts from the Second Half of the 1940s

ERIC Educational Resources Information Center

Fedorov, A. V.

2015-01-01

The Cold War era, which spawned a mutual ideological confrontation between communist and capitalist countries, left its mark on all categories of media texts, including cartoons and animations. Cartoons were used by the authorities as tools for delivering the necessary confrontational ideological content in an attractive folkloric, fairy-tale…
Ontology design patterns to disambiguate relations between genes and gene products in GENIA

PubMed Central

2011-01-01

Motivation Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency as well as enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences. Results We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable for automated verification, deductive inferences and other knowledge-based applications. Availability Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/. PMID:22166341
The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text.

PubMed

Altszyler, Edgar; Ribeiro, Sidarta; Sigman, Mariano; Fernández Slezak, Diego

2017-11-01

Computer-based dream content analysis relies on word frequencies within predefined categories in order to identify different elements in text. As a complementary approach, we explored the capabilities and limitations of word-embedding techniques to identify word usage patterns among dream reports. These tools allow us to quantify word associations in text and to identify the meaning of target words. Word embeddings have been extensively studied in large datasets, but only a few studies analyze semantic representations in small corpora. To fill this gap, we compared Skip-gram and Latent Semantic Analysis (LSA) capabilities to extract semantic associations from dream reports. LSA showed better performance than Skip-gram in small corpora in two tests. Furthermore, LSA captured relevant word associations in the dream collection, even in cases with low-frequency words or small numbers of dreams. Word associations in dream reports can thus be quantified by LSA, which opens new avenues for dream interpretation and decoding. Copyright © 2017 Elsevier Inc. All rights reserved.
Looking at Citations: Using Corpora in English for Academic Purposes.

ERIC Educational Resources Information Center

Thompson, Paul; Tribble, Chris

2001-01-01

Presents a classification scheme and the results of applying this scheme to the coding of academic texts in a corpus. The texts are doctoral theses from agricultural botany and agricultural economics departments. Results lead to a comparison of the citation practices of writers in different disciplines and the different rhetorical practices of…
Unsupervised Medical Entity Recognition and Linking in Chinese Online Medical Text

PubMed Central

Gan, Liang; Cheng, Mian; Wu, Quanyuan

2018-01-01

Online medical text is full of references to medical entities (MEs), which are valuable in many applications, including medical knowledge base (KB) construction, decision support systems, and the treatment of diseases. However, the diverse and ambiguous nature of the surface forms makes ME identification very difficult. Many existing solutions have focused on supervised approaches, which are often task-dependent. In other words, applying them to different kinds of corpora or identifying new entity categories requires major effort in data annotation and feature definition. In this paper, we propose unMERL, an unsupervised framework for recognizing and linking medical entities mentioned in Chinese online medical text. For ME recognition, unMERL first exploits a knowledge-driven approach to extract candidate entities from free text. Then, the categories of the candidate entities are determined using a distributed semantic-based approach. For ME linking, we propose a collaborative inference approach which takes full advantage of heterogeneous entity knowledge and unstructured information in the KB. Experimental results on real corpora demonstrate significant benefits compared to recent approaches with respect to both ME recognition and linking. PMID:29849994
"Mississippi Trial, 1955": Tangling with Text through Reading, Discussion, and Writing

ERIC Educational Resources Information Center

Grierson, Sirpa; Thursby, Jacqueline S.; Dean, Deborah; Crowe, Chris

2007-01-01

The authors proffer practical critical-reading strategies for teaching "Mississippi Trial, 1955" to increase students' vocabulary, comprehension, and background knowledge of historical eras. They use nonfiction, a PBS documentary, the Web, folklore, and picture books among other tools for inciting thoughtful discussion and writing.
Using machine learning to disentangle homonyms in large text corpora.

PubMed

Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

2018-06-01

Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
HierarchicalTopics: visually exploring large text collections using topic hierarchies.

PubMed

Dou, Wenwen; Yu, Li; Wang, Xiaoyu; Ma, Zhiqiang; Ribarsky, William

2013-12-01

Analyzing large textual collections has become increasingly challenging given the size of the data available and the rate at which more data is being generated. Topic-based text summarization methods coupled with interactive visualizations have presented promising approaches to address the challenge of analyzing large text corpora. As the text corpora and vocabulary grow larger, more topics need to be generated in order to capture the meaningful latent themes and nuances in the corpora. However, it is difficult for most current topic-based visualizations to represent a large number of topics without being cluttered or illegible. To facilitate the representation and navigation of a large number of topics, we propose a visual analytics system--HierarchicalTopic (HT). HT integrates a computational algorithm, Topic Rose Tree, with an interactive visual interface. The Topic Rose Tree constructs a topic hierarchy based on a list of topics. The interactive visual interface is designed to present the topic content as well as the temporal evolution of topics in a hierarchical fashion. User interactions are provided for users to make changes to the topic hierarchy based on their mental model of the topic space. To qualitatively evaluate HT, we present a case study that showcases how HierarchicalTopics aids expert users in making sense of a large number of topics and discovering interesting patterns of topic groups. We have also conducted a user study to quantitatively evaluate the effect of hierarchical topic structure. The study results reveal that HT leads to faster identification of a large number of relevant topics. We have also solicited user feedback during the experiments and incorporated some suggestions into the current version of HierarchicalTopics.
Benchmarking infrastructure for mutation text mining

PubMed Central

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Benchmarking infrastructure for mutation text mining.

PubMed

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
Recognizing chemicals in patents: a comparative analysis.

PubMed

Habibi, Maryam; Wiegandt, David Luis; Schmedding, Florian; Leser, Ulf

2016-01-01

Recently, methods for Chemical Named Entity Recognition (NER) have gained substantial interest, driven by the need for automatically analyzing today's ever-growing collections of biomedical text. Chemical NER for patents is particularly essential due to the high economic importance of pharmaceutical findings. However, NER on patents has long been largely neglected by the research community, mostly because of the lack of sufficient annotated corpora. A recent international competition specifically targeted this task, but evaluated tools only on gold standard patent abstracts instead of full patents; furthermore, results from such competitions are often difficult to extrapolate to real-life settings due to the relatively high homogeneity of training and test data. Here, we evaluate the two state-of-the-art chemical NER tools, tmChem and ChemSpot, on four different annotated patent corpora, two of which consist of full texts. We study the overall performance of the tools, compare their results at the instance level, report on high-recall and high-precision ensembles, and perform cross-corpus and intra-corpus evaluations. Our findings indicate that full patents are considerably harder to analyze than patent abstracts and clearly confirm the common wisdom that using the same text genre (patent vs. scientific) and text type (abstract vs. full text) for training and testing is a prerequisite for achieving high-quality text mining results.
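The record above mentions high-recall and high-precision ensembles of two NER tools without saying how they are built. One common construction, assumed here purely for illustration, takes the union of the two tools' predicted mentions to favor recall and the intersection to favor precision. The sketch below uses hypothetical character-offset spans; it does not call tmChem or ChemSpot and is not the authors' procedure.

```python
Span = tuple[int, int]  # (start, end) character offsets of one mention


def high_recall_ensemble(a: set[Span], b: set[Span]) -> set[Span]:
    """Keep a mention if either tool predicts it (union)."""
    return a | b


def high_precision_ensemble(a: set[Span], b: set[Span]) -> set[Span]:
    """Keep a mention only if both tools predict it (intersection)."""
    return a & b


def precision_recall(predicted: set[Span], gold: set[Span]) -> tuple[float, float]:
    """Exact-match precision and recall of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotations and tool outputs for one document.
gold = {(0, 7), (12, 20), (30, 38)}
tool_a = {(0, 7), (12, 20), (40, 45)}
tool_b = {(0, 7), (30, 38), (50, 55)}

print(precision_recall(high_recall_ensemble(tool_a, tool_b), gold))
print(precision_recall(high_precision_ensemble(tool_a, tool_b), gold))
```

In this toy case the union finds every gold mention at the cost of two false positives, while the intersection returns a single, correct mention, which illustrates the trade-off the two ensemble types are meant to expose.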
Folklore Epistemology: How Does Traditional Folklore Contribute to Children's Thinking and Concept Development?

ERIC Educational Resources Information Center

Agbenyega, Joseph S.; Tamakloe, Deborah E.; Klibthong, Sunanta

2017-01-01

This research utilised a "stimulated recall" methodology [Calderhead, J. 1981. "Stimulated Recall: A Method for Research on Teaching." "British Journal of Educational Psychology" 51: 211-217] to explore the potential of African folklore, specifically Ghanaian folk stories in the development of children's reflective…
Folklore in the Classroom. Workbook.

ERIC Educational Resources Information Center

Allen, Barbara; And Others

Written by experts in the field of folklore for laymen, this three-part volume is intended to help teachers of English, social studies, mathematics and science, home economics, the arts, and other subject areas to become more knowledgeable about folklore and to inject this knowledge into their existing curricula. The first part, on introducing…
Building Collections: Folklore

ERIC Educational Resources Information Center

Krapp, JoAnn Vergona

2005-01-01

Folklore, the oldest form of storytelling, reflects the culture of a country, hence its nonfiction classification. Through these tales, one senses the values, the humor, and the lifestyles of its peoples. A powerful genre, folklore is the foundation on which high fantasy is created, epic films are produced, and a single story is passed from one…

"Sis Cat" as Ethnographer: Self-Presentation and Self-Inscription in Zora Neale Hurston's "Mules and Men."

ERIC Educational Resources Information Center

Boxwell, D. A.

1992-01-01

Examines Zora Neale Hurston's work, particularly her collection of folklore and ethnography of the American South, "Mules and Men." Looks at the author's role, the ways the ethnographer inscribes herself into the text, and speculates about Hurston's understanding of the limits of the impersonal researcher. (JB)
Empirical data on corpus design and usage in biomedical natural language processing.

PubMed

Cohen, K Bretonnel; Fox, Lynne; Ogren, Philip V; Hunter, Lawrence

2005-01-01

This paper describes the design of six publicly available biomedical corpora. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.
(Text) Mining the LANDscape: Themes and Trends over 40 years of Landscape and Urban Planning

Treesearch

Paul H. Gobster

2014-01-01

In commemoration of the journal's 40th anniversary, the co-editor explores themes and trends covered by Landscape and Urban Planning and its parent journals through a qualitative comparison of co-occurrence term maps generated from the text corpora of its abstracts across the four decadal periods of publication. Cluster maps generated from the...
How Hierarchical Topics Evolve in Large Text Corpora.

PubMed

Cui, Weiwei; Liu, Shixia; Wu, Zhuofeng; Wei, Hao

2014-12-01

Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics. The key idea behind our approach is to exploit a tree cut to approximate each tree and allow users to interactively modify the tree cuts based on their interests. In particular, we propose an incremental evolutionary tree cut algorithm with the goal of balancing 1) the fitness of each tree cut and the smoothness between adjacent tree cuts; 2) the historical and new information related to user interests. A time-based visualization is designed to illustrate the evolving topics over time. To preserve the mental map, we develop a stable layout algorithm. As a result, our approach can quickly guide users to progressively gain profound insights into evolving hierarchical topics. We evaluate the effectiveness of the proposed method on Amazon's Mechanical Turk and real-world news data. The results show that users are able to successfully analyze evolving topics in text data.
U-Compare: share and compare text mining tools with UIMA.

PubMed

Kano, Yoshinobu; Baumgartner, William A; McCrohon, Luke; Ananiadou, Sophia; Cohen, K Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi

2009-08-01

Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. http://u-compare.org/
Block-suffix shifting: fast, simultaneous medical concept set identification in large medical record corpora.

PubMed

Liu, Ying; Lita, Lucian Vlad; Niculescu, Radu Stefan; Mitra, Prasenjit; Giles, C Lee

2008-11-06

Owing to new advances in computer hardware, large text databases have become more prevalent than ever. Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well-known Aho-Corasick algorithm. Advantages of our algorithm include: the ability to exploit the natural structure of text, the ability to perform significant character shifting, avoiding backtracking jumps that are not useful, efficiency in terms of matching time, and avoiding the typical "sub-string" false positive errors. Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the BSS algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpus simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with the top-of-the-line multi-string matching algorithms.
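For orientation, the classic Aho-Corasick automaton that the abstract names as its starting point can be sketched as follows. This is a minimal illustration of the baseline algorithm only; it does not reproduce the block-suffix shifting (BSS) method, whose details the abstract does not give. The function names and the sample terms are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links and output lists (classic Aho-Corasick)."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: a node's failure link points to the longest proper
    # suffix of its path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, nxt in goto[node].items():
            queue.append(nxt)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    node, hits = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits

# Hypothetical usage with made-up terms:
print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
```

Note that this baseline reports every substring occurrence, including matches inside longer words, which is the kind of "sub-string" false positive the abstract says its method avoids.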
Zapateado technique as an injury risk in Mexican folkloric and Spanish dance: an analysis of execution, ground reaction force, and muscle strength.

PubMed

Echegoyen, Soledad; Aoyama, Takeshi; Rodríguez, Cristina

2013-06-01

Zapateado is a repetitive percussive footwork in dance. This percussive movement, and the differences in technique, may be risk factors for injury. A survey of zapateado dance students found a rate of 1.5 injuries/1,000 exposures. Knee injuries are more frequent in Spanish dancers than in folkloric dancers. The aim of this research was to study the relationship between technique and ground reaction force in zapateado among Spanish and Mexican folkloric dancers. Ten female dance students (age 22.4 ± 4 yrs), six Spanish dancers and four Mexican folkloric dancers, were considered. Each student performed zapateado with a flat foot, wearing high-heeled shoes, for 5 seconds on a force platform. Videotapes were taken in a lateral plane, and knee and hip angles in each movement phase were measured with Dartfish software. Additionally, knee and ankle flexor and extensor strength was measured with a dynamometer. Ground reaction forces were lower for Spanish dancers than Mexican folkloric dancers. Spanish dancers had less knee flexion when the foot contacted the ground than did Mexican folkloric dancers. In Spanish dancers, the working leg had more motion in relation to hip and knee angles than was seen in folkloric dancers. The ankle extensors were stronger in folkloric dancers, and there were no differences for the other muscle groups. Knee flexion at foot contact and muscle strength imbalance could be risk factors for injuries. It is suggested that the technique in Spanish dance in Mexico be reviewed, although more studies are required to define more risk factors.
Wide coverage biomedical event extraction using multiple partially overlapping corpora

PubMed Central

2013-01-01

Background: Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes. Results: We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011. Conclusions: The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora. PMID:23731785
Folklore in ESL/EFL Curriculum Materials.

ERIC Educational Resources Information Center

Pedersen, E. Martin

It is argued that folklore can and should have a primary place in curriculum for English as a Second Language (ESL) and English as a Foreign Language (EFL). Folklore has the following advantages--it: is a form of literature in which language, arts, and culture intersect; fosters understanding and acceptance of the foreign language and culture; can…
Beyond Mulan: Rediscovering the Heroines of Chinese Folklore.

ERIC Educational Resources Information Center

Li, Suzanne D.

2000-01-01

Notes how sadly the Disney treatment of the story of Mulan reduced both the character Mulan and the story's broad appeal. Presents and critiques four picture book versions of the Mulan legend. Discusses 16 picture books of original folklore based on authentic Chinese sources. Concludes with criteria for evaluating Chinese folklore in picture…
Piled Higher and Deeper: The Folklore of Campus Life.

ERIC Educational Resources Information Center

Bronner, Simon J.

This book examines the composition and context of folklore on college campuses, contrasting its more individual character today with its communal traditions in times past, and interpreting what these traditions reveal about the role of students in American society and culture. An introductory section examines the role of folklore in higher…
Three-Dimensional Display Of Document Set

DOEpatents

Lantrip, David B.; Pennock, Kelly A.; Pottier, Marc C.; Schur, Anne; Thomas, James J.; Wise, James A.

2003-06-24

A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Three-dimensional display of document set

DOEpatents

Lantrip, David B [Oxnard, CA]; Pennock, Kelly A [Richland, WA]; Pottier, Marc C [Richland, WA]; Schur, Anne [Richland, WA]; Thomas, James J [Richland, WA]; Wise, James A [Richland, WA]

2006-09-26

A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Three-dimensional display of document set

DOEpatents

Lantrip, David B [Oxnard, CA]; Pennock, Kelly A [Richland, WA]; Pottier, Marc C [Richland, WA]; Schur, Anne [Richland, WA]; Thomas, James J [Richland, WA]; Wise, James A [Richland, WA]

2001-10-02

A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Three-dimensional display of document set

DOEpatents

Lantrip, David B [Oxnard, CA]; Pennock, Kelly A [Richland, WA]; Pottier, Marc C [Richland, WA]; Schur, Anne [Richland, WA]; Thomas, James J [Richland, WA]; Wise, James A [Richland, WA]; York, Jeremy [Bothell, WA]

2009-06-30

A method for spatializing text content for enhanced visual browsing and analysis. The invention is applied to large text document corpora such as digital libraries, regulations and procedures, archived reports, and the like. The text content from these sources may be transformed to a spatial representation that preserves informational characteristics from the documents. The three-dimensional representation may then be visually browsed and analyzed in ways that avoid language processing and that reduce the analysts' effort.
Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora

PubMed Central

Al-Thubaity, Abdulmohsen; Alqifari, Reem

2014-01-01

Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free standalone corpora processing systems are available to process such corpora. In this study, we first propose a framework for the evaluation of standalone corpora processing systems and then use it to evaluate seven freely available systems. The proposed framework considers the usability, functionality, and performance of the evaluated systems while taking into consideration their suitability for Arabic corpora. While the results show that most of the evaluated systems exhibited comparable usability scores, the scores for functionality and performance were substantially different with respect to support for the Arabic language and N-grams profile generation. The results of our evaluation will help potential users of the evaluated systems to choose the system that best meets their needs. More importantly, the results will help the developers of the evaluated systems to enhance their systems and developers of new corpora processing systems by providing them with a reference framework. PMID:25610910
Proposed framework for the evaluation of standalone corpora processing systems: an application to Arabic corpora.

PubMed

Al-Thubaity, Abdulmohsen; Al-Khalifa, Hend; Alqifari, Reem; Almazrua, Manal

2014-01-01

Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free standalone corpora processing systems are available to process such corpora. In this study, we first propose a framework for the evaluation of standalone corpora processing systems and then use it to evaluate seven freely available systems. The proposed framework considers the usability, functionality, and performance of the evaluated systems while taking into consideration their suitability for Arabic corpora. While the results show that most of the evaluated systems exhibited comparable usability scores, the scores for functionality and performance were substantially different with respect to support for the Arabic language and N-grams profile generation. The results of our evaluation will help potential users of the evaluated systems to choose the system that best meets their needs. More importantly, the results will help the developers of the evaluated systems to enhance their systems and developers of new corpora processing systems by providing them with a reference framework.
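The "N-grams profile generation" feature mentioned in the abstract above can be illustrated with a short sketch. It assumes simple whitespace tokenization, which is an assumption made here for brevity and not something the abstract specifies; the function name `ngram_profile` is hypothetical, and practical Arabic processing would typically also need normalization steps that are omitted.

```python
from collections import Counter

def ngram_profile(tokens, n):
    """Count every contiguous run of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# str.split() splits on any Unicode whitespace, so the same call works for
# Arabic-script text as well as for this English sample.
tokens = "the cat saw the cat".split()
for gram, count in ngram_profile(tokens, 2).most_common():
    print(" ".join(gram), count)
```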
[Witchcraft medicine and folklore in Wushierbingfang ('Prescriptions for fifty-two diseases')].

PubMed

Jia, Hai-yan

2010-03-01

One important characteristic of the early stage of TCM is the intermixture of witchcraft medicine and folklore. A few witchcraft prescriptions in Wushierbingfang ('Prescriptions for fifty-two diseases') show residual traces of the mixture of witchcraft and medicine in the medical literature. The witchcraft prescriptions recorded in Wushierbingfang ('Prescriptions for fifty-two diseases') can be divided into supplication, Yu-step, exorcism, Nuo ritual, peach wood charms, etc. Witchcraft developed into folklore, and the application of witchcraft sometimes took the form of folklore, which is also reflected in the records of Wushierbingfang ('Prescriptions for fifty-two diseases').
Mortal Subtext in O. E. Mandelstam's Poem "Oh, How We Love to Be a Hypocrite": Folklore Reality

ERIC Educational Resources Information Center

Dudareva, Marianna A.; Milovanova, Irina S.; Anisina, Yulia V.; Shorkina, Elena N.

2017-01-01

The article dwells upon the problem scarcely investigated in literary studies: a folklore tradition in O. Mandelstam's poetry. The researchers studied manifestation of mythological tradition in the poet's artistic world and revealed different archetypal models but they paid no attention to folklore elements. Only folklorists and ethnographers,…
Influencia India en el Folklor Mexicano (The Indian Influence on Mexican Folklore).

ERIC Educational Resources Information Center

Leon Soto, Eron de

1972-01-01

This paper discusses the influence of Indian culture on the creation of Mexican folklore to the end that the inclusion of such knowledge in classes where students are studying Spanish as a second language will make those classes less formal, more interesting, and more meaningful. The author provides many examples of Indian cultural traditions…

A new universality class in corpus of texts; A statistical physics study

NASA Astrophysics Data System (ADS)

Najafi, Elham; Darooneh, Amir H.

2018-05-01

Text can be regarded as a complex system. There are some methods in statistical physics which can be used to study this system. In this work, by means of statistical physics methods, we reveal new universal behaviors of texts associated with the fractality values of words in a text. The fractality measure indicates the importance of words in a text by considering the distribution pattern of words throughout the text. We observed a power law relation between fractality of text and vocabulary size for texts and corpora. We also observed this behavior in studying biological data.
Human language reveals a universal positivity bias

PubMed Central

Dodds, Peter Sheridan; Clark, Eric M.; Desu, Suma; Frank, Morgan R.; Reagan, Andrew J.; Williams, Jake Ryland; Mitchell, Lewis; Harris, Kameron Decker; Kloumann, Isabel M.; Bagrow, James P.; Megerdoomian, Karine; McMahon, Matthew T.; Tivnan, Brian F.; Danforth, Christopher M.

2015-01-01

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts. PMID:25675475
The Nearly Forgotten Malay Folklore: Shall We Start with the Software?

ERIC Educational Resources Information Center

Abd Rahim, Normaliza

2014-01-01

The study focuses on the nearly forgotten Malay folklore in Malaysia. The objectives of the study were to identify and discuss the types of Malay folklore among primary school learners. The samples of the study were 100 male and female students at schools in Selangor. The samples were picked at random from several schools and they were given…
31 CFR 358.6 - What is the procedure for converting bearer corpora and detached bearer coupons to book-entry?

Code of Federal Regulations, 2011 CFR

2011-07-01

... bearer corpora and detached bearer coupons to book-entry? 358.6 Section 358.6 Money and Finance: Treasury... PUBLIC DEBT REGULATIONS GOVERNING BOOK-ENTRY CONVERSION OF BEARER CORPORA AND DETACHED BEARER COUPONS § 358.6 What is the procedure for converting bearer corpora and detached bearer coupons to book-entry...
U-Compare: share and compare text mining tools with UIMA

PubMed Central

Kano, Yoshinobu; Baumgartner, William A.; McCrohon, Luke; Ananiadou, Sophia; Cohen, K. Bretonnel; Hunter, Lawrence; Tsujii, Jun'ichi

2009-01-01

Summary: Due to the increasing number of text mining resources (tools and corpora) available to biologists, interoperability issues between these resources are becoming significant obstacles to using them effectively. UIMA, the Unstructured Information Management Architecture, is an open framework designed to aid in the construction of more interoperable tools. U-Compare is built on top of the UIMA framework, and provides both a concrete framework for out-of-the-box text mining and a sophisticated evaluation platform allowing users to run specific tools on any target text, generating both detailed statistics and instance-based visualizations of outputs. U-Compare is a joint project, providing the world's largest, and still growing, collection of UIMA-compatible resources. These resources, originally developed by different groups for a variety of domains, include many famous tools and corpora. U-Compare can be launched straight from the web, without needing to be manually installed. All U-Compare components are provided ready-to-use and can be combined easily via a drag-and-drop interface without any programming. External UIMA components can also simply be mixed with U-Compare components, without distinguishing between locally and remotely deployed resources. Availability: http://u-compare.org/ Contact: kano@is.s.u-tokyo.ac.jp PMID:19414535
A survey on annotation tools for the biomedical literature.

PubMed

Neves, Mariana; Leser, Ulf

2014-03-01

New approaches to biomedical text mining crucially depend on the existence of comprehensive annotated corpora. Such corpora, commonly called gold standards, are important for learning patterns or models during the training phase, for evaluating and comparing the performance of algorithms and also for better understanding the information sought for by means of examples. Gold standards depend on human understanding and manual annotation of natural language text. This process is very time-consuming and expensive because it requires high intellectual effort from domain experts. Accordingly, the lack of gold standards is considered as one of the main bottlenecks for developing novel text mining methods. This situation led to the development of tools that support humans in annotating texts. Such tools should be intuitive to use, should support a range of different input formats, should include visualization of annotated texts and should generate an easy-to-parse output format. Today, a range of tools which implement some of these functionalities are available. Here, we present a comprehensive survey of tools for supporting annotation of biomedical texts. Altogether, we considered almost 30 tools, 13 of which were selected for an in-depth comparison. The comparison was performed using predefined criteria and was accompanied by hands-on experiences whenever possible. Our survey shows that current tools can support many of the tasks in biomedical text annotation in a satisfying manner, but also that no tool can be considered as a true comprehensive solution.
A system for de-identifying medical message board text.

PubMed

Benton, Adrian; Hill, Shawndra; Ungar, Lyle; Chung, Annie; Leonard, Charles; Freeman, Cristin; Holmes, John H

2011-06-09

There are millions of public posts to medical message boards by users seeking support and information on a wide range of medical conditions. It has been shown that these posts can be used to gain a greater understanding of patients' experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge that needs to be met. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, larger variety of possible names, terms and abbreviations specific to Internet posts or a particular message board, and mentions of the authors' personal lives. The main contribution of this paper is a system to de-identify the authors of message board posts automatically, taking into account the aforementioned challenges. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text like operative reports, pathology reports, discharge summaries, or newswire.
nala: text mining natural language mutation mentions

PubMed Central

Cejuela, Juan Miguel; Bojchevski, Aleksandar; Uhlig, Carsten; Bekmukhametov, Rustem; Kumar Karn, Sanjeev; Mahmuti, Shpend; Baghudana, Ashish; Dubey, Ankit; Satagopam, Venkata P.; Rost, Burkhard

2017-01-01

Motivation: The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions in natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). Results: We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% of mentions. For NL mentions the corresponding value shot up to 100% nala-only. Availability and Implementation: Source code, API and corpora freely available at: http://tagtog.net/-corpora/IDP4+. Contact: nala@rostlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28200120
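The difference between standard (ST) and natural language (NL) mentions in the abstract above can be made concrete with a toy pattern. The regular expression below is an illustrative assumption written for this example; it is not the method used by nala, SETH, or tmVar. It matches only the compact one-letter form such as E6V, so the NL phrasing from the abstract goes undetected.

```python
import re

# One-letter amino acid codes; the pattern is a hypothetical, simplified
# stand-in for "standard" mentions of the form <residue><position><residue>.
AMINO = "ACDEFGHIKLMNPQRSTVWY"
ST_MENTION = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")

def find_st_mentions(text):
    """Return (wild_type, position, mutant) tuples for compact mentions."""
    return [(w, int(pos), m) for w, pos, m in ST_MENTION.findall(text)]

standard = "The E6V variant was observed."
natural = "glutamic acid was substituted by valine at residue 6"
print(find_st_mentions(standard))  # the compact form is found
print(find_st_mentions(natural))   # the NL form yields no match
```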
Quelques aspects du folklore de la region Roannaise autour de 1950 (Some Aspects of Folklore of the Roanne Region about 1950).

ERIC Educational Resources Information Center

Long, Jacqueline

1971-01-01

This article examines several aspects of folklore characteristic of the region of Roanne, France, during the 1950's. The town of Roanne, located between Clermont Ferrand and Lyon on the Loire River, is described in terms of its festive activities during several key holidays. The erosion of various customs and traditions, an inevitable result of…
Afro-American Folklore: A Unique American Experience. Selected Proceedings of the Annual Conference on Minority Studies (3rd, April, 1975), Volume 4.

ERIC Educational Resources Information Center

Carter, George E., Ed.; Parker, James R., Ed.

The articles in this document emphasize the positive, unique aspects of Afro-American folklore. Etta Moten Barnett concentrates on the changes in African music in response to the new geographical and cultural influences in America. Frank Suggs, Jr. describes a strategy for introducing black folklore and black music into the elementary school in an…
Public Domain Generic Tools: An Overview.

ERIC Educational Resources Information Center

Erjavec, Tomaz

This paper presents an introduction to language engineering software, especially for computerized language and text corpora. The focus of the paper is on small and relatively independent pieces of software designed for specific, often low-level language analysis tasks, and on tools in the public domain. Discussion begins with the application of…
Long-range correlations and burstiness in written texts: Universal and language-specific aspects

NASA Astrophysics Data System (ADS)

Constantoudis, Vassilios; Kalimeri, Maria; Diakonos, Fotis; Karamanos, Konstantinos; Papadimitriou, Constantinos; Chatzigeorgiou, Manolis; Papageorgiou, Harris

2016-08-01

Recently, methods from the statistical physics of complex systems have been applied successfully to identify universal features in the long-range correlations (LRCs) of written texts. However, in real texts, these universal features are intermingled with language-specific influences. This paper aims at the characterization and further understanding of the interplay between universal and language-specific effects on the LRCs in texts. To this end, we apply the language-sensitive mapping of written texts to word-length series (wls) and analyse large parallel (of same content) corpora from 10 languages classified into four families (Romanic, Germanic, Greek and Uralic). The autocorrelation functions of the wls reveal tiny but persistent LRCs decaying at large scales following a power-law with a language-independent exponent ~0.60-0.65. The impact of language is displayed in the amplitude of correlations where a relative standard deviation >40% among the analyzed languages is observed. The classification to language families seems to play a significant role since the Finnish and Germanic languages exhibit more correlations than the Greek and Roman families. To reveal the origins of the LRCs, we focus on the long words and perform burst and correlation analysis in their positions along the corpora. We find that the universal features are linked more to the correlations of the inter-long word distances while the language-specific aspects are related more to their distributions.
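The mapping of a text to a word-length series and the autocorrelation of that series, as described in the abstract above, might be sketched as follows. Whitespace tokenization and the particular normalized estimator are assumptions made for this illustration; the abstract does not state how the authors tokenize or which estimator they use, and both function names are hypothetical.

```python
def word_length_series(text):
    """Map a text to the sequence of its word lengths (whitespace tokens)."""
    return [len(word) for word in text.split()]

def autocorrelation(series, lag):
    """Normalized autocorrelation of a numeric series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

wls = word_length_series("a long sentence is a series of short and long words")
print(wls)
print([round(autocorrelation(wls, k), 3) for k in range(4)])
```

On a large corpus, one would compute this for a wide range of lags and inspect the decay on a log-log plot to look for the power-law behaviour the abstract reports.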
Human attitudes towards herpetofauna: The influence of folklore and negative values on the conservation of amphibians and reptiles in Portugal

PubMed Central

2012-01-01

Background Human values and folklore of wildlife strongly influence the effectiveness of conservation efforts. These values and folklore may also vary with certain demographic characteristics such as gender, age, or education. Reptiles and amphibians are among the least appreciated of vertebrates and are victims of many negative values and wrong ideas resulting from the direct interpretation of folklore. We try to demonstrate how these values and folklore can affect the way people relate to them and also the possible conservation impacts on these animals. Methods A questionnaire survey distributed to 514 people in the district of Évora, Portugal, was used to obtain data regarding the hypothesis that the existence of wrong ideas and negative values contributes to the phenomenon of human-associated persecution of these animals. A structural equation model was specified in order to confirm the hypothesis about the possible relationships between the presence of perceptions and negative values about amphibians and reptiles and persecution and anti-conservation attitudes. Sociodemographic variables were also added. Results The results of the model suggest that the presence of folklore and negative values clearly predicts persecution and anti-conservation attitudes towards amphibians and reptiles. Also, the existence of folklore varies sociodemographically, but negative values concerning these animals are widespread in the population. Conclusions With the use of structural equation models, this work is a contribution to the study of how certain ideas and values can directly influence human attitudes towards herpetofauna and how they can be a serious conservation issue. PMID:22316318
Human attitudes towards herpetofauna: the influence of folklore and negative values on the conservation of amphibians and reptiles in Portugal.

PubMed

Ceríaco, Luis Mp

2012-02-08

Human values and folklore of wildlife strongly influence the effectiveness of conservation efforts. These values and folklore may also vary with certain demographic characteristics such as gender, age, or education. Reptiles and amphibians are among the least appreciated of vertebrates and are victims of many negative values and wrong ideas resulting from the direct interpretation of folklore. We try to demonstrate how these values and folklore can affect the way people relate to them and also the possible conservation impacts on these animals. A questionnaire survey distributed to 514 people in the district of Évora, Portugal, was used to obtain data regarding the hypothesis that the existence of wrong ideas and negative values contributes to the phenomenon of human-associated persecution of these animals. A structural equation model was specified in order to confirm the hypothesis about the possible relationships between the presence of perceptions and negative values about amphibians and reptiles and persecution and anti-conservation attitudes. Sociodemographic variables were also added. The results of the model suggest that the presence of folklore and negative values clearly predicts persecution and anti-conservation attitudes towards amphibians and reptiles. Also, the existence of folklore varies sociodemographically, but negative values concerning these animals are widespread in the population. With the use of structural equation models, this work is a contribution to the study of how certain ideas and values can directly influence human attitudes towards herpetofauna and how they can be a serious conservation issue.
AN ANTHOLOGY OF KRIO FOLKLORE AND LITERATURE, WITH NOTES AND INTERLINEAR TRANSLATION IN ENGLISH. VOLUME 1 OF THE KRIO LANGUAGE OF SIERRA LEONE, WEST AFRICA, IN TWO VOLUMES.

ERIC Educational Resources Information Center

TURNER, LORENZO D.

MOST OF THE KRIO FOLKLORE AND LITERATURE TRANSCRIBED IN THIS VOLUME HAS NEVER BEFORE APPEARED IN PRINT. IT IS INTENDED FOR THE PEOPLE OF SIERRA LEONE THEMSELVES, AS WELL AS FOR PERSONS WHO WISH TO LEARN MORE ABOUT WEST AFRICAN CULTURE AND THE KRIO LANGUAGE. INCLUDED IN THE FOLKLORE RECORDED HERE ARE PROVERBS, RIDDLES, AND FOLK TALES GATHERED IN…
Synonym extraction and abbreviation expansion with ensembles of semantic spaces.

PubMed

Henriksson, Aron; Moen, Hans; Skeppstedt, Maria; Daudaravičius, Vidas; Duneld, Martin

2014-02-05

Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. A combination of two distributional models - Random Indexing and Random Permutation - employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora - a corpus of clinical text and a corpus of medical journal articles - further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. 
This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models - with different model parameters - and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks.
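The combination strategy mentioned above, summing the cosine similarity scores that each semantic space assigns to a candidate term, can be illustrated with a small sketch. The toy vectors, the term names, and the function names `cosine` and `rank_candidates` are hypothetical; the study itself induces its spaces with Random Indexing and Random Permutation, which are not reproduced here.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, spaces, top_k=10):
    """Rank candidate terms by the sum of their cosine scores across spaces.

    `spaces` is a list of dicts mapping term -> vector; each dict stands in
    for one semantic space (for example, one per model or per corpus).
    """
    scores = defaultdict(float)
    for space in spaces:
        if query not in space:
            continue
        query_vector = space[query]
        for term, vector in space.items():
            if term != query:
                scores[term] += cosine(query_vector, vector)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical two-dimensional spaces, purely for illustration.
    clinical_space = {
        "mi": [1.0, 0.0],
        "myocardial infarction": [0.9, 0.1],
        "aspirin": [0.0, 1.0],
    }
    journal_space = {
        "mi": [0.0, 1.0],
        "myocardial infarction": [0.1, 0.9],
        "aspirin": [1.0, 0.0],
    }
    for term, score in rank_candidates("mi", [clinical_space, journal_space]):
        print(f"{term}: {score:.3f}")
```

Recall in a list of ten candidates, the measure quoted in the abstract, would then correspond to checking whether a known synonym or expansion appears in the `top_k=10` list returned for each query.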
Synonym extraction and abbreviation expansion with ensembles of semantic spaces

PubMed Central

2014-01-01

Background Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. Results A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms. 
Conclusions This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks. PMID:24499679
Early Development of Demonstratives in Pre-Qin Chinese

ERIC Educational Resources Information Center

Deng, Lin

2011-01-01

This dissertation offers a new dynamic account of the evolution of the demonstrative system in pre-Qin Chinese based on a comprehensive linguistic analysis of the phonological, morphological, syntactic, semantic, and pragmatic aspects of demonstratives attested in two corpora of excavated texts, i.e. the oracle-bone inscriptions dated to the late…
Kratylos: A Tool for Sharing Interlinearized and Lexical Data in Diverse Formats

ERIC Educational Resources Information Center

Kaufman, Daniel; Finkel, Raphael

2018-01-01

In this paper we present Kratylos, at www.kratylos.org/, a web application that creates searchable multimedia corpora from data collections in diverse formats, including collections of interlinearized glossed text (IGT) and dictionaries. There exists a crucial lacuna in the electronic ecology that supports language documentation and linguistic…
Citation Matching in Sanskrit Corpora Using Local Alignment

NASA Astrophysics Data System (ADS)

Prasad, Abhinandan S.; Rao, Shrisha

Citation matching is the problem of finding which citation occurs in a given textual corpus. Most existing citation matching work is done on scientific literature. The goal of this paper is to present methods for performing citation matching on Sanskrit texts. Exact matching and approximate matching are the two methods for performing citation matching. The exact matching method checks for exact occurrence of the citation with respect to the textual corpus. Approximate matching is a fuzzy string-matching method which computes a similarity score between an individual line of the textual corpus and the citation. The Smith-Waterman-Gotoh algorithm for local alignment, which is generally used in bioinformatics, is used here for calculating the similarity score. This similarity score is a measure of the closeness between the text and the citation. The exact- and approximate-matching methods are evaluated and compared. The methods presented can be easily applied to corpora in other Indic languages like Kannada, Tamil, etc. The approximate-matching method can in particular be used in the compilation of critical editions and plagiarism detection in a literary work.
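As a rough illustration of the approximate-matching idea, the sketch below computes a local alignment score with affine gap penalties (the Smith-Waterman recurrence with Gotoh's gap handling) and turns it into a similarity between a citation and a line of text. The scoring parameters, the normalization by the shorter string, and the function names are assumptions chosen for this example, not values taken from the paper.

```python
def smith_waterman_gotoh(a, b, match=2.0, mismatch=-1.0,
                         gap_open=2.0, gap_extend=0.5):
    """Best local alignment score between a and b with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    prev_h = [0.0] * (len(b) + 1)      # best score ending at (i-1, j)
    prev_f = [neg_inf] * (len(b) + 1)  # best score ending in a vertical gap
    best = 0.0
    for i in range(1, len(a) + 1):
        cur_h = [0.0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # best score ending in a horizontal gap
        for j in range(1, len(b) + 1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            cur_f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            cur_h[j] = max(0.0, prev_h[j - 1] + substitution, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def similarity(citation, line, match=2.0):
    """Alignment score scaled to [0, 1] by the best possible score.

    Assumption: normalize by match * length of the shorter string.
    """
    shorter = min(len(citation), len(line))
    if shorter == 0:
        return 0.0
    return smith_waterman_gotoh(citation, line, match=match) / (match * shorter)


if __name__ == "__main__":
    print(similarity("dharma", "xx dharma yy"))
    print(similarity("abcdef", "abcxdef"))
```

With this normalization, a citation that occurs verbatim inside a line scores 1.0, so exact matching falls out as a special case of the approximate method; a threshold below 1.0 would then admit variant readings.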

Childbirth in ancient Rome: from traditional folklore to obstetrics.

PubMed

Todman, Donald

2007-04-01

In ancient Rome, childbirth was a hazardous event for both mother and child with high rates of infant and maternal mortality. Traditional Roman medicine centred on folklore and religious practices, but with the development of Hippocratic medicine came significant advances in the care of women during pregnancy and confinement. Midwives or obstetrices played an important role and applied rational scientific practices to improve outcomes. This evolution from folklore to obstetrics was a pivotal point in the history of childbirth.
Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

PubMed

Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

2015-01-01

Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent from the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently from the text types, and the leveraging strategies to best take advantage of individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. 
The textual content of the ShARe/CLEF (https://sites.google.com/site/shareclefehealth/data) and i2b2 (https://i2b2.org/NLP/DataSets/) corpora needs to be requested with the individual corpus providers.
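One common way to derive a silver standard from several systems' output is to keep the annotations on which a minimum number of systems agree, and then score the result against a gold standard with precision, recall and F1. The sketch below assumes that scheme; the abstract does not state which harmonization rule the authors used, and the system names, spans and concept identifiers here are invented for illustration.

```python
from collections import Counter


def silver_standard(system_outputs, min_votes=2):
    """Keep annotations proposed by at least `min_votes` systems.

    `system_outputs` maps a system name to its annotations, each a
    (start, end, concept_id) tuple. The voting rule is an assumption.
    """
    votes = Counter()
    for annotations in system_outputs.values():
        votes.update(set(annotations))  # each system votes once per annotation
    return {annotation for annotation, n in votes.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 of predicted against gold."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical outputs from three unnamed systems.
    outputs = {
        "system_a": [(0, 5, "C1"), (10, 15, "C2")],
        "system_b": [(0, 5, "C1")],
        "system_c": [(10, 15, "C2"), (20, 25, "C3")],
    }
    gold = {(0, 5, "C1"), (20, 25, "C3")}
    silver = silver_standard(outputs, min_votes=2)
    print(sorted(silver))
    print(precision_recall_f1(silver, gold))
```

Raising `min_votes` tends to trade recall for precision, which is one way to read the abstract's observation that adding high-precision, low-recall systems can lower the combined F1-measure.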
Transfer learning for biomedical named entity recognition with neural networks.

PubMed

Giorgi, John M; Bader, Gary D

2018-06-01

The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is the recognition of biomedical named entities in text (BNER) such as genes/proteins, diseases, and species. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools. However, this method is dependent on gold-standard corpora (GSCs) consisting of hand-labeled entities, which tend to be small but highly reliable. An alternative to GSCs are silver-standard corpora (SSCs), which are generated by harmonizing the annotations made by several automatic annotation systems. SSCs typically contain more noise than GSCs but have the advantage of containing many more training examples. Ideally, these corpora could be combined to achieve the benefits of both, which is an opportunity for transfer learning. In this work, we analyze to what extent transfer learning improves upon state-of-the-art results for BNER. We demonstrate that transferring a deep neural network (DNN) trained on a large, noisy SSC to a smaller, but more reliable GSC significantly improves upon state-of-the-art results for BNER. Compared to a state-of-the-art baseline evaluated on 23 GSCs covering four different entity classes, transfer learning results in an average reduction in error of approximately 11%. We found transfer learning to be especially beneficial for target data sets with a small number of labels (approximately 6000 or fewer). Source code for the LSTM-CRF is available at https://github.com/Franck-Dernoncourt/NeuroNER/ and links to the corpora are available at https://github.com/BaderLab/Transfer-Learning-BNER-Bioinformatics-2018/. john.giorgi@utoronto.ca. Supplementary data are available at Bioinformatics online.
A Learner Corpus-Based Study on Verb Errors of Turkish EFL Learners

ERIC Educational Resources Information Center

Can, Cem

2017-01-01

As learner corpora have presently become readily accessible, it is practicable to examine interlanguage errors and carry out error analysis (EA) on learner-generated texts. The data available in a learner corpus enable researchers to investigate authentic learner errors and their respective frequencies in terms of types and tokens as well as…
Working with Corpora in the Translation Classroom

ERIC Educational Resources Information Center

Krüger, Ralph

2012-01-01

This article sets out to illustrate possible applications of electronic corpora in the translation classroom. Starting with a survey of corpus use within corpus-based translation studies, the didactic value of corpora in the translation classroom and their epistemic value in translation teaching and practice will be elaborated. A typology of…
Using Monolingual and Bilingual Corpora in Lexicography

ERIC Educational Resources Information Center

Miangah, Tayebeh Mosavi

2009-01-01

Constructing and exploiting different types of corpora are among computer applications exposed to the researchers in different branches of science including lexicography. In lexicography, different types of corpora may be of great help in finding the most appropriate uses of words and expressions by referring to numerous examples and citations.…
CUILESS2016: a clinical corpus applying compositional normalization of text mentions.

PubMed

Osborne, John D; Neu, Matthew B; Danila, Maria I; Solorio, Thamar; Bethard, Steven J

2018-01-10

Traditionally text mention normalization corpora have normalized concepts to single ontology identifiers ("pre-coordinated concepts"). Less frequently, normalization corpora have used concepts with multiple identifiers ("post-coordinated concepts") but the additional identifiers have been restricted to a defined set of relationships to the core concept. This approach limits the ability of the normalization process to express semantic meaning. We generated a freely available corpus using post-coordinated concepts without a defined set of relationships that we term "compositional concepts" to evaluate their use in clinical text. We annotated 5397 disorder mentions from the ShARe corpus to SNOMED CT that were previously normalized as "CUI-less" in the "SemEval-2015 Task 14" shared task because they lacked a pre-coordinated mapping. Unlike the previous normalization method, we do not restrict concept mappings to a particular set of the Unified Medical Language System (UMLS) semantic types and allow normalization to occur to multiple UMLS Concept Unique Identifiers (CUIs). We computed annotator agreement and assessed semantic coverage with this method. We generated the largest clinical text normalization corpus to date with mappings to multiple identifiers and made it freely available. All but 8 of the 5397 disorder mentions were normalized using this methodology. Annotator agreement ranged from 52.4% using the strictest metric (exact matching) to 78.2% using a hierarchical agreement that measures the overlap of shared ancestral nodes. Our results provide evidence that compositional concepts can increase semantic coverage in clinical text. To our knowledge we provide the first freely available corpus of compositional concept annotation in clinical text.
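The two ends of the agreement range reported above, exact matching and a hierarchical measure based on shared ancestral nodes, can be contrasted with a small sketch. The exact definition of the hierarchical metric is not given in the abstract; the version below (Jaccard overlap of ancestor-closed concept sets, averaged over mentions) is an assumption, and the toy hierarchy and concept names are hypothetical rather than real SNOMED CT or UMLS identifiers.

```python
def ancestors(concept, parents):
    """Return the concept together with all of its ancestors."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def exact_agreement(annotations_a, annotations_b):
    """Fraction of mentions given identical concept sets by both annotators."""
    pairs = list(zip(annotations_a, annotations_b))
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def hierarchical_agreement(annotations_a, annotations_b, parents):
    """Mean Jaccard overlap of the ancestor closures of each mention."""
    total = 0.0
    pairs = list(zip(annotations_a, annotations_b))
    for concepts_a, concepts_b in pairs:
        closure_a = set().union(*(ancestors(c, parents) for c in concepts_a))
        closure_b = set().union(*(ancestors(c, parents) for c in concepts_b))
        union = closure_a | closure_b
        total += len(closure_a & closure_b) / len(union) if union else 1.0
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical is-a hierarchy: child -> list of parents.
    parents = {
        "fracture_of_femur": ["fracture", "femur_disorder"],
        "fracture_of_tibia": ["fracture"],
        "fracture": ["disorder"],
        "femur_disorder": ["disorder"],
    }
    annotator_1 = [{"fracture_of_femur"}, {"fracture"}]
    annotator_2 = [{"fracture_of_tibia"}, {"fracture"}]
    print(exact_agreement(annotator_1, annotator_2))
    print(hierarchical_agreement(annotator_1, annotator_2, parents))
```

Because each mention is represented as a set of concepts, the same functions cover both single-identifier and multi-identifier (compositional) annotations; a near miss such as two sibling concepts lowers exact agreement to zero for that mention but still earns partial credit under the hierarchical measure.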
ACHP | Working Together to Build a More Inclusive Preservation Program

Science.gov Websites

goal is to better position folklorists and folklore methodologies as central forces in historic . Folklore methodologies can help engage the local community and elicit their voices. As the National Park
Learner Corpora: The Missing Link in EAP Pedagogy

ERIC Educational Resources Information Center

Gilquin, Gaetanelle; Granger, Sylviane; Paquot, Magali

2007-01-01

This article deals with the place of learner corpora, i.e. corpora containing authentic language data produced by learners of a foreign/second language, in English for academic purposes (EAP) pedagogy and sets out to demonstrate that they have a valuable contribution to make to the field. Following an initial brief introduction to corpus-based…
Corpora and Language Assessment: The State of the Art

ERIC Educational Resources Information Center

Park, Kwanghyun

2014-01-01

This article outlines the current state of and recent developments in the use of corpora for language assessment and considers future directions with a special focus on computational methodology. Because corpora began to make inroads into language assessment in the 1990s, test developers have increasingly used them as a reference resource to…
From Pedagogically Relevant Corpora to Authentic Language Learning Contents

ERIC Educational Resources Information Center

Braun, Sabine

2005-01-01

The potential of corpora for language learning and teaching has been widely acknowledged and their ready availability on the Web has facilitated access for a broad range of users, including language teachers and learners. However, the integration of corpora into general language learning and teaching practice has so far been disappointing. In this…
The Importance of Corpora in Translation Studies: A Practical Case

ERIC Educational Resources Information Center

Bermúdez Bausela, Montserrat

2016-01-01

This paper deals with the use of corpora in Translation Studies, particularly with the so-called "'ad hoc' corpus" or "translator's corpus" as a working tool both in the classroom and for the professional translator. We believe that corpora are an inestimable source not only for terminology and phraseology extraction (cf. Maia,…
Conventions for sign and speech transcription of child bimodal bilingual corpora in ELAN.

PubMed

Chen Pichler, Deborah; Hochgesang, Julie A; Lillo-Martin, Diane; de Quadros, Ronice Müller

2010-01-01

This article extends current methodologies for the linguistic analysis of sign language acquisition to cases of bimodal bilingual acquisition. Using ELAN, we are transcribing longitudinal spontaneous production data from hearing children of Deaf parents who are learning either American Sign Language (ASL) and American English (AE), or Brazilian Sign Language (Libras, also referred to as Língua de Sinais Brasileira/LSB in some texts) and Brazilian Portuguese (BP). Our goal is to construct corpora that can be mined for a wide range of investigations on various topics in acquisition. Thus, it is important that we maintain consistency in transcription for both signed and spoken languages. This article documents our transcription conventions, including the principles behind our approach. Using this document, other researchers can choose to follow similar conventions or develop new ones using our suggestions as a starting point.
Conventions for sign and speech transcription of child bimodal bilingual corpora in ELAN

PubMed Central

Chen Pichler, Deborah; Hochgesang, Julie A.; Lillo-Martin, Diane; de Quadros, Ronice Müller

2011-01-01

This article extends current methodologies for the linguistic analysis of sign language acquisition to cases of bimodal bilingual acquisition. Using ELAN, we are transcribing longitudinal spontaneous production data from hearing children of Deaf parents who are learning either American Sign Language (ASL) and American English (AE), or Brazilian Sign Language (Libras, also referred to as Língua de Sinais Brasileira/LSB in some texts) and Brazilian Portuguese (BP). Our goal is to construct corpora that can be mined for a wide range of investigations on various topics in acquisition. Thus, it is important that we maintain consistency in transcription for both signed and spoken languages. This article documents our transcription conventions, including the principles behind our approach. Using this document, other researchers can choose to follow similar conventions or develop new ones using our suggestions as a starting point. PMID:21625371
Naming Disney's Dwarfs.

ERIC Educational Resources Information Center

Sidwell, Robert T.

1980-01-01

Discusses Disney's version of the folkloric dwarfs in his production of "Snow White" and weighs the Disney rendition of the dwarf figure against the corpus of traits and behaviors pertaining to dwarfs in traditional folklore. Concludes that Disney's dwarfs are "anthropologically true." (HOD)
Menstruation in Ulysses.

PubMed

Mullin, Katherine

2008-01-01

This article investigates James Joyce's fascination with a wide variety of medical texts, sexual folklores, religious beliefs, and persistent superstitions about menstruation. That fascination finds its way into Ulysses, which draws upon a number of intertexts to inform a curiosity about the female body most strikingly articulated by Bloom, Molly, and Gerty MacDowell. These intertexts are not simply imported into the novel but are dismantled and interrogated, as Joyce exposes, rather than endorses, clichés of essential femininity.
Reduction corporoplasty.

PubMed

Hakky, Tariq S; Martinez, Daniel; Yang, Christopher; Carrion, Rafael E

2015-01-01

Here we present the first video demonstration of reduction corporoplasty in the management of phallic disfigurement in a 17-year-old man with a history of sickle cell disease and priapism. Surgical management of aneurysmal dilation of the corpora has yet to be defined in the literature. We performed bilateral elliptical incisions over the lateral corpora as management of aneurysmal dilation of the corpora to correct phallic disfigurement. The patient tolerated the procedure well and has resolution of his corporal disfigurement. Reduction corporoplasty using bilateral lateral elliptical incisions in the management of aneurysmal dilation of the corpora is a safe and feasible operation in the management of phallic disfigurement.
A Sample Corpus Integration in Language Teacher Education through Coursebook Evaluation

ERIC Educational Resources Information Center

Asik, Asuman

2017-01-01

The use of corpora has attracted increased interest in language teaching in the past two decades. Many corpora have been utilized for several purposes in language classrooms directly or indirectly. In spite of the increasing awareness towards the use of corpora and the corpus tools, language teacher education programs still do not include corpus…
How Can We Use Corpus Wordlists for Language Learning? Interfaces between Computer Corpora and Expert Intervention

ERIC Educational Resources Information Center

Chen, Yu-Hua; Bruncak, Radovan

2015-01-01

With the advances in technology, wordlists retrieved from computer corpora have become increasingly popular in recent years. The lexical items in those wordlists are usually selected, according to a set of robust frequency and dispersion criteria, from large corpora of authentic and naturally occurring language. Corpus wordlists are of great value…
Folkloric Art in Egyptian Schools.

ERIC Educational Resources Information Center

Osman, Siham

1983-01-01

Theories in art education with a western origin have been applied in Egypt to support the revival of folkloric art. There are three important phases in the teaching of a unit on applique, a decorative craft dating back to the earliest Egyptian history. (AM)

[An outline medical history of Taiwan (I): the period of folklore medicine and witch doctor].

PubMed

Li, C

1997-01-01

The paper presents a comparative analysis of the origins of health folklore among Chinese on the mainland and on the island of Taiwan. Drawing on works written by authors living in Taiwan during the Qing dynasty, it analyses health conditions among the aboriginal people of Taiwan during the age of witchcraft. With increasing immigration from mainland China to Taiwan, health folklore and gods from the mainland were introduced into Taiwan, giving rise to the period of the witch doctor in Taiwan, which features a correlation of both. Although modern medicine in Taiwan is highly advanced, witch doctors are still found in some places.
Information Systems for the Museum of Japanese History, Archaeology and Folklore

NASA Astrophysics Data System (ADS)

Terui, Takehiko

The general idea and outline of museums of Japanese history, archaeology and folklore are introduced, and the relationship between exhibits and information in them is described. The information systems of these museums are then explained in some detail. As an example, the author describes the information systems for the museum of Japanese history, archaeology and folklore by comparing the computer system with the traditional manual system. Japanese language processing and image handling derived from the systems are also described. The significance and problems of a nationwide information network linking these museums to each other, and staffing problems in the information sections, are also mentioned.
Progestogen treatments for cycle management in a sheep model of assisted conception affect the growth patterns, the expression of luteinizing hormone receptors, and the progesterone secretion of induced corpora lutea.

PubMed

Letelier, Claudia; García-Fernández, Rosa Ana; Contreras-Solis, Ignacio; Sanchez, María Angeles; Garcia-Palencia, Pilar; Sanchez, Belen; Gonzalez-Bulnes, Antonio; Flores, Juana María

2010-03-01

To determine, in a sheep model, the effect of a short-term progestative treatment on growth dynamics and functionality of induced corpora lutea. Observational, model study. Public university. Sixty adult female sheep. Synchronization and induction of ovulation with progestogens and prostaglandin analogues; ovarian ultrasonography, blood sampling, and ovariectomy. Determination of pituitary function and morphologic characteristics, expression of luteinizing hormone (LH) receptors, and progesterone secretion of corpora lutea. The use of progestative pretreatments for assisted conception affect the growth patterns, the expression of LH receptors, and the progesterone secretion of induced corpora lutea. The current study indicates, in a sheep model, the existence of deleterious effects from progestogens on functionality of induced corpora lutea. Copyright 2010 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
A Corpus-Based EAP Course for NNS Doctoral Students: Moving from Available Specialized Corpora to Self-Compiled Corpora

ERIC Educational Resources Information Center

Lee, David; Swales, John

2006-01-01

This paper presents a discussion of an experimental, innovative course in corpus-informed EAP for doctoral students. Participants were given access to specialized corpora of academic writing and speaking, instructed in the tools of the trade (web- and PC-based concordancers) and gradually inducted into the skills needed to best exploit the data…
The Use of General and Specialized Corpora as Reference Sources for Academic English Writing: A Case Study

ERIC Educational Resources Information Center

Chang, Ji-Yeon

2014-01-01

Corpora have been suggested as valuable sources for teaching English for academic purposes (EAP). Since previous studies have mainly focused on corpus use in classroom settings, more research is needed to reveal how students react to using corpora on their own and what should be provided to help them become autonomous corpus users, considering…
Reduction Corporoplasty

PubMed Central

Hakky, Tariq S.; Martinez, Daniel; Yang, Christopher; Carrion, Rafael E.

2015-01-01

Objective: Here we present the first video demonstration of reduction corporoplasty in the management of phallic disfigurement in a 17-year-old man with a history of sickle cell disease and priapism. Introduction: Surgical management of aneurysmal dilation of the corpora has yet to be defined in the literature. Materials and Methods: We performed bilateral elliptical incisions over the lateral corpora as management of aneurysmal dilation of the corpora to correct phallic disfigurement. Results: The patient tolerated the procedure well and has resolution of his corporal disfigurement. Conclusions: Reduction corporoplasty using bilateral lateral elliptical incisions in the management of aneurysmal dilation of the corpora is a safe and feasible operation in the management of phallic disfigurement. PMID:26005988
Astronomical Context of Georgian Folklore

NASA Astrophysics Data System (ADS)

Jijelava, Badri; Holbrook, Jarita; Simonia, Irakli

2016-10-01

Objectives: The ancient religious megalithic monuments are oriented toward the ancient Gods: the Sun, the Moon and other luminaries. The aim of this work is to research the ethnographic data and current folklore and, based on the results, to harmonize the ancient Gods with the orientations of the religious megalithic complexes. Methods/Statistical Analysis: We harmonized the ethnographic, folklore and historical information with a restoration of the ancient celestial sphere (using a special astronomy application) and identified the correlations between some acronychal or heliacal risings/settings of luminaries and the orientations of megalithic objects. Such connections are preserved in folklore. Findings: This technique of investigation gives us a clearer understanding of the ancient universe. Using this method, we can obtain additional information about the ancient Gods (the luminaries), clarify current mythology, and date the megalithic complex. Application/Improvements: This method of investigation, harmonizing cultural astronomy and archaeoastronomy with archaeological investigations, will be more fruitful, because it gives us reliable information concerning the ancient culture, ancient religion and ancient people.
Folklore and Fantasy--Mix or Match?

ERIC Educational Resources Information Center

Weber, Rosemary

While folklore, fairytales, and fantasy vary in definition, they possess the common elements of supernatural beings, strange locales, and imaginative content. Folk tales, originally intended for all ages, were meant to convey lessons about moral behavior and group values; good was rewarded and evil punished. In contemporary literature, high…
Gonadotropin binding sites in human ovarian follicles and corpora lutea during the menstrual cycle

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shima, K.; Kitayama, S.; Nakano, R.

Gonadotropin binding sites were localized by autoradiography after incubation of human ovarian sections with ¹²⁵I-labeled gonadotropins. The binding sites for ¹²⁵I-labeled human follicle-stimulating hormone (¹²⁵I-hFSH) were identified in the granulosa cells and in the newly formed corpora lutea. The ¹²⁵I-labeled human luteinizing hormone (¹²⁵I-hLH) binding to the thecal cells increased during follicular maturation, and a dramatic increase was preferentially observed in the granulosa cells of the large preovulatory follicle. In the corpora lutea, the binding of ¹²⁵I-hLH increased from the early luteal phase and decreased toward the late luteal phase. The changes in 3 beta-hydroxysteroid dehydrogenase activity in the corpora lutea corresponded to the ¹²⁵I-hLH binding. Thus, the changes in gonadotropin binding sites in the follicles and corpora lutea during the menstrual cycle may help in some important way to regulate human ovarian function.
Family Heritage: History and Folklore.

ERIC Educational Resources Information Center

Long, Susan

1993-01-01

As a means of integrating Appalachian culture and folklore into the curriculum, a fifth-grade social studies unit has students create a personal history book by studying the origin and history of their own name, developing their own memory stories, developing a family tree, studying family artifacts and old photographs, and interviewing family…
Folklore and the College Selection Process Revisited

ERIC Educational Resources Information Center

Caruso, Pete

2012-01-01

This paper is a response to Clinton F. Conrad's article, "Beyond the Folklore." Conrad's strategy for assessing undergraduate quality echoes the sentiments espoused by many admission and college counseling professionals over the years at various workshops for students and families that focus on navigating the process. As transcendent as the…
Literatura Oral Hispanica (Hispanic Oral Literature).

ERIC Educational Resources Information Center

McAlpine, Dave

As part of a class in Hispanic Oral Literature, students collected pieces of folklore from various Hispanic residents in the region known as "Siouxland" in Iowa. Consisting of some of the folklore recorded from the residents, this paper includes 18 "cuentos y leyendas" (tales and legends), 48 "refranes" (proverbs), 17…
The Galileo Legend as Scientific Folklore.

ERIC Educational Resources Information Center

Lessl, Thomas M.

1999-01-01

Examines the various ways in which the legend of Galileo's persecution by the Roman Catholic Church diverges from scholarly readings of the Galileo affair. Finds five distinct themes of scientific ideology in the 40 accounts examined. Assesses the part that folklore plays in building and sustaining a professional ideology for the modern scientific…
South Florida Folk Arts: A Teacher's Guide.

ERIC Educational Resources Information Center

Bucuvalas, Tina

Folklore and folk arts encompass the body of traditional knowledge learned and artifacts produced outside of formal institutions as a result of participation in folk groups. A great portion of daily life and culture is folk. Folklore and folk arts acquire distinctly local characteristics through the influences of geography, history, or talented…
Trends in Folklore Studies Development in the Research and Education Space at Ukrainian and Foreign Universities

ERIC Educational Resources Information Center

Vovk, Myroslava

2017-01-01

Trends in the development of folklore studies in the research and education space at Ukrainian and foreign universities have been analyzed. They are fundamentalization, synthesis of academic science and educational practice, professionalization, institutionalization, humanitarization, anthropologization, and interdisciplinarity. It has been defined that…
Hispanic Tradition-Folkloric Music and Dance.

ERIC Educational Resources Information Center

Trujillo, Lorenzo A.

The Hispanic folkloric tradition of Colorado and New Mexico had its beginning in the 1500's and 1600's when the area was colonized by the Spaniards. The "manito" (used by Hispanics in the Southwest to refer to descendants of the area's Spanish colonials) culture has maintained a strong sense of ethnic identity because of geographic…
Confronting Common Folklore: Catching a Cold

ERIC Educational Resources Information Center

Keeley, Page

2012-01-01

Almost every child has experienced the sniffly, stuffy, and achy congestion of the common cold. In addition, many have encountered the "old wives tales" that forge a link between personal actions and coming down with this common respiratory infection. Much of this health folklore has been passed down from generation to generation (e.g., getting a…
Student Worlds, Student Words: Teaching Writing through Folklore.

ERIC Educational Resources Information Center

Simons, Elizabeth Radin

Encouraging teachers of middle and secondary school students to learn to write using their own folklore, each chapter in this book presents a 1- to 3-week unit of study including background information, student activities, transcripts of discussions, and suggested readings for both teachers and students. After an introduction (Knowing Our Insides…
American Folk Music and Folklore Recordings 1985: A Selected List.

ERIC Educational Resources Information Center

Library of Congress, Washington, DC. American Folklife Center.

Thirty outstanding records and tapes of traditional music and folklore which were released in 1985 are described in this illustrated booklet. All of these recordings are annotated with liner notes or accompanying booklets relating the recordings to the performers, their communities, genres, styles, or other pertinent information. The items are…
Mexican-American Folklore: An Approach to the Research Paper.

ERIC Educational Resources Information Center

Seale, Jan

Having freshman English students at Pan American University in the Rio Grande valley of Texas focus on Mexican-American folklore themes for research papers has proved to be successful in motivating students and in activating their ethnic interests and cultural pride. Steps involved in preparing these research papers include choosing a topic which…

"There Are Other Ways To Get Happy": African American Urban Folklore. Working Papers #2.

ERIC Educational Resources Information Center

McGregory, Jerrilyn

"There are other ways to get happy," the slogan signifying "Say no to drugs!" is gaining attention within the African American community in the Philadelphia (Pennsylvania) area. "There are other ways to get happy" comes from learning about and understanding traditional elements of African American folklore. For those…
Nho Lobo: Folk Tales of the Cape Verdean People. Teacher's Guide.

ERIC Educational Resources Information Center

Nyhan, Patricia; Almeida, Raymond A.

The teacher's guide presents two Cape Verdean folktales, background information, discussion questions, and activity suggestions for grades 4-6. The objective is to teach students about Cape Verde and its culture through folklore. The guide contains five sections. Section I offers a description of Cape Verdean folklore, describes five ways folklore…
Cultural, Ethnographical and Religious Context of Georgian Folklore

NASA Astrophysics Data System (ADS)

Jijelava, Badri; Holbrook, Jarita; Simonia, Irakli

2017-05-01

The culture of Georgia is rooted in the ancient religions, traces of which can be found in the ethnographic data of the present day. The people who inhabited this area worshipped the star Arcturus in the Bootes constellation. This connection to Arcturus is reflected in the local folklore about the ploughman, oxen, dog and wolf.
Folklore for Teachers: Deutsche Volkskunde im Sprachunterricht (German Folklore in Language Instruction).

ERIC Educational Resources Information Center

Weber, Berta N.

1971-01-01

Cultural study provides an invaluable tool for the motivation and enrichment of work in the language classroom. The teacher of German, having decided to embark on a culture study program, must not, however, make the mistake of concentrating on the past, nor of letting current political boundaries restrict his approach; rather, he will find that…
Against Her Kind: The Phenomenom of Women against Women in Ovia Cult Worship

ERIC Educational Resources Information Center

Yakubu, Anthonia Makwemoisa

2014-01-01

This paper addresses the incidence of 'Women against Women' in Nigerian folklore. Much has been written on Nigerian folklore, but mainly from within the mortal axis, as reflected in many folktales that cut across different communities in Nigeria. However, it has been observed that this gender phenomenon extends to the supernatural realm, where…
Early Years Education and the Value for Money Folklore

ERIC Educational Resources Information Center

Campbell-Barr, Verity

2012-01-01

This article is intended as a contribution to the debate on the role of human capital in determining value for money in early years education. The article explores how the idea that early years education offers value for money has become folklore amongst policymakers and more widely. However, drawing on both interview data and existing literature…
Dissemination of Values and Culture through the E-Folklore

ERIC Educational Resources Information Center

Rahim, Normaliza Abd; Affendi, Nik Rafidah Nik Muhammad; Pawi, Awang Azman Awang

2017-01-01

This study focuses on the values and culture in e-folklore. The objectives of the study were to identify and discuss the values in the song lyric "The Stork and the Mouse Deer." The song was taken from a phone application in the compilation of the "Kingfisher stories" copyrighted by Dewan Bahasa dan Pustaka. The e-folklore…
Cross domains Arabic named entity recognition system

NASA Astrophysics Data System (ADS)

Al-Ahmari, S. Saad; Abdullatif Al-Johar, B.

2016-07-01

Named Entity Recognition (NER) plays an important role in many Natural Language Processing (NLP) applications such as Information Extraction (IE), Question Answering (QA), Text Clustering, Text Summarization and Word Sense Disambiguation. This paper presents the development and implementation of a domain-independent system to recognize three types of Arabic named entities. The system is based on a set of domain-independent grammar rules along with an Arabic part-of-speech tagger, in addition to gazetteers and lists of trigger words. The experimental results show that the system performed as well as other systems, with better results in some cases of cross-domain corpora.
Primary diffuse large B-cell lymphoma of the corpora cavernosa presented as a perineal mass

PubMed Central

Carlos, González-Satué; Ivanna, Valverde Vilamala; Gustavo, Tapia Melendo; Joan, Areal Calama; Javier, Sanchez Macias; Luis, Ibarz Servio

2012-01-01

Primary male genital lymphomas may appear rarely in the testis, and exceptionally in the penis and prostate, but there is no previous evidence of a lymphoma arising from the corpora cavernosa. We report the first case in the literature of a primary diffuse large B-cell lymphoma of the corpora cavernosa presenting with lower urinary tract symptoms, perineal pain and a palpable mass. Diagnosis was based on trucut biopsy, histopathological studies and computed tomographic images. PMID:22919138
Plays of America from American Folklore for Young Actors Grades 7-12. Young Actors Series.

ERIC Educational Resources Information Center

McCullough, L. E.

Designed to combine with studies in various disciplines, such as history, costume, language, dance, music, and social studies, the 10 plays in this book are drawn from the fount of American folklore and popular culture. They range from European, African, and Asian folktales recast in New World settings to the mythology associated with real-life…
Ships in Russian Literature: Folklore Aesthetics

ERIC Educational Resources Information Center

Dudareva, Marianna A.; Pogukaeva, Anna V.; Polyantseva, Evgeniya A.; Karpova, Yulia V.

2017-01-01

The paper studies the genesis of the ship image in Russian literature and folklore, and the idea of the "other kingdom" in the poetics of Russian literature of the 19th and 20th centuries. Emphasis is put on issues related to the metaphor of a ship or boat in the artistic world of Lermontov, Turgenev, Dostoevsky and in the poetry of the early 20th…
A World Full of Stories. An Annotated Bibliography of Folk Literature. Traditional Literature and Folklore in Library and Storytelling Programs.

ERIC Educational Resources Information Center

Johnson, Paul Anthony, Ed.

The first volume of a projected series entitled "Traditional Literature and Folklore in Library and Storytelling Programs," this annotated bibliography was produced by graduate students in the Traditional Literature and Oral Narration class at the University of Hawaii at Manoa. The bibliography is designed to provide librarians and…
Tibetans and Tibetan Americans: Helping K-8 School Librarians and Educators Understand Their History, Culture, and Literature.

ERIC Educational Resources Information Center

Bruno, Frank Alan; Beilke, Patricia F.

2001-01-01

Provides a review and listing of literature for K-8 school librarians and teachers that focuses on the geography, history, and culture of Tibet and the diverse experiences and folklore of Tibetans. Includes references, other recommended works, and an annotated bibliography divided into folklore, biography, culture and history, fiction, videos, and…
[A customized method for information extraction from unstructured text data in the electronic medical records].

PubMed

Bao, X Y; Huang, W J; Zhang, K; Jin, M; Li, Y; Niu, C Z

2018-04-18

There is a huge amount of diagnostic and treatment information in electronic medical records (EMRs), which is a concrete manifestation of clinicians' actual diagnosis and treatment details. Many sections of EMRs, such as complaints, present illness, past history, differential diagnosis, diagnostic imaging and surgical records, reflect the details of diagnosis and treatment in the clinical process and are written as natural-language descriptions in Chinese. How to extract effective information from these Chinese narrative text data and organize it into tabular form for medical research analysis, so that clinical data can be put to practical use in the real world, is a difficult problem in Chinese medical data processing. Based on the EMR narrative text data of a tertiary hospital in China, a method for learning customized information extraction rules and performing rule-based information extraction is proposed. The overall method consists of three steps. (1) In Step 1, a random sample of 600 copies of electronic medical record data (including the history of present illness, past history, personal history, family history, etc.) was extracted as the raw corpora. Using the Chinese clinical narrative text annotation platform that we developed, trained clinicians and nurses marked the tokens and phrases in the corpora that were to be extracted (with a history of diabetes as an example). (2) In Step 2, based on the annotated clinical text corpora, extraction templates were first summarized and induced. These templates were then rewritten as extraction rules using regular expressions in the Perl programming language. Using these extraction rules as the basic knowledge base, we developed extraction packages in Perl for extracting data from the EMR text data. Finally, the extracted data items were organized in tabular data format for later use in clinical research or hospital surveillance. (3) In the final step, the proposed methods were evaluated and validated on the National Clinical Service Data Integration Platform; we checked the extraction results using a combination of manual and automated verification, which demonstrated the effectiveness of the method. For all patients with diabetes as the diagnosed disease in the Department of Endocrine in the hospital, the medical history episodes showed that altogether 1,436 patients were discharged in 2015, and for the extraction of a history of diabetes from their medical records the recall rate was 87.6%, the accuracy rate was 99.5%, and the F-score was 0.93. For 10% of the patients with diabetes discharged by August 2017 in the same department (1,223 patients in total), the extraction of diabetes history showed a recall rate of 89.2%, an accuracy rate of 99.2%, and an F-score of 0.94. This study mainly adopts a combination of natural language processing and rule-based information extraction, and designs and implements an algorithm for extracting customized information from unstructured Chinese electronic medical record text data. It has better results than existing work.
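The F-scores reported in this abstract can be checked against its recall and accuracy figures. The following is a minimal Python sketch, assuming (the abstract does not say so) that the "accuracy rate" plays the role of precision and that the F-score is the balanced F1 measure; the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract; treating "accuracy rate" as precision
# is an assumption made for this check.
f_2015 = f1_score(precision=0.995, recall=0.876)
f_2017 = f1_score(precision=0.992, recall=0.892)

print(round(f_2015, 2), round(f_2017, 2))  # prints: 0.93 0.94
```

Under that assumption both values agree with the reported F-scores of 0.93 and 0.94.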
Morphometric Correlates of the Ovary and Ovulatory Corpora in the Bowhead Whale, Balaena mysticetus.

PubMed

Tarpley, Raymond J; Hillmann, Daniel J; George, John C; Zeh, Judith E; Suydam, Robert S

2016-06-01

Gross morphology and morphometry of the bowhead whale ovary, including ovulatory corpora, were investigated in 50 whales from the Chukchi and Beaufort seas off the coast of Alaska. Using the presence of ovarian corpora to define sexual maturity, 23 sexually immature whales (7.6-14.2 m total body length) and 27 sexually mature whales (14.2-17.7 m total body length) were identified. Ovary pair weights ranged from 0.38 to 2.45 kg and 2.92 to 12.02 kg for sexually immature and sexually mature whales, respectively. In sexually mature whales, corpora lutea (CLs) and/or large corpora albicantia (CAs) projected beyond ovary surfaces. CAs became increasingly less interruptive of the surface contour as they regressed, while remaining identifiable within transverse sections of the ovarian cortex. CLs formed large globular bodies, often with a central lumen, featuring golden parenchymas enfolded within radiating fibrous cords. CAs, sometimes vesicular, featured a dense fibrous core with outward fibrous projections through the former luteal tissue. CLs (never more than one per ovary pair) ranged from 6.7 to 15.0 cm in diameter in 13 whales. Fetuses were confirmed in nine of the 13 whales, with the associated CLs ranging from 8.3 to 15.0 cm in diameter. CLs from four whales where a fetus was not detected ranged from 6.7 to 10.6 cm in diameter. CA totals ranged from 0 to 22 for any single ovary, and from 1 to 41 for an ovary pair. CAs measured from 0.3 to 6.3 cm in diameter, and smaller corpora were more numerous, suggesting an accumulating record of ovulation. Neither the left nor the right ovary dominated in the production of corpora. Anat Rec, 299:769-797, 2016. © 2016 Wiley Periodicals, Inc.
1970 MLA Abstracts of Articles in Scholarly Journals, Volume I: General, English, American, Medieval and Neo-Latin, Celtic Literatures; and Folklore.

ERIC Educational Resources Information Center

Fisher, John H., Comp.; Achtert, Walter S., Comp.

The first volume of an annual series following the arrangement of the "MLA International Bibliography" includes sections on General, English, American, Medieval and Neo-Latin, Celtic literatures, and Folklore. A classified collection of 1,744 brief abstracts of journal articles on the modern languages and literatures to be used in conjunction with…
Automatic Extraction of Destinations, Origins and Route Parts from Human Generated Route Directions

NASA Astrophysics Data System (ADS)

Zhang, Xiao; Mitra, Prasenjit; Klippel, Alexander; Maceachren, Alan

Researchers from the cognitive and spatial sciences are studying text descriptions of movement patterns in order to examine how humans communicate and understand spatial information. In particular, route directions offer a rich source of information on how cognitive systems conceptualize movement patterns by segmenting them into meaningful parts. Route directions are composed using a plethora of cognitive spatial organization principles: changing levels of granularity, hierarchical organization, incorporation of cognitively and perceptually salient elements, and so forth. Identifying such information in text documents automatically is crucial for enabling machine-understanding of human spatial language. The benefits are: a) creating opportunities for large-scale studies of human linguistic behavior; b) extracting and georeferencing salient entities (landmarks) that are used by human route direction providers; c) developing methods to translate route directions to sketches and maps; and d) enabling queries on large corpora of crawled/analyzed movement data. In this paper, we introduce our approach and implementations that bring us closer to the goal of automatically processing linguistic route directions. We report on research directed at one part of the larger problem, that is, extracting the three most critical parts of route directions and movement patterns in general: origin, destination, and route parts. We use machine-learning based algorithms to extract these parts of routes, including, for example, destination names and types. We prove the effectiveness of our approach in several experiments using hand-tagged corpora.
Isolation and structure elucidation of neuropeptides of the AKH/RPCH family in long-horned grasshoppers (Ensifera).

PubMed

Gäde, G

1992-11-01

An identical neuropeptide was isolated by reversed-phase high-performance liquid chromatography from the corpora cardiaca of the king cricket, Libanasidus vittatus, and the two armoured ground crickets, Heterodes namaqua and Acanthoproctus cervinus. The crude gland extracts had adipokinetic activity in migratory locusts, hypertrehalosaemic activity in American cockroaches and a slight hypertrehalosaemic, but no adipokinetic, effect in armoured ground crickets. The primary structure of this neuropeptide was determined by pulsed-liquid phase sequencing employing Edman chemistry after enzymically deblocking the N-terminal 5-oxopyrrolidine-2-carboxylic acid residue. The C-terminus was also blocked, as indicated by the lack of digestion by carboxypeptidase A. The peptide was assigned the structure pGlu-Leu-Asn-Phe-Ser-Thr-Gly-TrpNH2, previously designated Scg-AKH-II. The corpora cardiaca of the cricket Gryllodes sigillatus contained a neuropeptide which differed in retention time from the one isolated from the king and armoured ground crickets. The structure was assigned as pGlu-Val-Asn-Phe-Ser-Thr-Gly-TrpNH2, previously designated Grb-AKH. This octapeptide caused hyperlipaemia in its donor species. The presence of the same peptide, Scg-AKH-II, in the two primitive infraorders of Ensifera, and the different peptide, Grb-AKH, in the most advanced infraorder of Ensifera, supports the evolutionary trends assigned formerly from morphological and physiological evidence.
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC

PubMed Central

Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

2015-01-01

Objective To create a multilingual gold-standard corpus for biomedical concept recognition. Materials and methods We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. Results The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. Discussion The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. Conclusion To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated. PMID:25948699
A comparison of the chemical constituents of Barbadian medicinal plants within their respective plant families with established drug compounds and phytochemicals used to treat communicable and non-communicable diseases.

PubMed

Cohall, D; Carrington, S

2012-01-01

Barbados has a strong base in the practice of folklore botanical medicines. Consistent with the rest of the Caribbean region, the practice is criticized due to lack of evidence on the efficacy and safety testing. The objectives of this review article are i) to categorize and identify plants by their possible indications and their scientific classification and ii) to determine if the chemical constituents of the plants will be able to provide some insight into their possible uses in folklore medicine based on existing scientific research on their chemical constituents and also by their classification. A review of the folklore botanical medicines of Barbados was done. Plants were primarily grouped based on their use to treat particular communicable and non-communicable diseases. Plants were then secondarily grouped based on their families. The chemical profiles of the plants were then compared to established drug compounds currently approved for the conventional treatment of illnesses and also to established phytochemicals. The extensive literature review identified phytochemical compounds in particular plants used in Barbadian folklore medicine. Sixty-six per cent of reputed medicinal plants contain pharmacologically active phytochemicals; fifty-one per cent of these medicinal plants contain phytochemicals with activities consistent with their reported use. Folklore botanical medicine is well grounded on investigation of the scientific rationale. The research showed that fifty-one per cent of the identified medicinal plants have chemical compounds which have been identified to be responsible for its associated medicinal activity. To a lesser extent, approved drug compounds from drug regulatory bodies with similar chemical structure to the bioactive compounds in the plants proved to validate the use of some of these plants to treat illnesses.

Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text.

PubMed

Carrell, David; Malin, Bradley; Aberdeen, John; Bayer, Samuel; Clark, Cheryl; Wellner, Ben; Hirschman, Lynette

2013-01-01

Secondary use of clinical text is impeded by a lack of highly effective, low-cost de-identification methods. Both manual and automated methods for removing protected health information are known to leave behind residual identifiers. The authors propose a novel approach for addressing the residual identifier problem based on the theory of Hiding In Plain Sight (HIPS). HIPS relies on obfuscation to conceal residual identifiers. According to this theory, replacing the detected identifiers with realistic but synthetic surrogates should collectively render the few 'leaked' identifiers difficult to distinguish from the synthetic surrogates. The authors conducted a pilot study to test this theory on clinical narrative, de-identified by an automated system. Test corpora included 31 oncology and 50 family practice progress notes read by two trained chart abstractors and an informaticist. Experimental results suggest approximately 90% of residual identifiers can be effectively concealed by the HIPS approach in text containing average and high densities of personal identifying information. This pilot test suggests HIPS is feasible, but requires further evaluation. The results need to be replicated on larger corpora of diverse origin under a range of detection scenarios. Error analyses also suggest areas where surrogate generation techniques can be refined to improve efficacy. If these results generalize to existing high-performing de-identification systems with recall rates of 94-98%, HIPS could increase the effective de-identification rates of these systems to levels above 99% without further advancements in system recall. Additional and more rigorous assessment of the HIPS approach is warranted.
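The "above 99%" figure in this abstract follows from simple arithmetic on the numbers it reports. The sketch below is one plausible reading, not the authors' stated formula: it assumes the effective de-identification rate is the system recall plus the share of missed identifiers that HIPS conceals. The function name `effective_rate` is hypothetical.

```python
def effective_rate(recall: float, concealed_fraction: float) -> float:
    """Share of identifiers either removed by the system or hidden by surrogates.

    Assumed model: identifiers missed by the system (1 - recall) are
    concealed with probability `concealed_fraction`.
    """
    return recall + (1.0 - recall) * concealed_fraction


# Recall range (94-98%) and concealment rate (about 90%) are from the abstract.
low = effective_rate(recall=0.94, concealed_fraction=0.90)
high = effective_rate(recall=0.98, concealed_fraction=0.90)

print(f"{low:.3f} {high:.3f}")  # prints: 0.994 0.998
```

Under this model both ends of the recall range land above 99%, which is consistent with the claim in the abstract.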
The Impact of Misspelled Words on Automated Computer Scoring: A Case Study of Scientific Explanations

NASA Astrophysics Data System (ADS)

Ha, Minsu; Nehm, Ross H.

2016-06-01

Automated computerized scoring systems (ACSSs) are being increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that despite improvements, spell-check and auto-replace programs continue to be characterized by significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce equivalent magnitudes and types of spelling errors as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback. We discuss the implications of these findings for the development of next-generation science assessments.
Approaching the Linguistic Complexity

NASA Astrophysics Data System (ADS)

Drożdż, Stanisław; Kwapień, Jarosław; Orczyk, Adam

We analyze the rank-frequency distributions of words in selected English and Polish texts. We compare scaling properties of these distributions in both languages. We also study a few small corpora of Polish literary texts and find that for a corpus consisting of texts written by different authors the basic scaling regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scaling regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, based on the British National Corpus, we consider the rank-frequency distributions of the grammatically basic forms of words (lemmas) tagged with their proper part of speech. We find that these distributions do not scale if each part of speech is analyzed separately. The only part of speech that independently develops a trace of scaling is verbs.
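A rank-frequency distribution of the kind analyzed in this abstract can be computed in a few lines. The Python sketch below is only an illustration, not the authors' procedure: the tokenizer, the toy sentence, and the least-squares slope on log-log axes are all assumptions, and a real study would use a full corpus and a more careful fit.

```python
import math
import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, int]]:
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    # Runs of Unicode letters, so Polish diacritics are kept inside words.
    words = re.findall(r"[^\W\d_]+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs: list[tuple[int, int]]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two ranks. A slope near -1 corresponds to classic
    Zipf-like scaling.
    """
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input, far too small for any real conclusion about scaling.
pairs = rank_frequency("the cat and the dog and the bird")
slope = loglog_slope(pairs)
print(pairs)  # prints: [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
```

Comparing the fitted slope, or the deviation from a straight line on log-log axes, across corpora is one simple way to see where a scaling regime breaks down.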
1970 MLA International Bibliography of Books and Articles on the Modern Languages and Literatures, Volume I: General, English, American, Medieval and Neo-Latin, Celtic Literatures; and Folklore.

ERIC Educational Resources Information Center

Meserole, Harrison T., Comp.

Volume 1 of the four-volume, international bibliography contains over 11,140 entries referring to books, Festschriften, analyzed collections, and articles which focus on General, English, American, medieval and neo-Latin, and Celtic literatures. A section of folklore is also included. The section on general literature includes: (1) aesthetics, (2)…
Selections from Aleut Folklore: The Aleuts of the Eighteenth Century, Social Studies Unit, Book V.

ERIC Educational Resources Information Center

Partnow, Patricia H., Ed.

Three short stories and one song are to be used as part of the social studies unit, The Aleuts of the Eighteenth Century, to give an idea of the nature and uses of Aleutian folklore. Most have simple plots and teach something about proper behavior. "A Sparrow Story," told by an Aleut woman who lived in Unalaska, relates the story of an…
"He Says You're Going To Play the Giant": Ethnographic Perspectives on a Cambodian Arts Class in Philadelphia. Philadelphia Folklore Project Working Papers #8.

ERIC Educational Resources Information Center

Westerman, William

This project began when the Philadelphia Folklore Project (PFP) initiated a residency partnership with the Samuel S. Fleisher Art Memorial in traditional Cambodian arts. The PFP anticipated raising issues that might help in the understanding of the cultural dynamics and elements that were likely to shape and affect the residency. The PFP imagined…
Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

PubMed Central

2013-01-01

Motivation: The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates into PGN tagging solutions, and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs. Results: In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon-based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora, whereas the profiles of the false positive mistakes characterize the tagging solutions. LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results and, in addition, provide concept identifiers from a knowledge source, in contrast to ML-Tag solutions. Conclusion: The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard. PMID:24112383
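The precision, recall and F1-measure figures discussed in this record can be illustrated with a minimal sketch of exact-match scoring. It assumes, for the purposes of this example only, that each annotation is a `(document_id, start, end)` tuple and that a prediction counts as correct only when it matches a gold annotation exactly. The function name `exact_match_scores` is hypothetical and is not taken from the paper.

```python
def exact_match_scores(gold, predicted):
    """Return (precision, recall, f1) under exact span matching.

    gold, predicted: iterables of hashable annotations, here assumed
    to be (document_id, start_offset, end_offset) tuples.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_pos = len(gold_set & pred_set)    # spans found and correct
    false_pos = len(pred_set - gold_set)   # spans found but not in gold
    false_neg = len(gold_set - pred_set)   # gold spans that were missed
    precision = true_pos / (true_pos + false_pos) if pred_set else 0.0
    recall = true_pos / (true_pos + false_neg) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In this framing, false positive filtering removes items from `predicted`, which can raise precision while leaving recall unchanged as long as no correct span is filtered out.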
A randomized controlled trial of Turkish folklore dance on the physical performance, balance, depression and quality of life in older women.

PubMed

Eyigor, Sibel; Karapolat, Hale; Durmaz, Berrin; Ibisoglu, Ugur; Cakir, Serap

2009-01-01

The present study was carried out to investigate the effects of group-based Turkish folkloric dances on physical performance, balance, depression and quality of life (QoL) in 40 healthy elderly women over the age of 65 years. Subjects were randomly allocated into Group 1 (folkloric dance-based exercise) and Group 2 (control). An 8-week dance-based exercise program was performed. Outcome measures included a 20-m walk test, a 6-min walk test, stair climbing and chair rise time, Berg balance scale (BBS), the Medical Outcomes Study (MOS) 36-item short form health survey (SF-36), and geriatric depression scale (GDS) questionnaires. In Group 1, statistically significant improvements were found in most of the physical performance tests, BBS and some SF-36 subscales after the exercise (p<0.05). In Group 2, there was no clinically significant change in the variables. Comparing the groups, significant improvements in favor of Group 1 emerged in most of the functional performance tests, in some of the SF-36 subscales and in the BBS score (p<0.05). We achieved improvements in physical performance, balance and QoL in elderly females. Application of folkloric dance specific to countries as an exercise program for elderly people may be helpful.
Folklore information from Assam for family planning and birth control.

PubMed

Tiwari, K C; Majumder, R; Bhattacharjee, S

1982-11-01

The author collected folklore information on herbal treatments to control fertility from different parts of Assam, India. Temporary methods of birth control include Cissampelos pareira L. in combination with Piper nigrum L., root of Mimosa pudica L. and Hibiscus rosa-sinensis L. Plants used for permanent sterilization include Plumbago zeylanica L., Heliotropium indicum L., Salmalia malabarica, Hibiscus rosa-sinensis L., Plumeria rubra L., and Bambusa arundinacea. Abortion is achieved through use of Osbeckia nepalensis or Carica papaya L. in combination with resin from Ferula narthex Boiss. It is concluded that there is tremendous scope for the collection of folklore about medicine, family planning agents, and other treatments from Assam and surrounding areas. Such a project requires proper understanding between the survey team and local people, tactful behavior, and a significant amount of time. Monetary rewards can also be helpful for obtaining information from potential respondents.
Feature-level sentiment analysis by using comparative domain corpora

NASA Astrophysics Data System (ADS)

Quan, Changqin; Ren, Fuji

2016-06-01

Feature-level sentiment analysis (SA) is able to provide more fine-grained SA on certain opinion targets and has a wider range of applications on E-business. This study proposes an approach based on comparative domain corpora for feature-level SA. The proposed approach makes use of word associations for domain-specific feature extraction. First, we assign a similarity score for each candidate feature to denote its similarity extent to a domain. Then we identify domain features based on their similarity scores on different comparative domain corpora. After that, dependency grammar and a general sentiment lexicon are applied to extract and expand feature-oriented opinion words. Lastly, the semantic orientation of a domain-specific feature is determined based on the feature-oriented opinion lexicons. In evaluation, we compare the proposed method with several state-of-the-art methods (including unsupervised and semi-supervised) using a standard product review test collection. The experimental results demonstrate the effectiveness of using comparative domain corpora.
[Erectile function and ablative surgery of penile tumors].

PubMed

Pisani, E; Austoni, E; Trinchieri, A; Ceresoli, A; Mantovani, F; Colombo, F; Mastromarino, G; Vecchio, D; Canclini, L; Fenice, O

1994-02-01

The authors set out to show that radical excision can be combined with minimal invasiveness in surgery for penile cancer. The focal point of every therapeutic decision is correct clinical staging. Unfortunately, there is some confusion between the two international staging systems (TNM and Jackson's classification): the anatomical difference between epithelioma of the glans infiltrating the corpus spongiosum and subcoronary epithelioma of the shaft infiltrating the corpora cavernosa is not made clear. Infiltration of the corpora cavernosa is clearly a far more aggressive oncological manifestation than infiltration of the corpus spongiosum, so the authors consider Jackson's classification more suitable. In surgical terms, this anatomical independence makes it easy to treat the corpora cavernosa as a distinct entity, so that they remain perfectly functional when separated from the glandulo-spongio-urethral unit with its vasculo-nervous bundle. Erectile function can therefore be preserved when clinical staging shows that the tumour is not infiltrating the corpora cavernosa. The authors present their results, which appear to be rather good.
Hemodynamics of erection in man

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shirai, M.; Ishii, N.

1981-02-01

Inquiry was made into the theory that closure of the efferent vein from the corpora cavernosa is essential for erection of the human penis. To determine whether the venous closure is indeed a prerequisite to human penile erection, two tests were carried out in men: (1) direct infusion of 133Xe into the corpora cavernosa and (2) cavernosography. In each case, penile erection was induced by providing the subject with sexual stimulation. The behavioral changes were studied through the 133Xe clearance curve and the contrast medium, respectively. When the penis remained flaccid, the 133Xe clearance curve followed a gentle path and the contrast medium could be noted within the penis for a relatively long period. However, on erection with sexual stimulation, the 133Xe clearance curve fell rapidly instead of following the gentle course expected in the case of venous closure. Also, the contrast medium quickly flowed out of the corpora cavernosa. The human penis can therefore become erect without closure of the efferent vein from the corpora cavernosa.
Mining Consumer Health Vocabulary from Community-Generated Text

PubMed Central

Vydiswaran, V.G. Vinod; Mei, Qiaozhu; Hanauer, David A.; Zheng, Kai

2014-01-01

Community-generated text corpora can be a valuable resource to extract consumer health vocabulary (CHV) and link them to professional terminologies and alternative variants. In this research, we propose a pattern-based text-mining approach to identify pairs of CHV and professional terms from Wikipedia, a large text corpus created and maintained by the community. A novel measure, leveraging the ratio of frequency of occurrence, was used to differentiate consumer terms from professional terms. We empirically evaluated the applicability of this approach using a large data sample consisting of MedLine abstracts and all posts from an online health forum, MedHelp. The results show that the proposed approach is able to identify synonymous pairs and label the terms as either consumer or professional term with high accuracy. We conclude that the proposed approach provides great potential to produce a high quality CHV to improve the performance of computational applications in processing consumer-generated health text. PMID:25954426
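The record above mentions a measure based on the ratio of frequency of occurrence but does not give its exact form. The following is therefore only a hedged sketch of one plausible variant. It compares a term's smoothed relative frequency in a consumer corpus with its relative frequency in a professional corpus, and labels the term with the higher ratio in a synonym pair as the consumer term. The names `consumer_ratio` and `label_pair`, the add-one smoothing, and the toy counts in the usage note are all assumptions of this example.

```python
def consumer_ratio(term, consumer_counts, consumer_total,
                   professional_counts, professional_total, smoothing=1.0):
    """Ratio of smoothed relative frequencies (consumer / professional).

    Values well above 1 suggest the term is used mostly by consumers;
    values well below 1 suggest professional usage.
    """
    consumer_rate = (consumer_counts.get(term, 0) + smoothing) / (
        consumer_total + smoothing)
    professional_rate = (professional_counts.get(term, 0) + smoothing) / (
        professional_total + smoothing)
    return consumer_rate / professional_rate


def label_pair(term_a, term_b, consumer_counts, consumer_total,
               professional_counts, professional_total):
    """Return (consumer_term, professional_term) for a synonym pair."""
    ratio_a = consumer_ratio(term_a, consumer_counts, consumer_total,
                             professional_counts, professional_total)
    ratio_b = consumer_ratio(term_b, consumer_counts, consumer_total,
                             professional_counts, professional_total)
    return (term_a, term_b) if ratio_a >= ratio_b else (term_b, term_a)
```

For example, with made-up counts in which "heart attack" occurs 90 times per 1,000 forum tokens and 20 times per 1,000 abstract tokens, while "myocardial infarction" occurs 10 and 80 times respectively, `label_pair` would return "heart attack" as the consumer term.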
"[N]ot subject to our sense” : Margaret Cavendish's fusion of Renaissance science, magic and fairy lore.

PubMed

Walters, Lisa

2010-01-01

This article explores Margaret Cavendish's depictions of alchemy, witchcraft and fairy lore in her scientific treatise Philosophical Letters and in fictional texts from Natures Pictures and Poems and Fancies. Though Cavendish was a dedicated materialist, she appropriates theories of magic from early modern science and folklore into her materialist epistemology. As Cavendish draws upon a fusion of early modern conceptions of magic, she creates a radical theory of matter which not only challenges patriarchy and binary oppositions, but also explores the plurality and mystery that can exist within an infinitely complex material world.
Incorporating Linguistic Knowledge for Learning Distributed Word Representations

PubMed Central

Wang, Yan; Liu, Zhiyuan; Sun, Maosong

2015-01-01

Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, but do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate this prior knowledge for learning distributed word representations. Experimental results demonstrate that our estimated word representation achieves better performance in the task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining. PMID:25874581
Combining Language Corpora with Experimental and Computational Approaches for Language Acquisition Research

ERIC Educational Resources Information Center

Monaghan, Padraic; Rowland, Caroline F.

2017-01-01

Historically, first language acquisition research was a painstaking process of observation, requiring the laborious hand coding of children's linguistic productions, followed by the generation of abstract theoretical proposals for how the developmental process unfolds. Recently, the ability to collect large-scale corpora of children's language…
Automatic Construction of English/Chinese Parallel Corpora.

ERIC Educational Resources Information Center

Yang, Christopher C.; Li, Kar Wing

2003-01-01

Discussion of multilingual corpora and cross-lingual information retrieval focuses on research that automatically constructed an English/Chinese parallel corpus from the World Wide Web. Presents an alignment method which is based on dynamic programming to identify one-to-one Chinese and English title pairs and discusses results of experiments…
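The record does not describe the alignment method beyond saying that it uses dynamic programming, so the following is only a generic sketch of that idea and not the authors' algorithm. It assumes that a similarity score between every English title and every Chinese title is already available, and that matching titles appear in the same relative order on both sides. The function name `align_titles` and the `min_score` threshold are hypothetical.

```python
def align_titles(sim, min_score=0.0):
    """Find an order-preserving one-to-one alignment with maximal total score.

    sim[i][j] is an assumed similarity between English title i and
    Chinese title j. Pairs scoring at or below min_score are never matched.
    Returns a list of (english_index, chinese_index) pairs.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    # best[i][j]: best total score using the first i English
    # and the first j Chinese titles.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = sim[i - 1][j - 1]
            options = [best[i - 1][j], best[i][j - 1]]  # skip one title
            if score > min_score:
                options.append(best[i - 1][j - 1] + score)  # pair them
            best[i][j] = max(options)

    # Trace back through the table to recover the chosen pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        score = sim[i - 1][j - 1]
        if score > min_score and best[i][j] == best[i - 1][j - 1] + score:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

Titles left unpaired by the alignment are simply skipped, which lets the sketch tolerate pages that exist in only one of the two languages.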
Using Corpora in EFL Classrooms: The Case Study of IELTS Preparation

ERIC Educational Resources Information Center

Smirnova, Elizaveta A.

2017-01-01

This article describes the gathered experience in using corpora in an IELTS preparation course. The practice demonstrates an attempt to reduce negative washback effects occurring when preparation courses just concentrate on the test format neglecting the importance of development of learners' language skills and general study skills. Some…
Linguistic Corpora and Lexicography.

ERIC Educational Resources Information Center

Meijs, Willem

1996-01-01

Overviews the development of corpus linguistics, reviews the use of corpora in modern lexicography, and presents central issues in ongoing work aimed at broadening the scope of lexicographical use of corpus data. Focuses on how the field has developed in relation to the production of new monolingual English dictionaries by major British…

Corpora in Language Teaching and Learning

ERIC Educational Resources Information Center

Boulton, Alex

2017-01-01

This timeline looks at explicit uses of corpora in foreign or second language (L2) teaching and learning, i.e. what happens when end-users explore corpus data, whether directly via concordancers or integrated into CALL programs, or indirectly with prepared printed materials. The underlying rationale is that such contact provides the massive…
Nora: A Vocabulary Discovery Tool for Concept Extraction.

PubMed

Divita, Guy; Carter, Marjorie E; Durgahee, B S Begum; Pettey, Warren E; Redd, Andrew; Samore, Matthew H; Gundlapalli, Adi V

2015-01-01

Coverage of terms in domain-specific terminologies and ontologies is often limited in controlled medical vocabularies. Creating and augmenting such terminologies is resource intensive. We developed Nora as an interactive tool to discover terminology from text corpora; the output can then be employed to refine and enhance natural language processing-based concept extraction tasks. Nora provides a visualization of chains of words foraged from word frequency indexes from a text corpus. Domain experts direct and curate chains that contain relevant terms, which are further curated to identify lexical variants. A test of Nora demonstrated an increase of a domain lexicon in homelessness and related psychosocial factors by 38%, yielding an additional 10% extracted concepts.
The CHEMDNER corpus of chemicals and drugs and its annotation principles.

PubMed

Krallinger, Martin; Rabal, Obdulia; Leitner, Florian; Vazquez, Miguel; Salgado, David; Lu, Zhiyong; Leaman, Robert; Lu, Yanan; Ji, Donghong; Lowe, Daniel M; Sayle, Roger A; Batista-Navarro, Riza Theresa; Rak, Rafal; Huber, Torsten; Rocktäschel, Tim; Matos, Sérgio; Campos, David; Tang, Buzhou; Xu, Hua; Munkhdalai, Tsendsuren; Ryu, Keun Ho; Ramanan, S V; Nathan, Senthil; Žitnik, Slavko; Bajec, Marko; Weber, Lutz; Irmer, Matthias; Akhondi, Saber A; Kors, Jan A; Xu, Shuo; An, Xin; Sikdar, Utpal Kumar; Ekbal, Asif; Yoshioka, Masaharu; Dieb, Thaer M; Choi, Miji; Verspoor, Karin; Khabsa, Madian; Giles, C Lee; Liu, Hongfang; Ravikumar, Komandur Elayavilli; Lamurias, Andre; Couto, Francisco M; Dai, Hong-Jie; Tsai, Richard Tzong-Han; Ata, Caglar; Can, Tolga; Usié, Anabel; Alves, Rui; Segura-Bedmar, Isabel; Martínez, Paloma; Oyarzabal, Julen; Valencia, Alfonso

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
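The record reports a percentage agreement of 91 between annotators without stating how it was calculated. As a hedged illustration only, the sketch below uses one common definition: the share of all mentions marked by either annotator that both annotators marked identically. The tuple layout `(document_id, start, end, class_label)` and the function name `percentage_agreement` are assumptions of this example, not the CHEMDNER procedure.

```python
def percentage_agreement(annotator_a, annotator_b):
    """Percentage of mentions on which two annotators agree exactly.

    Each annotation is assumed to be a hashable tuple such as
    (document_id, start_offset, end_offset, class_label).
    Agreement = identical annotations / all distinct annotations * 100.
    """
    set_a = set(annotator_a)
    set_b = set(annotator_b)
    union = set_a | set_b
    if not union:
        return 100.0  # nothing annotated by either: trivially in agreement
    return 100.0 * len(set_a & set_b) / len(union)
```

Dropping the class label from the tuples gives a more lenient figure that measures agreement on mention boundaries only.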
The CHEMDNER corpus of chemicals and drugs and its annotation principles

PubMed Central

2015-01-01

The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/ PMID:25810773
Deep learning with word embeddings improves biomedical named entity recognition.

PubMed

Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

2017-07-15

Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/. Contact: habibima@informatik.hu-berlin.de
Jointly learning word embeddings using a corpus and a knowledge base

PubMed Central

Bollegala, Danushka; Maehara, Takanori; Kawarabayashi, Ken-ichi

2018-01-01

Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks. PMID:29529052
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.

PubMed

Kors, Jan A; Clematide, Simon; Akhondi, Saber A; van Mulligen, Erik M; Rebholz-Schuhmann, Dietrich

2015-09-01

To create a multilingual gold-standard corpus for biomedical concept recognition. We selected text units from different parallel corpora (Medline abstract titles, drug labels, biomedical patent claims) in English, French, German, Spanish, and Dutch. Three annotators per language independently annotated the biomedical concepts, based on a subset of the Unified Medical Language System and covering a wide range of semantic groups. To reduce the annotation workload, automatically generated preannotations were provided. Individual annotations were automatically harmonized and then adjudicated, and cross-language consistency checks were carried out to arrive at the final annotations. The number of final annotations was 5530. Inter-annotator agreement scores indicate good agreement (median F-score 0.79), and are similar to those between individual annotators and the gold standard. The automatically generated harmonized annotation set for each language performed equally well as the best annotator for that language. The use of automatic preannotations, harmonized annotations, and parallel corpora helped to keep the manual annotation efforts manageable. The inter-annotator agreement scores provide a reference standard for gauging the performance of automatic annotation techniques. To our knowledge, this is the first gold-standard corpus for biomedical concept recognition in languages other than English. Other distinguishing features are the wide variety of semantic groups that are being covered, and the diversity of text genres that were annotated.
Deep learning with word embeddings improves biomedical named entity recognition

PubMed Central

Habibi, Maryam; Weber, Leon; Neves, Mariana; Wiegandt, David Luis; Leser, Ulf

2017-01-01

Motivation: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. Results: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. Availability and implementation: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/. Contact: habibima@informatik.hu-berlin.de PMID:28881963
Dangers of the vagina.

PubMed

Beit-Hallahmi, B

1985-12-01

Beliefs, myths, and literary expressions of men's fear of female genitals are reviewed. Both clinical evidence and folklore provide evidence that men imagine female genitals not only as a source of pleasure and attraction, but also as a source of danger in a very physical sense. The vagina dentata myth has many versions, including some modern ones, and its message is always the same: an awesome danger emanating from a woman's body. The prevalence of such feelings in folklore and in literature is noted.
“Early baby teeth”: Folklore and facts

PubMed Central

Maheswari, N. Uma; Kumar, B. P.; Karunakaran; Kumaran, S. Thanga

2012-01-01

Variations in the newborns’ oral cavity have been an enduring interest to the pediatric dentist. The occurrence of natal and neonatal teeth is a rare anomaly, which for centuries has been associated with diverse superstitions among many different ethnic groups. Natal teeth are more frequent than neonatal teeth, the ratio being approximately 3:1. The purpose of this case report is to review the literature related to the natal teeth folklore and misconceptions and discuss their possible etiology and treatment. PMID:23066283
Annotation of Korean Learner Corpora for Particle Error Detection

ERIC Educational Resources Information Center

Lee, Sun-Hee; Jang, Seok Bae; Seo, Sang-Kyu

2009-01-01

In this study, we focus on particle errors and discuss an annotation scheme for Korean learner corpora that can be used to extract heuristic patterns of particle errors efficiently. We investigate different properties of particle errors so that they can be later used to identify learner errors automatically, and we provide resourceful annotation…
Some Benefits of Corpora as a Language Learning Tool

ERIC Educational Resources Information Center

Marjanovic, Tatjana

2012-01-01

What this paper is meant to do is share illustrations and insights into how English learners and teachers alike can benefit from using corpora in their work. Arguments are made for their multifaceted possibilities as grammatical, lexical and discourse pools suitable for discovering ways of the language, be they regularities or idiosyncrasies. The…
Learning in Parallel: Using Parallel Corpora to Enhance Written Language Acquisition at the Beginning Level

ERIC Educational Resources Information Center

Bluemel, Brody

2014-01-01

This article illustrates the pedagogical value of incorporating parallel corpora in foreign language education. It explores the development of a Chinese/English parallel corpus designed specifically for pedagogical application. The corpus tool was created to aid language learners in reading comprehension and writing development by making foreign…
Corpora Processing and Computational Scaffolding for a Web-Based English Learning Environment: The CANDLE Project

ERIC Educational Resources Information Center

Liou, Hsien-Chin; Chang, Jason S; Chen, Hao-Jan; Lin, Chih-Cheng; Liaw, Meei-Ling; Gao, Zhao-Ming; Jang, Jyh-Shing Roger; Yeh, Yuli; Chuang, Thomas C.; You, Geeng-Neng

2006-01-01

This paper describes the development of an innovative web-based environment for English language learning with advanced data-driven and statistical approaches. The project uses various corpora, including a Chinese-English parallel corpus ("Sinorama") and various natural language processing (NLP) tools to construct effective English…
Evaluating Bilingual and Monolingual Dictionaries for L2 Learners.

ERIC Educational Resources Information Center

Hunt, Alan

1997-01-01

A discussion of dictionaries and their use for second language (L2) learning suggests that lack of computerized modern language corpora can adversely affect bilingual dictionaries, commonly used by L2 learners, and shows how use of such corpora has benefitted two contemporary monolingual L2 learner dictionaries (1995 editions of the Longman…
NASA's online machine aided indexing system

NASA Technical Reports Server (NTRS)

Silvester, June P.; Genuardi, Michael T.; Klingbiel, Paul H.

1993-01-01

This report describes the NASA Lexical Dictionary, a machine aided indexing system used online at the National Aeronautics and Space Administration's Center for Aerospace Information (CASI). This system is comprised of a text processor that is based on the computational, non-syntactic analysis of input text, and an extensive 'knowledge base' that serves to recognize and translate text-extracted concepts. The structure and function of the various NLD system components are described in detail. Methods used for the development of the knowledge base are discussed. Particular attention is given to a statistically-based text analysis program that provides the knowledge base developer with a list of concept-specific phrases extracted from large textual corpora. Production and quality benefits resulting from the integration of machine aided indexing at CASI are discussed along with a number of secondary applications of NLD-derived systems including on-line spell checking and machine aided lexicography.
Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

PubMed

Keuleers, Emmanuel; Balota, David A

2015-01-01

This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.
Applications of Latent Variable Models in Modeling Influence and Decision Making

DTIC Science & Technology

2013-04-01

by normalization φ_{w,k} ← φ_{w,k} / Σ_n φ_{n,k}. 3.3 Empirical study We studied the DIM with four text corpora: three collections of scientific articles and a...both provided funds for travel to NIPS 2011. School Foremost, I owe my advisor, David Blei, many thanks for his mentorship and support for the past four ...and former graduate students in my research lab and our broader research group, who have helped me in this program in various ways. They have served
Exploring Learner Language through Corpora: Comparing and Interpreting Corpus Frequency Information

ERIC Educational Resources Information Center

Gablasova, Dana; Brezina, Vaclav; McEnery, Tony

2017-01-01

This article contributes to the debate about the appropriate use of corpus data in language learning research. It focuses on frequencies of linguistic features in language use and their comparison across corpora. The majority of corpus-based second language acquisition studies employ a comparative design in which either one or more second language…
Is There a Core General Vocabulary? Introducing the "New General Service List"

ERIC Educational Resources Information Center

Brezina, Vaclav; Gablasova, Dana

2015-01-01

The current study presents a "New General Service List (new-GSL)", which is a result of robust comparison of four language corpora ("LOB," "BNC," "BE06," and "EnTenTen12") of the total size of over 12 billion running words. The four corpora were selected to represent a variety of corpus sizes and…

Lexical Awareness and Development through Data Driven Learning: Attitudes and Beliefs of EFL Learners

ERIC Educational Resources Information Center

Asik, Asuman; Vural, Arzu Sarlanoglu; Akpinar, Kadriye Dilek

2016-01-01

Data-driven learning (DDL) has become an innovative approach developed from corpus linguistics. It plays a significant role in the progression of foreign language pedagogy, since it offers learners plentiful authentic corpora examples that make them analyze language rules with the help of online corpora and concordancers. The present study…
Application of Learner Corpora to Second Language Learning and Teaching: An Overview

ERIC Educational Resources Information Center

Xu, Qi

2016-01-01

The paper gives an overview of learner corpora and their application to second language learning and teaching. It is proposed that there are four core components in learner corpus research, namely, corpus linguistics expertise, a good background in linguistic theory, knowledge of SLA theory, and a good understanding of foreign language teaching…
Training L2 Writers to Reference Corpora as a Self-Correction Tool

ERIC Educational Resources Information Center

Quinn, Cynthia

2015-01-01

Corpora have the potential to support the L2 writing process at the discourse level in contrast to the isolated dictionary entries that many intermediate writers rely on. To take advantage of this resource, learners need to be trained, which involves practising corpus research and referencing skills as well as learning to make data-based…
Speeding up parallel processing

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1988-01-01

In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
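The arithmetic behind this abstract can be sketched in a few lines of Python. The Amdahl formula is standard; the scaled-speedup formula (usually attributed to Gustafson) is not named in the abstract and is included here only as an assumption about how "scalable problems" are typically modeled. All function names are illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup of a fixed-size problem under Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction, processors):
    """Scaled speedup when the problem size grows with the processor count.

    Assumption: serial_fraction is measured on the parallel run.
    """
    return processors - serial_fraction * (processors - 1)


def serial_fraction_for(speedup, processors):
    """Invert Amdahl's law: the serial fraction implied by an observed speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


nodes = 1024
# Serial fraction implied by a speedup of 500 on 1024 nodes for a fixed-size problem.
implied = serial_fraction_for(500.0, nodes)
```

Under these formulas, a fixed-size speedup above 500 on 1024 nodes requires a serial fraction of roughly one part in a thousand, which is why the Sandia results were seen as a challenge to the pessimistic reading of Amdahl's law.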
[Single and combining effects of Calculus Bovis and zolpidem on inhibitive neurotransmitter of rat striatum corpora].

PubMed

Liu, Ping; He, Xinrong; Guo, Mei

2010-04-01

To investigate the effects of single and combined administration of Calculus Bovis and zolpidem on inhibitory neurotransmitters in the rat corpus striatum. Samples from the rat corpus striatum were collected by microdialysis. The content of two inhibitory neurotransmitters in the rat corpus striatum, glycine (Gly) and gamma-aminobutyric acid (GABA), was determined by HPLC, which involved pre-column derivatization with ortho-phthalaldehyde, reversed-phase gradient elution and fluorescence detection. GABA content in the Calculus Bovis group was significantly increased compared with the saline group (P < 0.01). GABA content in the zolpidem group and the Calculus Bovis plus zolpidem group was also markedly increased compared with the saline group (P < 0.05). GABA content in the Calculus Bovis group was higher than in the combination group (P < 0.05). GABA content in the zolpidem group was not significantly different from that in the combination group. Gly content in the Calculus Bovis or zolpidem group was markedly increased compared with the saline group or the combination group (P < 0.05). The contents of both inhibitory neurotransmitters in the rat corpus striatum were significantly increased in the Calculus Bovis group, the zolpidem group and the combination group. The magnitude of the increase was lower in the combination group than in the Calculus Bovis group and the zolpidem group, suggesting that the encephalon inhibition promoted by Calculus Bovis is more powerful than that of zolpidem. The increase in the two inhibitory neurotransmitters did not show a reinforcing effect in the combination group, suggesting that Calculus Bovis and zolpidem may compete for the same receptors. Therefore, combining Calculus Bovis-containing drugs with zolpidem has no clinical significance. Calculus Bovis should not be used as an aperture-opening drug for resuscitation therapy.
Stretched Verb Collocations with "Give": Their Use and Translation into Spanish Using the BNC and CREA Corpora

ERIC Educational Resources Information Center

Molina-Plaza, Silvia; de Gregorio-Godeo, Eduardo

2010-01-01

Within the context of on-going research, this paper explores the pedagogical implications of contrastive analyses of multiword units in English and Spanish based on electronic corpora as a CALL resource. The main tenets of collocations from a contrastive perspective--and the points of contact and departure between both languages--are discussed…
The Effect of the Integration of Corpora in Reading Comprehension Classrooms on English as a Foreign Language Learners' Vocabulary Development

ERIC Educational Resources Information Center

Gordani, Yahya

2013-01-01

This study used a randomized pretest-posttest control group design to examine the effect of the integration of corpora in general English courses on the students' vocabulary development. To enhance the learners' lexical repertoire and thereby improve their reading comprehension, an online corpus-based approach was integrated into 42 hours of…
MAXIECPC: Theoretical Background and Descriptive Research on General Statistics, Frequency Words and Keywords

ERIC Educational Resources Information Center

Calzada Pérez, María

2013-01-01

The present paper revolves around MaxiECPC, one of the various sub-corpora that make up ECPC (the European Comparable and Parallel Corpora), an electronic archive of speeches delivered at different parliaments (i.e. the European Parliament-EP; the Spanish Congreso de los Diputados-CD; and the British House of Commons-HC) from 1996 to 2009. In…
Task Effects on Linguistic Complexity and Accuracy: A Large-Scale Learner Corpus Analysis Employing Natural Language Processing Techniques

ERIC Educational Resources Information Center

Alexopoulou, Theodora; Michel, Marije; Murakami, Akira; Meurers, Detmar

2017-01-01

Large-scale learner corpora collected from online language learning platforms, such as the EF-Cambridge Open Language Database (EFCAMDAT), provide opportunities to analyze learner data at an unprecedented scale. However, interpreting the learner language in such corpora requires a precise understanding of tasks: How does the prompt and input of a…
Analyzing Idioms and Their Frequency in Three Advanced ILI Textbooks: A Corpus-Based Study

ERIC Educational Resources Information Center

Alavi, Sepideh; Rajabpoor, Aboozar

2015-01-01

The present study aimed at identifying and quantifying the idioms used in three ILI "Advanced" level textbooks based on three different English corpora; MICASE, BNC and the Brown Corpus, and comparing the frequencies of the idioms across the three corpora. The first step of the study involved searching the books to find multi-word…
Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition

ERIC Educational Resources Information Center

Herdagdelen, Amaç; Marelli, Marco

2017-01-01

Corpus-based word frequencies are one of the most important predictors in language processing tasks. Frequencies based on conversational corpora (such as movie subtitles) are shown to better capture the variance in lexical decision tasks compared to traditional corpora. In this study, we show that frequencies computed from social media are…
The Application of Corpora in Teaching Grammar: The Case of English Relative Clause

ERIC Educational Resources Information Center

Sahragard, Rahman; Kushki, Ali; Ansaripour, Ehsan

2013-01-01

The study was conducted to see if the provision of implementing corpora on English relative clauses would prove useful for Iranian EFL learners or not. Two writing classes were held for the participants of intermediate level. A record of 15 writing samples produced by each participant was kept in the form of a portfolio. Participants' portfolios…
Learners' Writing Skills in French: Corpus Consultation and Learner Evaluation

ERIC Educational Resources Information Center

O'Sullivan, Ide; Chambers, Angela

2006-01-01

While the use of corpora and concordancing in the language-learning environment began as early as 1969 (McEnery & Wilson, 1997, p. 12), it was the work in the 1980s of Tim Johns (1986) and others which brought it to public attention. Important developments occurred in the 1990s, beginning with publications advocating the use of corpora and…
Integrating Learner Corpora and Natural Language Processing: A Crucial Step towards Reconciling Technological Sophistication and Pedagogical Effectiveness

ERIC Educational Resources Information Center

Granger, Sylviane; Kraif, Olivier; Ponton, Claude; Antoniadis, Georges; Zampa, Virginie

2007-01-01

Learner corpora, electronic collections of spoken or written data from foreign language learners, offer unparalleled access to many hitherto uncovered aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that the learner corpus can play in CALL, particularly when used in conjunction with…
Immunocytochemical distribution of locustamyoinhibiting peptide (Lom-MIP) in the nervous system of Locusta migratoria.

PubMed

Schoofs, L; Veelaert, D; Broeck, J V; De Loof, A

1996-07-05

Locustamyoinhibiting peptide (Lom-MIP) is one of the 4 identified myoinhibiting neuropeptides, isolated from brain-corpora cardiaca-corpora allata-suboesophageal ganglion complexes of the locust, Locusta migratoria. An antiserum was raised against Lom-MIP for use in immunohistochemistry. Locustamyoinhibiting peptide-like immunoreactivity (Lom-MIP-LI) was visualized in the nervous system and peripheral organs of Locusta migratoria by means of the peroxidase-antiperoxidase method. A total of 12 specific immunoreactive neurons was found in the brain. Processes of these neurons innervate the protocerebral bridge the central body complex and distinct neuropil areas in the proto- and tritocerebrum but not in the deuterocerebrum nor in the optic lobes. The glandular cells of the corpora cardiaca, known to produce adipokinetic hormones, are contacted by Lom-MIP-LI fibers. The corpora allata were innervated by the nervus corporis allati I containing immunoreactive fibers. Lom-MIP-LI cell bodies were also found in the subesophageal ganglion, the metathoracic ganglion and the abdominal ganglia I-IV. In peripheral muscles, Lom-MIP-LI fibers innervate the heart, the oviduct, and the hindgut. In the salivary glands, Lom-MIP-LI was detected in the intracellular ductule of the parietal cells. Possible functions of Lom-MIP are discussed.
Ovarian response to pregnant mare serum gonadotrophin and prostaglandin F2α in Africander and Mashona cows.

PubMed

Holness, D H; Hale, D H; McCabe, C T

1980-11-01

Oestrus was synchronised in ten Africander and eight Mashona mature dry cows by two injections of prostaglandin F2α (PG) 11 days apart. Half the cows of each breed received an injection of 3000 i.u. pregnant mare serum gonadotrophin (PMSG) two days prior to the second PG injection. All cows were observed for the incidence of oestrus, and blood samples were taken at intervals for progesterone assay. Cows were slaughtered 11 days after the second PG injection and their reproductive tracts examined. Treatment with PMSG increased numbers both of corpora lutea and of follicles more than 10 mm in diameter. When numbers of corpora lutea and follicles were considered together, the response to treatment was significant in the Africanders (P<0,01) and markedly greater than that of Mashona cows. The concentration of progesterone in plasma on the day before slaughter was significantly correlated with the mass of corpora lutea (P<0,001) and with the total mass of ovaries (P<0,001), but not with numbers of corpora lutea. It is suggested that generally Africander cows may secrete lower levels of follicle stimulating hormone and oestrogen than Mashona cows during normal cyclic sexual activity.
Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.

PubMed

Ferraro, Jeffrey P; Daumé, Hal; Duvall, Scott L; Chapman, Wendy W; Harkema, Henk; Haug, Peter J

2013-01-01

Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives. Two target corpora from independent healthcare institutions were constructed from high frequency clinical narratives. Four leading POS taggers with their out-of-the-box models trained from general English and biomedical abstracts were evaluated against these clinical corpora. A high performing domain adaptation method, Easy Adapt, was compared to our newly proposed method ClinAdapt. The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3-91.0% on clinical texts. ClinAdapt reports 93.2-93.9%. ClinAdapt successfully boosts POS tagging performance through domain adaptation requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation leading to better overall results on complex processing tasks.
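The abstract does not explain how Easy Adapt works. The sketch below illustrates the feature-augmentation idea as it is commonly described for that method (each feature vector is copied into a shared block and a domain-specific block); it is not the authors' implementation and says nothing about ClinAdapt. The function name, the "source"/"target" labels and the toy features are hypothetical.

```python
import numpy as np


def easy_adapt(features, domain):
    """Augment a feature matrix for feature-augmentation domain adaptation.

    Source rows become [x, x, 0] and target rows become [x, 0, x], so a linear
    model can learn shared, source-only and target-only weights.
    """
    features = np.asarray(features, dtype=float)
    zeros = np.zeros_like(features)
    if domain == "source":
        return np.hstack([features, features, zeros])
    if domain == "target":
        return np.hstack([features, zeros, features])
    raise ValueError("domain must be 'source' or 'target'")


# Hypothetical two-dimensional features for one general-English token (source)
# and one clinical-narrative token (target).
source_aug = easy_adapt([[1.0, 0.0]], "source")
target_aug = easy_adapt([[0.0, 1.0]], "target")
```

Any standard classifier can then be trained on the stacked source and target rows; the appeal of the approach is that it needs no change to the learner itself.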
Inhibition of Cyclic GMP Export by Multidrug Resistance Protein 4: A New Strategy to Treat Erectile Dysfunction?

PubMed

Boydens, Charlotte; Pauwels, Bart; Vanden Daele, Laura; Van de Voorde, Johan

2017-04-01

Intracellular cyclic guanosine monophosphate (cGMP) concentrations are regulated by degradation enzymes (phosphodiesterases) and by active transport across the plasma membrane by multidrug resistance proteins (MRPs) 4 and 5. To evaluate the functional effect of MRP-4 inhibition and the role of MRP-4-mediated cGMP export in mouse corpora cavernosa. Isometric tension of mouse corpora cavernosa was measured after cumulative addition of MK-571, an inhibitor of MRP-4, or sildenafil, a phosphodiesterase type 5 inhibitor. In addition, the effect of MRP-4 inhibition on cGMP-independent and cGMP-dependent relaxations was studied. In vivo intracavernosal pressure and mean arterial pressure measurements were performed after intracavernosal injection of MK-571. The effect of MRP-4 inhibition on cGMP content was determined using an enzyme immunoassay kit. Measurement of the effect of MK-571 on cGMP content, relaxant responses of mouse corpora cavernosa to cGMP-independent and cGMP-dependent vasodilating substances, and determination of the ratio of intracavernosal pressure to mean arterial pressure after intracavernosal injection of MK-571. MK-571 and sildenafil relaxed the corpora cavernosa concentration dependently, with sildenafil being the more potent relaxing compound. Furthermore, MK-571 enhanced relaxing responses to cGMP-dependent substances, such as sodium nitroprusside, sildenafil, acetylcholine, and electrical field stimulation, with the latter even under in vitro diabetic conditions. In contrast, cGMP-independent relaxations were not altered by MRP-4 inhibition. Intracavernosal administration of MK-571 significantly increased intracavernosal pressure, with minimal effect on mean arterial pressure. The cGMP analysis showed that MRP-4 inhibition was accompanied by increased cGMP levels. 
MRP-4, at least when targeted locally in the penis or when combined with a phosphodiesterase type 5 inhibitor, might be a valuable alternative strategy for the treatment of (diabetic) erectile dysfunction. This study is the first to demonstrate an in vitro direct relaxant and an in vivo pro-erectile effect of the MRP-4 inhibitor, MK-571, on mouse corpora cavernosa. However, the functional effect of MRP-5-mediated export in mouse corpora cavernosa was not explored, which has been suggested to play the predominant role in cGMP export. Inhibition of MRP-4 increases basal and stimulated levels of cGMP, leading to corpora cavernosa relaxation and penile erection. Therefore, in addition to degradation of cGMP, export of cGMP by MRP-4 could contribute substantially to regulating cGMP levels in mouse corpora cavernosa. Boydens C, Pauwels B, Vanden Daele L, Van de Voorde J. Inhibition of Cyclic GMP Export by Multidrug Resistance Protein 4: A New Strategy to Treat Erectile Dysfunction? J Sex Med 2017;14:502-509. Copyright © 2017 International Society for Sexual Medicine. Published by Elsevier Inc. All rights reserved.
On the unsupervised analysis of domain-specific Chinese texts

PubMed Central

Deng, Ke; Bol, Peter K.; Li, Kate J.; Liu, Jun S.

2016-01-01

With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than that from using outputs of a supervised segmentation method. PMID:27185919
The Nature and Scope of Student Search Strategies in Using a Web Derived Corpus for Writing

ERIC Educational Resources Information Center

Franken, Margaret

2014-01-01

The use of online language corpora in L2 teaching and learning is gaining momentum largely because corpora are an easily accessed source of language input that potentially provide rich and authentic lexico-grammatical data. This can be of particular use for students' writing as its incorporation can enhance the appearance of native-like fluency.…

Learning from Learners: A Non-Standard Direct Approach to the Teaching of Writing Skills in EFL in a University Context

ERIC Educational Resources Information Center

Fuster-Márquez, Miguel; Gregori-Signes, Carmen

2018-01-01

Corpora have been used in English as a foreign language materials for decades, and native corpora have been present in the classroom by means of direct approaches such as Data-Driven Learning (Johns, T., and P. King 1991. "'Should you be Persuaded'- Two Samples of Data-Driven Learning Materials." In "Classroom Concordancing,"…
Domain Adaptation of Translation Models for Multilingual Applications

DTIC Science & Technology

2009-04-01

expansion effect that corpus (or dictionary) based translation introduces - however, this effect is maintained even with monolingual query expansion [12...every day; bilingual web pages are harvested as parallel corpora as the quantity of non-English data on the web increases; online dictionaries of...approach is to customize translation models to a domain, by automatically selecting the resources (dictionaries, parallel corpora) that are best for
From Folklore to Scientific Evidence: Breast-Feeding and Wet-Nursing in Islam and the Case of Non-Puerperal Lactation

PubMed Central

Moran, Lia; Gilad, Jacob

2007-01-01

Breast-feeding practice has an important medical and socio-cultural role. It has many anthropological aspects concerning the “power structures” that find their expression in breast-feeding and the practices that formed around it, both socially, scientifically, and legally-speaking. Breast-feeding has been given much attention by religions and taboos, folklore, and misconception abound around it making it a topic of genuine curiosity. This paper aims at expanding the spectrum of folklore associated with breast-feeding. The paper deals with historical, religious, and folkloristic aspects of breast-feeding, especially wet-nursing, in Islam and focuses on an intriguing Islamic tale on breast-feeding - lactation by non-pregnant women (or non-puerperal lactation). Apparently, accounts of non-puerperal lactation are not restricted to Islam but have been documented in various societies and religions throughout centuries. Two medical situations - hyperprolactinemia and induced lactation, appear as possible explanations for this phenomenon. This serves as an excellent example for the value of utilizing contemporary scientific knowledge in order to elucidate the origin, anthropology and evolvement of ancient myth and superstition. PMID:23675050
Sentiment analysis of political communication: combining a dictionary approach with crowdcoding.

PubMed

Haselmayer, Martin; Jenny, Marcelo

2017-01-01

Sentiment is important in studies of news values, public opinion, negative campaigning or political polarization and an explosive expansion of digital textual data and fast progress in automated text analysis provide vast opportunities for innovative social science research. Unfortunately, tools currently available for automated sentiment analysis are mostly restricted to English texts and require considerable contextual adaption to produce valid results. We present a procedure for collecting fine-grained sentiment scores through crowdcoding to build a negative sentiment dictionary in a language and for a domain of choice. The dictionary enables the analysis of large text corpora that resource-intensive hand-coding struggles to cope with. We calculate the tonality of sentences from dictionary words and we validate these estimates with results from manual coding. The results show that the crowdbased dictionary provides efficient and valid measurement of sentiment. Empirical examples illustrate its use by analyzing the tonality of party statements and media reports.
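A dictionary-based tonality score of the kind described above might look like the following Python sketch. The dictionary entries, the 0 to 4 strength scale and the choice to average the scores of matched words are all assumptions for illustration; the abstract does not specify the authors' scale or aggregation rule.

```python
import re

# Hypothetical negative-sentiment dictionary: word -> strength
# (0 = neutral, 4 = very negative). Real entries would come from crowdcoding.
NEGATIVE_DICTIONARY = {"scandal": 3.5, "failure": 2.5, "weak": 1.5}


def sentence_tonality(sentence, dictionary=NEGATIVE_DICTIONARY):
    """Mean dictionary score of the matched words in a sentence (0.0 if none match)."""
    tokens = re.findall(r"\w+", sentence.lower())
    scores = [dictionary[token] for token in tokens if token in dictionary]
    return sum(scores) / len(scores) if scores else 0.0


score = sentence_tonality("The minister's weak response was a failure.")
```

Scores computed this way can be compared against manually coded sentences, which is the validation step the abstract mentions.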
Development of the penis during the human fetal period (13 to 36 weeks after conception).

PubMed

Gallo, Carla B M; Costa, Waldemar S; Furriel, Angelica; Bastos, Ana L; Sampaio, Francisco J B

2013-11-01

We analyzed the development of the area of the penis and erectile structures (corpora cavernosa and corpus spongiosum) and the thickness of the tunica albuginea during the fetal period (13 to 36 weeks after conception) in humans to establish normative patterns of growth. We studied 56 male human fetuses at 13 to 36 weeks after conception. We used histochemical and morphometric techniques to analyze the parameters of total penile area, area of corpora cavernosa, area of corpus spongiosum, and thickness of tunica albuginea in the dorsal and ventral regions using ImageJ software (National Institutes of Health, Bethesda, Maryland). Between 13 and 36 weeks after conception the area of the penis varies from 0.95 to 24.25 mm². The area of the corpora cavernosa varies from 0.28 to 9.12 mm², and the area of the corpus spongiosum varies from 0.14 to 3.99 mm². The thickness of the tunica albuginea varies from 0.029 to 0.296 mm in the dorsal region and from 0.014 to 0.113 mm in the ventral region of the corpora cavernosa. We found a strong correlation between the total penile area, corpora cavernosa and corpus spongiosum with fetal age (weeks following conception). The growth rate was more intense during the second trimester (13 to 24 weeks of gestation) compared to the third trimester (25 to 36 weeks). Tunica albuginea thickness also was strongly correlated with fetal age and this structure was thicker in the dorsal vs ventral region. Copyright © 2013 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
[Do regional and generational differences in attitudes toward "Luck Resource Belief" exist?].

PubMed

Murakami, Koshi

2016-04-01

This article examines whether belief in superstitions and folklore differs by age and degree of modernization specifically. This study investigated regional and generational differences in attitudes toward "Luck Resource Belief," a notion regarding luck. The 500 Japanese participants in our sample were stratified by place of residence, age, and income. The results reflected gender differences, but not regional or generational differences with regard to the "Luck Resource Belief" scale scores. Based on these results, the hypothesis that the mass media plays a major role in the dissemination of information about superstitions and folklore is discussed in this context.
A Linguistic Inquiry and Word Count Analysis of the Adult Attachment Interview in Two Large Corpora.

PubMed

Waters, Theodore E A; Steele, Ryan D; Roisman, Glenn I; Haydon, Katherine C; Booth-LaForce, Cathryn

2016-01-01

An emerging literature suggests that variation in Adult Attachment Interview (AAI; George, Kaplan, & Main, 1985) states of mind about childhood experiences with primary caregivers is reflected in specific linguistic features captured by the Linguistic Inquiry Word Count automated text analysis program (LIWC; Pennebaker, Booth, & Francis, 2007). The current report addressed limitations of prior studies in this literature by using two large AAI corpora (Ns = 826 and 857) and a broader range of linguistic variables, as well as examining associations of LIWC-derived AAI dimensions with key developmental antecedents. First, regression analyses revealed that dismissing states of mind were associated with transcripts that were more truncated and deemphasized discussion of the attachment relationship whereas preoccupied states of mind were associated with longer, more conflicted, and angry narratives. Second, in aggregate, LIWC variables accounted for over a third of the variation in AAI dismissing and preoccupied states of mind, with regression weights cross-validating across samples. Third, LIWC-derived dismissing and preoccupied state of mind dimensions were associated with direct observations of maternal and paternal sensitivity as well as infant attachment security in childhood, replicating the pattern of results reported in Haydon, Roisman, Owen, Booth-LaForce, and Cox (2014) using coder-derived dismissing and preoccupation scores in the same sample.
The language of gene ontology: a Zipf's law analysis.

PubMed

Kalankesh, Leila Ranandeh; Stevens, Robert; Brass, Andy

2012-06-07

Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
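The exponent measurement described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of log frequency against log rank, which is one common way to estimate a Zipf exponent and is not necessarily the estimator the authors used; the function name `zipf_exponent` is hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate a Zipf exponent from a sequence of tokens.

    Assumption: least-squares fit in log-log space; the abstract does not
    say which estimator was used.
    """
    # Frequencies sorted from most to least common give rank 1, 2, 3, ...
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    # Zipf's law: frequency is proportional to rank ** (-exponent)
    return -slope
```

Applied to a list of annotation terms (for example, one GO term identifier per annotation), the returned value could be compared across sub-ontologies or evidence-code filters, in the spirit of the comparison the abstract reports.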
From language identification to language distance

NASA Astrophysics Data System (ADS)

Gamallo, Pablo; Pichel, José Ramom; Alegria, Iñaki

2017-10-01

In this paper, we define two quantitative distances to measure how far apart two languages are. The distance measure that we have identified as more accurate is based on the perplexity of n-gram models extracted from text corpora. An experiment to compare forty-four European languages has been performed. For this purpose, we computed the distances for all the possible language pairs and built a network whose nodes are languages and edges are distances. The network we have built on the basis of linguistic distances represents the current map of similarities and divergences among the main languages of Europe.
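A perplexity-based distance of the kind described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses character bigrams with add-one smoothing and averages the two cross-perplexities to get a symmetric value. The n-gram order, the smoothing, the symmetrization, and all function names are assumptions.

```python
import math
from collections import Counter


def train_bigram_model(text):
    """Count character bigrams and their one-character contexts."""
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    vocab = set(text)
    return bigrams, contexts, vocab


def perplexity(model, text):
    """Perplexity of `text` under an add-one smoothed character bigram model."""
    bigrams, contexts, vocab = model
    # Include characters seen only in the test text so probabilities sum to 1.
    vocab_size = len(vocab | set(text))
    pairs = list(zip(text, text[1:]))
    if not pairs:
        raise ValueError("text must contain at least two characters")

    log_prob = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


def language_distance(text_a, text_b):
    """Symmetric distance: mean of the two cross-perplexities (an assumption)."""
    ppl_ab = perplexity(train_bigram_model(text_a), text_b)
    ppl_ba = perplexity(train_bigram_model(text_b), text_a)
    return (ppl_ab + ppl_ba) / 2
```

Computing `language_distance` for every pair of corpora yields the weighted edges of a language network like the one the abstract describes; realistic use would need much larger texts and a higher n-gram order than this toy model.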
Improving Feature Representation Based on a Neural Network for Author Profiling in Social Media Texts

PubMed Central

2016-01-01

We introduce a lexical resource for preprocessing social media data. We show that a neural network-based feature representation is enhanced by using this resource. We conducted experiments on the PAN 2015 and PAN 2016 author profiling corpora and obtained better results when performing the data preprocessing using the developed lexical resource. The resource includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media. Each of the dictionaries was built for the English, Spanish, Dutch, and Italian languages. The resource is freely available. PMID:27795703
Biting off More than They Can Chew? The Impact of Pedagogical Application of Corpus on Vocabulary Ability of Intermediate-Level ESL Learners in Mainland China: A Quasi-Experimental Study

ERIC Educational Resources Information Center

Shi, Jing

2017-01-01

The pedagogical values of corpora for ELT have been widely acknowledged and exploited, but their direct application in classroom teaching has entailed many difficulties. This project aims to investigate the impact of the pedagogical application of corpora on the vocabulary ability of intermediate-level ESL learners in mainland China. This…
Deleterious effects of progestagen treatment in VEGF expression in corpora lutea of pregnant ewes.

PubMed

Letelier, C A; Sanchez, M A; Garcia-Fernandez, R A; Sanchez, B; Garcia-Palencia, P; Gonzalez-Bulnes, A; Flores, J M

2011-06-01

The aim of the current study was to determine the possible effects of progestagen oestrous synchronization on vascular endothelial growth factor (VEGF) expression during sheep luteogenesis and the peri-implantation period and the relationship with luteal function. At days 9, 11, 13, 15, 17 and 21 of pregnancy, the ovaries from 30 progestagen-treated ewes and 30 ewes cycling after cloprostenol injection were evaluated by ultrasonography and, thereafter, collected and processed for immunohistochemical evaluation of VEGF; blood samples were drawn for evaluating plasma progesterone. The progestagen-treated group showed smaller corpora lutea and lower progesterone secretion than the cloprostenol-treated group. The expression of VEGF in the luteal cells increased with time in the cloprostenol group, but not in the progestagen-treated group, which even showed a decrease between days 11 and 13. In progestagen-treated sheep, VEGF expression in granulosa-derived parenchymal lobule capillaries was correlated with the size of the luteal tissue: larger corpora lutea had higher expression and tended to have higher progesterone secretion. In conclusion, the current study indicates the existence of deleterious effects from exogenous progestagen treatments on progesterone secretion from induced corpora lutea, which correlate with alterations in the expression of VEGF in the luteal tissue and, thus, presumably in the processes of neoangiogenesis and luteogenesis. © 2010 Blackwell Verlag GmbH.
Expression of PCV2 antigen in the ovarian tissues of gilts.

PubMed

Tummaruk, Padet; Pearodwong, Pachara

2016-03-01

The present study was performed to determine the expression of porcine circovirus type 2 (PCV2) antigen in the ovarian tissue of naturally infected gilts. Ovarian tissues were obtained from 11 culled gilts. The ovarian tissue sections were divided into two groups according to PCV2 DNA detection using PCR. PCV2 antigen was assessed in the paraffin-embedded ovarian tissue sections by immunohistochemistry. A total of 2,131 ovarian follicles (i.e., 1,437 primordial, 133 primary, 353 secondary and 208 antral follicles), 66 atretic follicles and 131 corpora lutea were evaluated. PCV2 antigen was detected in 280 ovarian follicles (i.e., 239 primordial follicles, 12 primary follicles, 10 secondary follicles and 19 antral follicles), 1 atretic follicle and 3 corpora lutea (P<0.05). PCV2 antigen was detected in primordial follicles more often than in secondary follicles, atretic follicles and corpora lutea (P<0.05). PCV2 antigen was found mainly in oocytes. PCV2 antigen was found in both PCV2 DNA positive and negative ovarian tissues. It can be concluded that PCV2 antigen is expressed in all types of ovarian follicles and corpora lutea. Further studies should be carried out to determine the influence of PCV2 on porcine ovarian function and oocyte quality.
PIPE: a protein–protein interaction passage extraction module for BioCreative challenge

PubMed Central

Chu, Chun-Han; Su, Yu-Chen; Chen, Chien Chin; Hsu, Wen-Lian

2016-01-01

Identifying the interactions between proteins mentioned in the biomedical literature is one of the frequently discussed topics of text mining in the life science field. In this article, we propose PIPE, an interaction pattern generation module used in the Collaborative Biocurator Assistant Task at BioCreative V (http://www.biocreative.org/) to capture frequent protein-protein interaction (PPI) patterns within text. We also present an interaction pattern tree (IPT) kernel method that integrates the PPI patterns with convolution tree kernel (CTK) to extract PPIs. Methods were evaluated on LLL, IEPA, HPRD50, AIMed and BioInfer corpora using cross-validation, cross-learning and cross-corpus evaluation. Empirical evaluations demonstrate that our method is effective and outperforms several well-known PPI extraction methods. Database URL: PMID:27524807
Luteinizing hormone receptors in human ovarian follicles and corpora lutea during the menstrual cycle

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yamoto, M.; Nakano, R.; Iwasaki, M.

The binding of ¹²⁵I-labeled human luteinizing hormone (hLH) to the 2000-g fraction of human ovarian follicles and corpora lutea during the entire menstrual cycle was examined. Specific high affinity, low capacity receptors for hLH were demonstrated in the 2000-g fraction of both follicles and corpora lutea. Specific binding of ¹²⁵I-labeled hLH to follicular tissue increased from the early follicular phase to the ovulatory phase. Specific binding of ¹²⁵I-labeled hLH to luteal tissue increased from the early luteal phase to the midluteal phase and decreased towards the late luteal phase. The results of the present study indicate that the increase and decrease in receptors for hLH during the menstrual cycle might play an important role in the regulation of the ovarian cycle.
Web 2.0-Based Crowdsourcing for High-Quality Gold Standard Development in Clinical Natural Language Processing

PubMed Central

Deleger, Louise; Li, Qi; Kaiser, Megan; Stoutenborough, Laura

2013-01-01

Background A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally-developed gold standards. The previously reported results on a medical named entity annotation task showed a 0.68 F-measure based agreement between crowdsourced and traditionally-developed corpora. Objective Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally-developed corpora. Methods To build the gold standard for evaluating the crowdsourcing workers’ performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd’s work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally-developed annotations.
Results The agreement between the crowd’s annotations and the traditionally-generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally-generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. Conclusions This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower’s quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches. PMID:23548263
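The evaluation measures and the voting step mentioned above can be illustrated with a short sketch. It assumes that annotations are compared as exact-match items (here, hypothetical `(text, start, end)` tuples) and that "simple voting" means keeping an annotation chosen by more than half of the workers; the abstract specifies neither detail, and the function names are hypothetical.

```python
from collections import Counter


def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall (sensitivity) and F-measure."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def majority_vote(worker_annotations):
    """Keep annotations selected by more than half of the workers.

    Assumption: this is one plausible reading of "simple voting".
    """
    votes = Counter(
        item for annotations in worker_annotations for item in set(annotations)
    )
    threshold = len(worker_annotations) / 2
    return {item for item, count in votes.items() if count > threshold}
```

For example, aggregating three workers' spans with `majority_vote` and scoring the result against a gold set with `precision_recall_f1` gives the kind of F-measure agreement figure the abstract reports, though the study's actual matching criteria may differ.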
tESA: a distributional measure for calculating semantic relatedness.

PubMed

Rybinski, Maciej; Aldana-Montes, José Francisco

2016-12-28

Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on words, which represent these concepts. Approximating semantic relatedness between texts and concepts represented by these texts is an important part of many text and knowledge processing tasks of crucial importance in the ever growing domain of biomedical informatics. The problem of most state-of-the-art methods for calculating semantic relatedness is their dependence on highly specialized, structured knowledge resources, which makes these methods poorly adaptable for many usage scenarios. On the other hand, the domain knowledge in the Life Sciences has become more and more accessible, but mostly in its unstructured form - as texts in large document collections, which makes its use more challenging for automated processing. In this paper we present tESA, an extension to the well-known Explicit Semantic Analysis (ESA) method. In our extension we use two separate sets of vectors, corresponding to different sections of the articles from the underlying corpus of documents, as opposed to the original method, which only uses a single vector space. We present an evaluation of Life Sciences domain-focused applicability of both tESA and domain-adapted Explicit Semantic Analysis. The methods are tested against a set of standard benchmarks established for the evaluation of biomedical semantic relatedness quality. Our experiments show that the proposed method achieves results comparable with or superior to the current state-of-the-art methods. Additionally, a comparative discussion of the results obtained with tESA and ESA is presented, together with a study of the adaptability of the methods to different corpora and their performance with different input parameters. Our findings suggest that combined use of the semantics from different sections (i.e. extending the original ESA methodology with the use of title vectors) of the documents of scientific corpora may be used to enhance the performance of distributional semantic relatedness measures, which can be observed in the largest reference datasets. We also present the impact of the proposed extension on the size of distributional representations.
TEES 2.2: Biomedical Event Extraction for Diverse Corpora

PubMed Central

2015-01-01

Background The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. Results The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. Conclusions The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented. PMID:26551925
TEES 2.2: Biomedical Event Extraction for Diverse Corpora.

PubMed

Björne, Jari; Salakoski, Tapio

2015-01-01

The Turku Event Extraction System (TEES) is a text mining program developed for the extraction of events, complex biomedical relationships, from scientific literature. Based on a graph-generation approach, the system detects events with the use of a rich feature set built via dependency parsing. The TEES system has achieved record performance in several of the shared tasks of its domain, and continues to be used in a variety of biomedical text mining tasks. The TEES system was quickly adapted to the BioNLP'13 Shared Task in order to provide a public baseline for derived systems. An automated approach was developed for learning the underlying annotation rules of event type, allowing immediate adaptation to the various subtasks, and leading to a first place in four out of eight tasks. The system for the automated learning of annotation rules is further enhanced in this paper to the point of requiring no manual adaptation to any of the BioNLP'13 tasks. Further, the scikit-learn machine learning library is integrated into the system, bringing a wide variety of machine learning methods usable with TEES in addition to the default SVM. A scikit-learn ensemble method is also used to analyze the importances of the features in the TEES feature sets. The TEES system was introduced for the BioNLP'09 Shared Task and has since then demonstrated good performance in several other shared tasks. By applying the current TEES 2.2 system to multiple corpora from these past shared tasks, an overarching analysis of the most promising methods and possible pitfalls in the evolving field of biomedical event extraction is presented.
Folklore as a map of the world: rejecting "home" as a failure of the imagination.

PubMed

Lawless, Elaine J

2011-01-01

For years, I felt that my negative feelings about "home" were completely justified and that I saw no redemption in the area or the people whom I associated with "home." That is not to say that I do not love my family, but there was never any effort on my part to salvage or imagine whatever could have been viewed as—at the very least—constructive and positive about my "home" or how my home and childhood created the blueprint for my own personal "map of the world." I now believe that my rejection of home is actually a failure of my own imagination. In this article, I explore the ways in which I see folklore as constituting a map of our individual and collective world(s)—a comprehensive, if not always a comprehensible, map of our world, one that is often difficult to discern. I have taken the approach that I need to recover the various aspects of the maps my home offered to me, noting first their existence, then their utility, and finally, by extension, exploring how my personal project has become a way for me to rethink folklore as a kind of reconstituting enterprise.

Pharmacological Prevention and Reversion of Erectile Dysfunction after Radical Prostatectomy, By Modulation of Nitric Oxide/cGMP Pathways

DTIC Science & Technology

2008-03-01

Figure 3. Time course of the effect of bilateral cavernosal nerve resection on the smooth muscle cell content in the rat corpora cavernosa. Penile...indicates the apoptotic cells in the corpora cavernosa. Bottom: QIA for TUNEL ***P<0.001 Figure 7: Time course of the effect of bilateral...Figure 6 Effect of unilateral and bilateral cavernosal nerve resection and long-term sildenafil treatment on cell proliferation and turnover in the
Ontogenetic Profile of the Expression of Thyroid Hormone Receptors in Rat and Human Corpora Cavernosa of the Penis

PubMed Central

Carosa, Eleonora; Di Sante, Stefania; Rossi, Simona; Castri, Alessandra; D'Adamo, Fabio; Gravina, Giovanni Luca; Ronchi, Piero; Kostrouch, Zdenek; Dolci, Susanna; Lenzi, Andrea; Jannini, Emmanuele A

2010-01-01

Introduction In the last few years, various studies have underlined a correlation between thyroid function and male sexual function, hypothesizing a direct action of thyroid hormones on the penis. Aim To study the spatiotemporal distribution of mRNA for the thyroid hormone nuclear receptors (TR) α1, α2 and β in the penis and smooth muscle cells (SMCs) of the corpora cavernosa of rats and humans during development. Methods We used several molecular biology techniques to study the TR expression in whole tissues or primary cultures from human and rodent penile tissues of different ages. Main Outcome Measure We measured our data by semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) amplification, Northern blot and immunohistochemistry. Results We found that TRα1 and TRα2 are both expressed in the penis and in SMCs during ontogenesis without development-dependent changes. However, in the rodent model, TRβ shows an increase from 3 to 6 days post natum (dpn) to 20 dpn, remaining high in adulthood. The same expression profile was observed in humans. While the expression of TRβ is strictly regulated by development, TRα1 is the principal isoform present in corpora cavernosa, suggesting its importance in SMC function. These results have been confirmed by immunohistochemistry localization in SMCs and endothelial cells of the corpora cavernosa. Conclusions The presence of TRs in the penis provides the biological basis for the direct action of thyroid hormones on this organ. Given this evidence, physicians would be advised to investigate sexual function in men with thyroid disorders. Carosa E, Di Sante S, Rossi S, Castri A, D'Adamo F, Gravina GL, Ronchi P, Kostrouch Z, Dolci S, Lenzi A, and Jannini EA. Ontogenetic profile of the expression of thyroid hormone receptors in rat and human corpora cavernosa of the penis. J Sex Med 2010;7:1381–1390. PMID:20141582
Surgical anatomy of penis in exstrophy-epispadias: a study of arrangement of fascial planes and superficial vessels of surgical significance.

PubMed

Kureel, Shiv Narain; Gupta, Archika; Singh, Chandra Shekhar; Kumar, Manoj

2013-10-01

To study the anatomic arrangement of the fascial planes and superficial vessels in relationship to the laid-open urethral plate, glans, corpus spongiosum, and corpora cavernosa in the penis of patients with exstrophy or epispadias. Of 6 patients, 4 had classic exstrophy and 2 had incontinent epispadias. These patients had presented beyond adolescence without previous intervention and were selected for the present study. Using a 1.5-T magnetic resonance imaging scanner and compatible 3-in. surface coil, the epispadiac penises were studied using fast spin echo sequences and contrast-enhanced sequences. In 2 patients, angiography of the superficial vessels was also performed using multidetector row helical computed tomography. The imaging findings were also verified during the subsequent reconstructive surgery. A clear demarcation of the skin, dartos fascia, Buck's fascia, corpora cavernosa, corpus spongiosum, and the intraglanular planes were seen with the course of the blood vessels. The penile dartos received axial pattern vessels from the external pudendal vessels, with collateral branches from the dorsal penile artery as transverse branches at the shaft of the penis and preputial branches at the coronal sulcus. Buck's fascia sleeved the corpora cavernosa, enveloped the neurovascular bundle, and fused with the corpus spongiosum without crossing the midline. Intraglanular extension of Buck's fascia separated the intraglanular vascular arcade from the tip of the corpora. Parallel to the ventral midline, axial pattern vessels to the skin-dartos complex are present, with an additional supply to the prepuce from the terminal penile arteries. These findings can be used for designing the skin coverage. The subfascial plane between the tip of the corpora and the intraglanular vascular arcade and the plane of cleavage between the cavernosa-spongiosum interface can be used for efficient corporal urethral separation. Copyright © 2013 Elsevier Inc. All rights reserved.
All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning.

PubMed

Airola, Antti; Pyysalo, Sampo; Björne, Jari; Pahikkala, Tapio; Ginter, Filip; Salakoski, Tapio

2008-11-19

Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. We show that the graph kernel approach performs on state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.
Contribution to terminology internationalization by word alignment in parallel corpora.

PubMed

Deléger, Louise; Merkel, Magnus; Zweigenbaum, Pierre

2006-01-01

Creating a complete translation of a large vocabulary is a time-consuming task, which requires skilled and knowledgeable medical translators. Our goal is to examine to which extent such a task can be alleviated by a specific natural language processing technique, word alignment in parallel corpora. We experiment with translation from English to French. Build a large corpus of parallel, English-French documents, and automatically align it at the document, sentence and word levels using state-of-the-art alignment methods and tools. Then project English terms from existing controlled vocabularies to the aligned word pairs, and examine the number and quality of the putative French translations obtained thereby. We considered three American vocabularies present in the UMLS with three different translation statuses: the MeSH, SNOMED CT, and the MedlinePlus Health Topics. We obtained several thousand new translations of our input terms, this number being closely linked to the number of terms in the input vocabularies. Our study shows that alignment methods can extract a number of new term translations from large bodies of text with a moderate human reviewing effort, and thus contribute to help a human translator obtain better translation coverage of an input vocabulary. Short-term perspectives include their application to a corpus 20 times larger than that used here, together with more focused methods for term extraction.
Contribution to Terminology Internationalization by Word Alignment in Parallel Corpora

PubMed Central

Deléger, Louise; Merkel, Magnus; Zweigenbaum, Pierre

2006-01-01

Background and objectives Creating a complete translation of a large vocabulary is a time-consuming task, which requires skilled and knowledgeable medical translators. Our goal is to examine to which extent such a task can be alleviated by a specific natural language processing technique, word alignment in parallel corpora. We experiment with translation from English to French. Methods Build a large corpus of parallel, English-French documents, and automatically align it at the document, sentence and word levels using state-of-the-art alignment methods and tools. Then project English terms from existing controlled vocabularies to the aligned word pairs, and examine the number and quality of the putative French translations obtained thereby. We considered three American vocabularies present in the UMLS with three different translation statuses: the MeSH, SNOMED CT, and the MedlinePlus Health Topics. Results We obtained several thousand new translations of our input terms, this number being closely linked to the number of terms in the input vocabularies. Conclusion Our study shows that alignment methods can extract a number of new term translations from large bodies of text with a moderate human reviewing effort, and thus contribute to help a human translator obtain better translation coverage of an input vocabulary. Short-term perspectives include their application to a corpus 20 times larger than that used here, together with more focused methods for term extraction. PMID:17238328
My family made me do it: the influence of family therapists' families of origin on their occupational choice.

PubMed

Goldklank, S

1986-06-01

This study is an empirical test and exploration of the folklore about family life correlates of family therapists' occupational choice. The folklore is translated into systems concepts, including role complementarity and the mutually determining effect of process and roles. Fifty-nine family therapists, 49 siblings of the therapists, and 51 undifferentiated, non-helping professionals were compared on FACES (29), The Complementary Role Questionnaire, and on demographic data. Inconsistencies in the results led to a critique of the clinical faithfulness of current systems measures. Family therapists did not differ on FACES, but did differ in aspects of roles from their siblings and from the control professionals.
Menzerath-Altmann law for distinct word distribution analysis in a large text

NASA Astrophysics Data System (ADS)

Eroglu, Sertac

2013-06-01

The empirical law uncovered by Menzerath and formulated by Altmann, known as the Menzerath-Altmann law (henceforth the MA law), reveals the statistical distribution behavior of human language in various organizational levels. Building on previous studies relating organizational regularities in a language, we propose that the distribution of distinct (or different) words in a large text can effectively be described by the MA law. The validity of the proposition is demonstrated by examining two text corpora written in different languages not belonging to the same language family (English and Turkish). The results show not only that distinct word distribution behavior can accurately be predicted by the MA law, but that this result appears to be language-independent. This result is important not only for quantitative linguistic studies, but also may have significance for other naturally occurring organizations that display analogous organizational behavior. We also deliberately demonstrate that the MA law is a special case of the probability function of the generalized gamma distribution.
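The abstract does not reproduce the formulas. As a sketch, using the commonly cited forms of the MA law and of the generalized gamma density (the paper's own parametrization may differ), the special-case relationship can be written as follows, where the symbols A, b, c, a, d and p are notation assumed here:

```latex
% Menzerath-Altmann law (commonly cited form)
y(x) = A\, x^{b}\, e^{-c x}

% Generalized gamma probability density, x > 0, with parameters a, d, p > 0
f(x; a, d, p) = \frac{p / a^{d}}{\Gamma(d/p)}\; x^{d-1}\, e^{-(x/a)^{p}}

% Setting p = 1 gives a function of the MA form with
b = d - 1, \qquad c = \frac{1}{a}, \qquad A = \frac{1}{a^{d}\,\Gamma(d)}
```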
Towards a Generalizable Time Expression Model for Temporal Reasoning in Clinical Notes

PubMed Central

Velupillai, Sumithra; Mowery, Danielle L.; Abdelrahman, Samir; Christensen, Lee; Chapman, Wendy W

2015-01-01

Accurate temporal identification and normalization is imperative for many biomedical and clinical tasks such as generating timelines and identifying phenotypes. A major natural language processing challenge is developing and evaluating a generalizable temporal modeling approach that performs well across corpora and institutions. Our long-term goal is to create such a model. We initiate our work on reaching this goal by focusing on temporal expression (TIMEX3) identification. We present a systematic approach to 1) generalize existing solutions for automated TIMEX3 span detection, and 2) assess similarities and differences by various instantiations of TIMEX3 models applied on separate clinical corpora. When evaluated on the 2012 i2b2 and the 2015 Clinical TempEval challenge corpora, our conclusion is that our approach is successful – we achieve competitive results for automated classification, and we identify similarities and differences in TIMEX3 modeling that will be informative in the development of a simplified, general temporal model. PMID:26958265
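To make the notion of temporal expression span detection concrete, the following is a minimal rule-based sketch. It is not the approach evaluated in the study; the patterns, the name `find_timex_spans`, and the sample note are assumptions chosen for illustration and cover only a few date formats.

```python
import re

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
PATTERNS = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",               # numeric dates such as 03/14/2012
    rf"\b{MONTHS}\s+\d{{1,2}},\s+\d{{4}}\b",      # dates such as March 14, 2012
    r"\b(?:today|yesterday|tomorrow)\b",          # simple deictic expressions
    r"\b\d+\s+(?:day|week|month|year)s?\s+ago\b", # relative expressions
]
TIMEX_RE = re.compile("|".join(PATTERNS), re.IGNORECASE)

def find_timex_spans(text):
    """Return (start, end, surface_text) for every matched expression."""
    return [(m.start(), m.end(), m.group(0)) for m in TIMEX_RE.finditer(text)]

note = "Patient admitted on 03/14/2012; chest pain began 2 days ago."
spans = find_timex_spans(note)
for start, end, surface in spans:
    print(start, end, surface)
```

Character offsets are returned because span-level evaluation compares predicted spans against annotated ones; normalization of the matched text to a date value is a separate step not shown here.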
Sex Determination in Bees. IV. Genetic Control of Juvenile Hormone Production in MELIPONA QUADRIFASCIATA (Apidae)

PubMed Central

Kerr, Warwick Estevam; Akahira, Yukio; Camargo, Conceição A.

1975-01-01

Cell number and volume of the corpora allata were determined for 8 phases of development, from the first prepupal stage to adults 30 days old, in the social Apidae Melipona quadrifasciata. In the second prepupal stage a strong correlation was found between cell number and body weight (r=0.651**), and between cell number and corpora allata volume in the prepupal stage (r=0.535*), which indicates that juvenile hormone has a definite role in caste determination in Melipona. The distribution of corpus allatum volume suggests a 3:1 segregation between bees with a high volume of corpora allata and those with low and medium volume. This implies that genes xa and xb code for an enzyme that directly participates in juvenile hormone production. It was also concluded that the number of cells in the second prepupal stage is more important than the weight of the prepupa for caste determination. A scheme summarizing the genic control of sex and caste determination in Melipona bees in the prepupal phase is given. PMID:1213273
Co-occurrence frequency evaluated with large language corpora boosts semantic priming effects.

PubMed

Brunellière, Angèle; Perre, Laetitia; Tran, ThiMai; Bonnotte, Isabelle

2017-09-01

In recent decades, many computational techniques have been developed to analyse the contextual usage of words in large language corpora. The present study examined whether the co-occurrence frequency obtained from large language corpora might boost purely semantic priming effects. Two experiments were conducted: one with conscious semantic priming, the other with subliminal semantic priming. Both experiments contrasted three semantic priming contexts: an unrelated priming context and two related priming contexts with word pairs that are semantically related and that co-occur either frequently or infrequently. In the conscious priming presentation (166-ms stimulus-onset asynchrony, SOA), a semantic priming effect was recorded in both related priming contexts, which was greater with higher co-occurrence frequency. In the subliminal priming presentation (66-ms SOA), no significant priming effect was shown, regardless of the related priming context. These results show that co-occurrence frequency boosts pure semantic priming effects and are discussed with reference to models of semantic network.
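Co-occurrence frequency of a word pair can be estimated from a corpus by counting how often the two words appear near each other. The sketch below assumes a fixed token window within a sentence; the window size, the function name `cooccurrence_counts`, and the toy sentences are illustrative choices, not details from the study.

```python
from collections import Counter

def cooccurrence_counts(sentences, window=5):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            # Look only forward so that each pair of positions is counted once
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    counts[frozenset((w, v))] += 1
    return counts

# Toy corpus (illustrative only)
sentences = [
    "the doctor spoke to the nurse".split(),
    "a nurse called the doctor".split(),
    "the doctor read a book".split(),
]
counts = cooccurrence_counts(sentences, window=5)
doctor_nurse = counts[frozenset(("doctor", "nurse"))]
doctor_book = counts[frozenset(("doctor", "book"))]
print(doctor_nurse, doctor_book)
```

In this toy corpus the pair doctor/nurse co-occurs more often than doctor/book, which is the kind of contrast between frequently and infrequently co-occurring pairs that the experiments manipulate.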
Folklore and traditional ecological knowledge of geckos in Southern Portugal: implications for conservation and science.

PubMed

Ceríaco, Luis M P; Marques, Mariana P; Madeira, Natália C; Vila-Viçosa, Carlos M; Mendes, Paula

2011-09-05

Traditional Ecological Knowledge (TEK) and folklore are repositories of large amounts of information about the natural world. Ideas, perceptions and empirical data held by human communities regarding local species are important sources which enable new scientific discoveries to be made, as well as offering the potential to solve a number of conservation problems. We documented the gecko-related folklore and TEK of the people of southern Portugal, with the particular aim of understanding the main ideas relating to gecko biology and ecology. Our results suggest that local knowledge of gecko ecology and biology is both accurate and relevant. As a result of information provided by local inhabitants, knowledge of the current geographic distribution of Hemidactylus turcicus was expanded, with its presence reported in nine new locations. It was also discovered that locals still have some misconceptions of geckos as poisonous and carriers of dermatological diseases. The presence of these ideas has led the population to a fear of and aversion to geckos, resulting in direct persecution being one of the major conservation problems facing these animals. It is essential, from both a scientific and conservationist perspective, to understand the knowledge and perceptions that people have towards the animals, since, only then, may hitherto unrecognized pertinent information and conservation problems be detected and resolved.
Folklore and traditional ecological knowledge of geckos in Southern Portugal: implications for conservation and science

PubMed Central

2011-01-01

Traditional Ecological Knowledge (TEK) and folklore are repositories of large amounts of information about the natural world. Ideas, perceptions and empirical data held by human communities regarding local species are important sources which enable new scientific discoveries to be made, as well as offering the potential to solve a number of conservation problems. We documented the gecko-related folklore and TEK of the people of southern Portugal, with the particular aim of understanding the main ideas relating to gecko biology and ecology. Our results suggest that local knowledge of gecko ecology and biology is both accurate and relevant. As a result of information provided by local inhabitants, knowledge of the current geographic distribution of Hemidactylus turcicus was expanded, with its presence reported in nine new locations. It was also discovered that locals still have some misconceptions of geckos as poisonous and carriers of dermatological diseases. The presence of these ideas has led the population to a fear of and aversion to geckos, resulting in direct persecution being one of the major conservation problems facing these animals. It is essential, from both a scientific and conservationist perspective, to understand the knowledge and perceptions that people have towards the animals, since, only then, may hitherto unrecognized pertinent information and conservation problems be detected and resolved. PMID:21892925
Negated bio-events: analysis and identification

PubMed Central

2013-01-01

Background Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations. Results We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP’09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP’09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events. Conclusions Recently, in the field of biomedical text mining, the development and enhancement of event-based systems has received significant interest. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems. 
The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events. PMID:23323936
Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics.

PubMed

Percha, Bethany; Altman, Russ B

2013-01-01

The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
Inferring the semantic relationships of words within an ontology using random indexing: applications to pharmacogenomics

PubMed Central

Percha, Bethany; Altman, Russ B.

2013-01-01

The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology. PMID:24551397
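Random indexing can be sketched in a few lines: each word receives a sparse random index vector, and a word's context vector is the sum of the index vectors of its neighbours. The version below is a simplified illustration under assumed settings (dimension 300, 10 non-zero entries, window of 2) and toy sentences; it does not reproduce the configuration used in the study.

```python
import random
from collections import defaultdict
from math import sqrt

def index_vector(word, dim=300, nonzero=10):
    """Sparse ternary index vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(sentences, dim=300, window=2):
    """Build context vectors by summing the index vectors of nearby words."""
    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx = context[word]
                    for k, val in enumerate(index_vector(tokens[j], dim)):
                        ctx[k] += val
    return context

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy sentences (illustrative only)
sentences = [
    "warfarin inhibits VKORC1".split(),
    "coumadin inhibits VKORC1".split(),
]
vectors = random_indexing(sentences)
sim = cosine(vectors["warfarin"], vectors["coumadin"])
print(round(sim, 3))
```

Because the two drug names appear in identical contexts here, their context vectors coincide and the similarity is maximal; on real text the score would reflect partial overlap of contexts.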
Pharmacological properties of Datura stramonium L. as a potential medicinal tree: An overview

PubMed Central

Soni, Priyanka; Siddiqui, Anees Ahmad; Dwivedi, Jaya; Soni, Vishal

2012-01-01

India has a great wealth of naturally occurring plant drugs with great pharmacological potential. Datura stramonium (D. stramonium) is one of the most widely known folklore medicinal herbs. A troublesome weed, D. stramonium has both poisonous and medicinal properties and has been shown to have great pharmacological potential, with wide use in folklore medicine. D. stramonium has been scientifically shown to contain alkaloids, tannins, carbohydrates and proteins. The plant contributes various pharmacological actions, such as analgesic and antiasthmatic activities, to the Indian systems of medicine. The present paper is an exclusive review of the ethnomedical, phytochemical and pharmacological activities of this plant. PMID:23593583
Chapter 16: text mining for translational bioinformatics.

PubMed

Cohen, K Bretonnel; Hunter, Lawrence E

2013-04-01

Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Linguistic measures of chemical diversity and the "keywords" of molecular collections.

PubMed

Woźniak, Michał; Wołos, Agnieszka; Modrzyk, Urszula; Górski, Rafał L; Winkowski, Jan; Bajczyk, Michał; Szymkuć, Sara; Grzybowski, Bartosz A; Eder, Maciej

2018-05-15

Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections ("corpora"), including those deposited on the Internet - indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic "chemical words" that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular "keywords" by which such collections are best characterized and annotated.
NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes.

PubMed

McEwan, Reed; Melton, Genevieve B; Knoll, Benjamin C; Wang, Yan; Hultman, Gretchen; Dale, Justin L; Meyer, Tim; Pakhomov, Serguei V

2016-01-01

Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota.

RysannMD: A biomedical semantic annotator balancing speed and accuracy.

PubMed

Cuzzola, John; Jovanović, Jelena; Bagheri, Ebrahim

2017-07-01

Recently, both researchers and practitioners have explored the possibility of semantically annotating large and continuously evolving collections of biomedical texts such as research papers, medical reports, and physician notes in order to enable their efficient and effective management and use in clinical practice or research laboratories. Such annotations can be automatically generated by biomedical semantic annotators - tools that are specifically designed for detecting and disambiguating biomedical concepts mentioned in text. The biomedical community has already presented several solid automated semantic annotators. However, the existing tools are either strong in their disambiguation capacity, i.e., the ability to identify the correct biomedical concept for a given piece of text among several candidate concepts, or they excel in their processing time, i.e., work very efficiently, but none of the semantic annotation tools reported in the literature has both of these qualities. In this paper, we present RysannMD (Ryerson Semantic Annotator for Medical Domain), a biomedical semantic annotation tool that strikes a balance between processing time and performance while disambiguating biomedical terms. In other words, RysannMD provides reasonable disambiguation performance when choosing the right sense for a biomedical term in a given context, and does that in a reasonable time. To examine how RysannMD stands with respect to the state-of-the-art biomedical semantic annotators, we have conducted a series of experiments using standard benchmarking corpora, including both gold and silver standards, and four modern biomedical semantic annotators, namely cTAKES, MetaMap, NOBLE Coder, and Neji. The annotators were compared with respect to the quality of the produced annotations, measured against gold and silver standards using precision, recall, and F1 measure, and with respect to speed, i.e., processing time. In the experiments, RysannMD achieved the best median F1 measure across the benchmarking corpora, independent of the standard used (silver/gold), biomedical subdomain, and document size. In terms of the annotation speed, RysannMD scored the second best median processing time across all the experiments. The obtained results indicate that RysannMD offers the best performance among the examined semantic annotators when both quality of annotation and speed are considered simultaneously.
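Precision, recall and the F1 measure used for such comparisons can be computed from sets of predicted and reference annotations. The sketch below assumes exact matching on (start, end, concept) triples; the matching criterion, the function name, and the toy annotations with placeholder concept labels are assumptions and may differ from the evaluation protocol used in the paper.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over two collections of annotations."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # annotations found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy annotations as (start, end, concept) triples; labels are placeholders
gold = {(0, 8, "CONCEPT_A"), (20, 28, "CONCEPT_B"), (40, 47, "CONCEPT_C")}
predicted = {(0, 8, "CONCEPT_A"), (20, 28, "CONCEPT_B"), (50, 55, "CONCEPT_D")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so an annotator scores well only when it both avoids spurious annotations and misses few reference ones.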
Neural Influences on Sonic Hedgehog and Apoptosis in the Rat Penis

PubMed Central

Bond, Christopher; Tang, Yi; Podlasek, Carol A.

2010-01-01

The role of sonic hedgehog (SHH) in maintaining corpora cavernosal morphology in the adult penis has been established; however, the mechanism of how SHH itself is regulated remains unclear. Since decreased SHH protein is a cause of smooth muscle apoptosis and erectile dysfunction (ED) in the penis, and SHH treatment can suppress cavernous nerve (CN) injury-induced apoptosis, the question of how SHH signaling is regulated is significant. It is likely that neural input is involved in this process since two models of neuropathy-induced ED exhibit decreased SHH protein and increased apoptosis in the penis. We propose the hypothesis that SHH abundance in the corpora cavernosa is regulated by SHH signaling in the pelvic ganglia, neural activity, or neural transport of a trophic factor from the pelvic ganglia to the corpora. We have examined each of these potential mechanisms. SHH inhibition in the penis shows a 12-fold increase in smooth muscle apoptosis. SHH inhibition in the pelvic ganglia causes significantly increased apoptosis (1.3-fold) and decreased SHH protein (1.1-fold) in the corpora cavernosa. SHH protein is not transported by the CN. Colchicine treatment of the CN resulted in significantly increased smooth muscle apoptosis (1.2-fold) and decreased SHH protein (1.3-fold) in the penis. Lidocaine treatment of the CN caused a similar increase in apoptosis (1.6-fold) and decrease in SHH protein (1.3-fold) in the penis. These results show that neural activity and a trophic factor from the pelvic ganglia/CN are necessary to regulate SHH protein and smooth muscle abundance in the penis. PMID:18256331
Ageing causes cytoplasmic retention of MaxiK channels in rat corporal smooth muscle cells

PubMed Central

Davies, KP; Stanevsky, Y; Moses, T; Chang, JS; Chance, MR; Melman, A

2007-01-01

The MaxiK channel plays a critical role in the regulation of corporal smooth muscle tone and thereby erectile function. Given that ageing results in a decline in erectile function, we determined changes in the expression of MaxiK, which might impact erectile function. Quantitative-polymerase chain reaction demonstrated that although there is no significant change in transcription of the α- and β-subunits that comprise the MaxiK channel, there are significant changes in the expression of transcripts encoding different splice variants. One transcript, SV1, is 13-fold increased in expression in the ageing rat corpora. SV1 has previously been reported to trap other isoforms of the MaxiK channel in the cytoplasm. Correlating with increased expression of SV1, we observed in older rats there is approximately a 13-fold decrease in MaxiK protein in the corpora cell membrane and a greater proportion is retained in the cytoplasm (approximately threefold). These experiments demonstrate that ageing of the corpora is accompanied by changes in alternative splicing and cellular localization of the MaxiK channel. PMID:17287835
Ageing causes cytoplasmic retention of MaxiK channels in rat corporal smooth muscle cells.

PubMed

Davies, K P; Stanevsky, Y; Tar, M T; Moses, T; Chang, J S; Chance, M R; Melman, A

2007-01-01

The MaxiK channel plays a critical role in the regulation of corporal smooth muscle tone and thereby erectile function. Given that ageing results in a decline in erectile function, we determined changes in the expression of MaxiK, which might impact erectile function. Quantitative-polymerase chain reaction demonstrated that although there is no significant change in transcription of the alpha- and beta-subunits that comprise the MaxiK channel, there are significant changes in the expression of transcripts encoding different splice variants. One transcript, SV1, is 13-fold increased in expression in the ageing rat corpora. SV1 has previously been reported to trap other isoforms of the MaxiK channel in the cytoplasm. Correlating with increased expression of SV1, we observed in older rats there is approximately a 13-fold decrease in MaxiK protein in the corpora cell membrane and a greater proportion is retained in the cytoplasm (approximately threefold). These experiments demonstrate that ageing of the corpora is accompanied by changes in alternative splicing and cellular localization of the MaxiK channel.
The undead in culture and science.

PubMed

Nugent, Connie; Berdine, Gilbert; Nugent, Kenneth

2018-04-01

The undead have a significant role in mythology, religion, folklore, and literature. In the 1800s, the word zombie was used to describe reanimated corpses in the Caribbean who often worked on plantations doing long, arduous field work. The movie White Zombie was released in 1932 and exploited this folklore, but it ignored the fact that zombies represent one outcome in Vodou religious beliefs regarding death and the migration of spirits following death. The interest in zombies eventually led to sociological and medical investigations into zombification. Wade Davis reported that powders used by malevolent priests (bokors) contained tetrodotoxin, which could cause the neurologic changes underlying the zombie phenotype. Recent clinical studies have indicated that synthetic cannabinoids and synthetic cathinones can cause bizarre zombie-like behavior. According to Haitian folklore, zombies can develop when bokors reanimate someone who suddenly died from an acute illness or who was purposely poisoned. Recent studies in molecular biology suggest that the sequence of programmed cell death can be reversed when the stressor is removed and that cells, tissues, and bodies (at least in Drosophila flies) can recover. These scientific studies would support the remote possibility that the near dead might recover under certain circumstances but have residual neuropsychological dysfunction. Alternatively, the bokors could maintain control of their victims using drugs with properties similar to those of synthetic cannabinoids. The concept of zombification needs to be considered in the context of culture, religion, and science.
Mining protein phosphorylation information from biomedical literature using NLP parsing and Support Vector Machines.

PubMed

Raja, Kalpana; Natarajan, Jeyakumar

2018-07-01

Extraction of protein phosphorylation information from biomedical literature has gained much attention because of its importance in numerous biological processes. In this study, we propose a text mining methodology which consists of two phases, NLP parsing and SVM classification, to extract phosphorylation information from literature. First, using NLP parsing we divide the data into three base-forms depending on the biomedical entities related to phosphorylation and further classify them into ten sub-forms based on their distribution with the phosphorylation keyword. Next, we extract the phosphorylation entity singles/pairs/triplets and apply SVM to classify the extracted singles/pairs/triplets using a set of features applicable to each sub-form. The performance of our methodology was evaluated on three corpora, namely the PLC, iProLink and hPP corpora. We obtained promising results of >85% F-score on the ten sub-forms of the training datasets in cross-validation. Our system achieved an overall F-score of 93.0% on the iProLink and 96.3% on the hPP corpus test datasets. Furthermore, our proposed system achieved the best performance on cross-corpus evaluation and outperformed the existing system with a recall of 90.1%. The performance analysis of our system on three corpora reveals that it extracts protein phosphorylation information efficiently both in non-organism-specific general datasets such as PLC and iProLink, and in a human-specific dataset such as the hPP corpus.
Relationships between ovarian blood flow and ovarian response to eCG-treatment of dairy cows.

PubMed

Honnens, A; Niemann, H; Herzog, K; Paul, V; Meyer, H H D; Bollwein, H

2009-07-01

The goal of the present study was to investigate ovarian blood flow and ovarian response in cows undergoing a gonadotropin treatment to induce a superovulatory response, using transrectal colour Doppler sonography. Forty-two cows including 19 cross-bred, 14 German Holstein and 9 German Black Pied cows were examined sonographically before hormonal stimulation on Day 10 of the oestrous cycle, three days after administration of eCG (Day 13) and seven days after artificial insemination (Day 7(p.i.)). After each Doppler examination, blood was collected for determination of total oestrogens (E) and progesterone (P4) in peripheral plasma. The blood flow volume (BFV) and pulsatility index (PI), which is a measure for blood flow resistance, were determined in the ovarian artery, and B-mode sonography was used to count dominant follicles and corpora lutea. Important criteria to assess the ovarian response following the hormonal treatment were the number of follicles >5 mm in diameter on Day 13 and the number of corpora lutea on Day 7(p.i.) per cow. The number of follicles ranged from 2 to 61 (mean ± S.E.M.: 17.5 ± 1.7) and corpora lutea from 0 to 50 (mean ± S.E.M.: 17.0 ± 1.6). The BFV increased from 28.4 to 45.0 ml/min between Days 10 and 13 and reached a maximum of 108.5 ml/min on Day 7(p.i.). The PI decreased from 6.25 on Day 10 to 4.70 on Day 13 and to 2.10 on Day 7(p.i.). The BFV and PI on Day 13 did not correlate with the number of follicles (P>0.05). However, on Day 7(p.i.) the number of corpora lutea correlated positively with the BFV (r=0.64; P<0.0001), and an inverse relationship was found for the PI (r=-0.51; P=0.0005). There were no correlations (P>0.05) between the BFV and PI on Day 10 and the number of follicles on Day 13 or the number of corpora lutea on Day 7(p.i.). Results of the present study show that in cows, a hormonal treatment to induce a superovulatory response yielded a marked increase in BFV and a marked decrease in PI in the ovarian artery.
However, there was no correlation between BFV and PI in the ovarian arteries before hormonal stimulation and the number of follicles and corpora lutea that developed after stimulation. Thus BFV and PI measured in the ovarian arteries have limited diagnostic value to predict the outcome of a gonadotropin treatment.
Medical uses of Carthamus tinctorius L. (Safflower): a comprehensive review from Traditional Medicine to Modern Medicine.

PubMed

Delshad, Elahe; Yousefi, Mahdi; Sasannezhad, Payam; Rakhshandeh, Hasan; Ayati, Zahra

2018-04-01

Carthamus tinctorius L., known as Kafesheh (Persian) and safflower (English), is widely used in Traditional Medicine for various medical conditions, namely dysmenorrhea, amenorrhea, postpartum abdominal pain and mass, trauma and pain of joints. It is largely used for flavoring and coloring purposes among the local population. Recent reviews have addressed the uses of the plant in various ethnomedical systems. This review is an update that summarizes the botanical features, uses in Iranian folklore and modern medical applications of safflower. A main database containing important early published texts written in Persian, together with electronic papers, was established on the ethnopharmacology and modern pharmacology of C. tinctorius. A literature review was performed for the years 1937 to 2016 in Web of Science, PubMed, Scientific Information Database, Google Scholar, and Scopus for the terms "Kafesheh", "safflower", "Carthamus tinctorius", and so forth. Safflower is an indispensable element of Iranian folklore medicine, with a variety of applications due to laxative effects. Also, it was recommended as treatment for rheumatism and paralysis, vitiligo and black spots, psoriasis, mouth ulcers, phlegm humor, poisoning, numb limbs, melancholy humor, and the like. According to modern pharmacological and clinical examinations, safflower provides promising opportunities for the amelioration of myocardial ischemia, coagulation, thrombosis, inflammation, toxicity, cancer, and so forth. However, there have been some reports on its undesirable effects on male and female fertility. Most of these beneficial therapeutic effects were correlated to hydroxysafflor yellow A. More attention should be drawn to the lack of a thorough phytochemical investigation.
The potential implications of safflower based on Persian traditional medicine, such as the treatment of rheumatism and paralysis, vitiligo and black spots, psoriasis, mouth ulcers, phlegm humor, poisoning, numb limbs, and melancholy humor warrant further consideration.
Influence of communal and private folklore on bringing meaning to the experience of persistent pain.

PubMed

Hendricks, Joyce Marie

2015-11-01

To provide an overview of the relevance and strengths of using the literary folkloristic methodology to explore the ways in which people with persistent pain relate to and make sense of their experiences through narrative accounts. Storytelling is a conversation with a purpose. The reciprocal bond between researcher and storyteller enables the examination of the meaning of experiences. Life narratives, in the context of wider traditional and communal folklore, can be analysed to discover how people make sense of their circumstances. This paper draws from the experience of the author, who has previously used this narrative approach. It is a reflection of how the approach may be used to understand those experiencing persistent pain without a consensual diagnosis. Using an integrative method, peer-reviewed research and discussion papers published between January 1990 and December 2014 and listed in the CINAHL, Science Direct, PsycINFO and Google Scholar databases were reviewed. In addition, texts that addressed research methodologies such as literary folkloristic methodology and Marxist literary theory were used. The unique role that nurses play in managing pain is couched in the historical and cultural context of nursing. Literary folkloristic methodology offers an opportunity to gain a better understanding and appreciation of how the experience of pain is constructed and to connect with sufferers. Literary folkloristic methodology reveals that those with persistent pain are often rendered powerless to live their lives. Increasing awareness of how this experience is constructed and maintained also allows an understanding of societal influences on nursing practice. Nurse researchers try to understand experiences in light of specific situations. Literary folkloristic methodology can enable them to understand the inter-relationship between people in persistent pain and how they construct their experiences.
Mining the pharmacogenomics literature—a survey of the state of the art

PubMed Central

Cohen, K. Bretonnel; Garten, Yael; Shah, Nigam H.

2012-01-01

This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research. PMID:22833496
Mining the pharmacogenomics literature--a survey of the state of the art.

PubMed

Hahn, Udo; Cohen, K Bretonnel; Garten, Yael; Shah, Nigam H

2012-07-01

This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.
Tick Removal

MedlinePlus

... down the toilet. Avoid folklore remedies such as “painting” the tick with nail polish or petroleum jelly, ...
PKDE4J: Entity and relation extraction for public knowledge discovery.

PubMed

Song, Min; Kim, Won Chul; Lee, Dahee; Heo, Go Eun; Kang, Keun Young

2015-10-01

Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and find that it surpasses existing systems with average F-measures of 85% for entity extraction and 81% for relation extraction.
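The combination of dictionary-based entity extraction and rule-based relation extraction described above can be illustrated with a minimal sketch. This is not PKDE4J's implementation or API: the dictionary entries, the trigger words, and the function names are all hypothetical, and the single rule used here (entity, trigger word, entity) is only one simple kind of rule such a system might apply.

```python
import re

# Hypothetical toy dictionary mapping surface forms to entity types.
ENTITY_DICT = {
    "aspirin": "DRUG",
    "warfarin": "DRUG",
    "bleeding": "DISEASE",
}

# Hypothetical trigger words that signal a relation between two entities.
RELATION_TRIGGERS = {"causes": "CAUSES", "inhibits": "INHIBITS"}


def extract_entities(sentence):
    """Return (start, surface form, type) for each dictionary term found."""
    found = []
    for term, etype in ENTITY_DICT.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            found.append((m.start(), m.group(0), etype))
    return sorted(found)


def extract_relations(sentence):
    """Rule: two adjacent entities with a trigger word between them."""
    entities = extract_entities(sentence)
    relations = []
    for (s1, e1, _), (s2, e2, _) in zip(entities, entities[1:]):
        between = sentence[s1 + len(e1):s2].lower()
        for trigger, label in RELATION_TRIGGERS.items():
            if re.search(r"\b" + re.escape(trigger) + r"\b", between):
                relations.append((e1, label, e2))
    return relations


print(extract_relations("Warfarin causes bleeding in some patients."))
```

Keeping the dictionary and the rules in plain data structures, separate from the matching code, is one way to get the kind of configurability the abstract mentions.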
Comparing published scientific journal articles to their pre-print versions

DOE PAGES

Klein, Martin; Broadwell, Peter; Farb, Sharon E.; ...

2018-02-05

Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. Here, we have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) If the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
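As an illustration of what "standard similarity measures" can look like in practice, the sketch below compares two versions of a text with a word-set Jaccard coefficient and a sequence-based ratio from Python's standard `difflib` module. The abstract does not say which measures the authors used, so both choices, the whitespace tokenizer, and the function names are assumptions made for this example.

```python
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Shared distinct words divided by all distinct words."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_similarity(a, b):
    """Order-sensitive similarity over word sequences, between 0 and 1."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


preprint = "we measure the change in text"
published = "we measure the changes in text"
print(jaccard_similarity(preprint, published))
print(sequence_similarity(preprint, published))
```

Values close to 1 for a pre-print and its published counterpart would correspond to the "changed very little" finding reported above.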
Identifying biological concepts from a protein-related corpus with a probabilistic topic model

PubMed Central

Zheng, Bin; McLean, David C; Lu, Xinghua

2006-01-01

Background Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE© titles and abstracts by applying a probabilistic topic model. Results The latent Dirichlet allocation (LDA) model was applied to the corpus. Based on the Bayesian model selection, 300 major topics were extracted from the corpus. The majority of identified topics/concepts was found to be semantically coherent and most represented biological objects or concepts. The identified topics/concepts were further mapped to the controlled vocabulary of the Gene Ontology (GO) terms based on mutual information. Conclusion The major and recurring biological concepts within a collection of MEDLINE documents can be extracted by the LDA model. The identified topics/concepts provide parsimonious and semantically-enriched representation of the texts in a semantic space with reduced dimensionality and can be used to index text. PMID:16466569
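The mapping of topics to Gene Ontology terms "based on mutual information" can be pictured with a small worked computation. The sketch below uses the textbook plug-in estimate of mutual information between two discrete sequences, for example a per-document indicator of whether a topic is present and a per-document indicator of whether a GO term is annotated. How the authors actually estimated it is not stated in the abstract, so the encoding and the function name are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information, in bits, of two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical indicators over four documents.
topic_present = [1, 1, 0, 0]
go_term_annotated = [1, 1, 0, 0]
print(mutual_information(topic_present, go_term_annotated))
```

A topic would then be mapped to the GO term with which it shares the most information; statistically independent indicators score zero.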
Automated annotation of functional imaging experiments via multi-label classification

PubMed Central

Turner, Matthew D.; Chakrabarti, Chayan; Jones, Thomas B.; Xu, Jiawei F.; Fox, Peter T.; Luger, George F.; Laird, Angela R.; Turner, Jessica A.

2013-01-01

Identifying the experimental methods in human neuroimaging papers is important for grouping meaningfully similar experiments for meta-analyses. Currently, this can only be done by human readers. We present the performance of common machine learning (text mining) methods applied to the problem of automatically classifying or labeling this literature. Labeling terms are from the Cognitive Paradigm Ontology (CogPO), the text corpora are abstracts of published functional neuroimaging papers, and the methods use the performance of a human expert as training data. We aim to replicate the expert's annotation of multiple labels per abstract identifying the experimental stimuli, cognitive paradigms, response types, and other relevant dimensions of the experiments. We use several standard machine learning methods: naive Bayes (NB), k-nearest neighbor, and support vector machines (specifically SMO or sequential minimal optimization). Exact match performance ranged from only 15% in the worst cases to 78% in the best cases. NB methods combined with binary relevance transformations performed strongly and were robust to overfitting. This collection of results demonstrates what can be achieved with off-the-shelf software components and little to no pre-processing of raw text. PMID:24409112
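Two ideas in this abstract lend themselves to a short sketch: the binary relevance transformation, which turns one multi-label problem into one yes/no problem per label, and the exact match score, which counts an abstract as correct only when its whole predicted label set equals the expert's. The label names below are invented for illustration and are not taken from CogPO; the function names are likewise hypothetical.

```python
def binary_relevance_targets(label_sets, all_labels):
    """One binary target vector per label: 1 if the abstract has the label."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def exact_match(true_sets, predicted_sets):
    """Fraction of abstracts whose predicted label set equals the expert's."""
    if len(true_sets) != len(predicted_sets) or not true_sets:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return hits / len(true_sets)


# Hypothetical expert annotations and classifier output for three abstracts.
expert = [{"visual", "n-back"}, {"auditory"}, {"visual"}]
predicted = [{"visual", "n-back"}, {"auditory", "visual"}, {"visual"}]

print(binary_relevance_targets(expert, ["visual", "auditory", "n-back"]))
print(exact_match(expert, predicted))
```

Under binary relevance, each target vector would be handed to its own binary classifier (naive Bayes in the abstract) and the per-label predictions recombined into a label set. Exact match is a strict measure, because a single extra or missing label makes the whole abstract count as wrong.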
Comparing published scientific journal articles to their pre-print versions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klein, Martin; Broadwell, Peter; Farb, Sharon E.

Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: US academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. Here, we have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) If the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
Dose-Volume Parameters of the Corpora Cavernosa Do Not Correlate With Erectile Dysfunction After External Beam Radiotherapy for Prostate Cancer: Results From a Dose-Escalation Trial

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wielen, Gerard J. van der; Hoogeman, Mischa S.; Dohle, Gert R.

2008-07-01

Purpose: To analyze the correlation between dose-volume parameters of the corpora cavernosa and erectile dysfunction (ED) after external beam radiotherapy (EBRT) for prostate cancer. Methods and Materials: Between June 1997 and February 2003, a randomized dose-escalation trial comparing 68 Gy and 78 Gy was conducted. Patients at our institute were asked to participate in an additional part of the trial evaluating sexual function. After exclusion of patients with less than 2 years of follow-up, ED at baseline, or treatment with hormonal therapy, 96 patients were eligible. The proximal corpora cavernosa (crura), the superiormost 1-cm segment of the crura, and the penile bulb were contoured on the planning computed tomography scan and dose-volume parameters were calculated. Results: Two years after EBRT, 35 of the 96 patients had developed ED. No statistically significant correlations between ED 2 years after EBRT and dose-volume parameters of the crura, the superiormost 1-cm segment of the crura, or the penile bulb were found. The few patients using potency aids typically indicated to have ED. Conclusion: No correlation was found between ED after EBRT for prostate cancer and radiation dose to the crura or penile bulb. The present study is the largest study evaluating the correlation between ED and radiation dose to the corpora cavernosa after EBRT for prostate cancer. Until there is clear evidence that sparing the penile bulb or crura will reduce ED after EBRT, we advise to be careful in sparing these structures, especially when this involves reducing treatment margins.
Prenatal development of the normal human vertebral corpora in different segments of the spine.

PubMed

Nolting, D; Hansen, B F; Keeling, J; Kjaer, I

1998-11-01

Vertebral columns from 13 normal human fetuses (10-24 weeks of gestation) that had aborted spontaneously were investigated as part of the legal autopsy procedure. The investigation included spinal cord analysis. To analyze the formation of the normal human vertebral corpora along the spine, including the early location and disappearance of the notochord. Reference material on the development of the normal human vertebral corpora is needed for interpretation of published observations on prenatal malformations in the spine, which include observations of various types of malformation (anencephaly, spina bifida) and various genotypes (trisomy 18, 21 and 13, as well as triploidy). The vertebral columns were studied by using radiography (Faxitron X-ray apparatus, Faxitron Model 43,855, Hewlett Packard) in lateral, frontal, and axial views and histology (decalcification, followed by toluidine blue and alcian blue staining) in an axial view. Immunohistochemical marking with Keratin Wide Spectrum also was done. Notochordal tissue (positive on marking with Keratin Wide Spectrum [DAKO, Denmark]) was located anterior to the cartilaginous body center in the youngest fetuses. The process of disintegration of the notochord and the morphology of the osseous vertebral corpora in the lumbosacral, thoracic, and cervical segments are described. Marked differences appeared in axial views, which were verified on horizontal histologic sections. Also, the increase in size was different in the different segments, being most pronounced in the thoracic and upper lumbar bodies. The lower thoracic bodies were the first to ossify. The morphologic changes observed by radiography were verified histologically. In this study, normal prenatal standards were established for the early development of the vertebral column. These standards can be used in the future for evaluation of pathologic deviations in the human vertebral column in the second trimester.
Mice null for Frizzled4 (Fzd4-/-) are infertile and exhibit impaired corpora lutea formation and function.

PubMed

Hsieh, Minnie; Boerboom, Derek; Shimada, Masayuki; Lo, Yuet; Parlow, Albert F; Luhmann, Ulrich F O; Berger, Wolfgang; Richards, JoAnne S

2005-12-01

Previous studies showed that transcripts encoding specific Wnt ligands and Frizzled receptors including Wnt4, Frizzled1 (Fzd1), and Frizzled4 (Fzd4) were expressed in a cell-specific manner in the adult mouse ovary. Overlapping expression of Wnt4 and Fzd4 mRNA in small follicles and corpora lutea led us to hypothesize that the infertility of mice null for Fzd4 (Fzd4-/-) might involve impaired follicular growth or corpus luteum formation. Analyses at defined stages of reproductive function indicate that immature Fzd4-/- mouse ovaries contain follicles at many stages of development and respond to exogenous hormone treatments in a manner similar to their wild-type littermates, indicating that the processes controlling follicular development and follicular cell responses to gonadotropins are intact. Adult Fzd4-/- mice also exhibit normal mating behavior and ovulate, indicating that endocrine events controlling these processes occur. However, Fzd4-/- mice fail to become pregnant and do not produce offspring. Histological and functional analyses of ovaries from timed mating pairs at Days 1.5-7.5 postcoitus (p.c.) indicate that the corpora lutea of the Fzd4-/- mice do not develop normally. Expression of luteal cell-specific mRNAs (Lhcgr, Prlr, Cyp11a1 and Sfrp4) is reduced, luteal cell morphology is altered, and markers of angiogenesis and vascular formation (Efnb1, Efnb2, Ephb4, Vegfa, Vegfc) are low in the Fzd4-/- mice. Although a recently identified, high-affinity FZD4 ligand Norrin (Norrie disease pseudoglioma homolog) is expressed in the ovary, adult Ndph-/- mice contain functional corpora lutea and do not phenocopy Fzd4-/- mice. Thus, Fzd4 appears to impact the formation of the corpus luteum by mechanisms that more closely phenocopy Prlr null mice.

A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations

PubMed Central

2017-01-01

Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessible in order to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge, this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases. The first involves the detection and determination of the entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. The evaluation showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations. PMID:28644863
Helios: Understanding Solar Evolution Through Text Analytics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Randazzese, Lucien

This proof-of-concept project focused on developing, testing, and validating a range of bibliometric, text analytic, and machine-learning based methods to explore the evolution of three photovoltaic (PV) technologies: Cadmium Telluride (CdTe), Dye-Sensitized solar cells (DSSC), and Multi-junction solar cells. The analytical approach to the work was inspired by previous work by the same team to measure and predict the scientific prominence of terms and entities within specific research domains. The goal was to create tools that could assist domain-knowledgeable analysts in investigating the history and path of technological developments in general, with a focus on analyzing step-function changes in performance, or “breakthroughs,” in particular. The text-analytics platform developed during this project was dubbed Helios. The project relied on computational methods for analyzing large corpora of technical documents. For this project we ingested technical documents from the following sources into Helios: Thomson Scientific Web of Science (papers), the U.S. Patent & Trademark Office (patents), the U.S. Department of Energy (technical documents), the U.S. National Science Foundation (project funding summaries), and a hand-curated set of full-text documents from Thomson Scientific and other sources.
News from the North.

ERIC Educational Resources Information Center

Ellis, Sarah

1985-01-01

Examines two instances where European based folklore has made its way to Canada via immigrant storytellers, Alice Kane's "Songs and Sayings of an Ulster Childhood," and Eva Martin and Laszlo Gal's "Canadian Fairy Tales." (RBW)
ECO: A Framework for Entity Co-Occurrence Exploration with Faceted Navigation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Halliday, K. D.

2010-08-20

Even as highly structured databases and semantic knowledge bases become more prevalent, a substantial amount of human knowledge is reported as written prose. Typical textual reports, such as news articles, contain information about entities (people, organizations, and locations) and their relationships. Automatically extracting such relationships from large text corpora is a key component of corporate and government knowledge bases. The primary goal of the ECO project is to develop a scalable framework for extracting and presenting these relationships for exploration using an easily navigable faceted user interface. ECO uses entity co-occurrence relationships to identify related entities. The system aggregates and indexes information on each entity pair, allowing the user to rapidly discover and mine relational information.
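The idea of using entity co-occurrence to identify related entities, and of aggregating information per entity pair, can be sketched in a few lines. This is not ECO's code: it assumes the entities have already been extracted for each document, counts co-occurrence at document level, and uses invented entity names and hypothetical function names.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """For each unordered entity pair, count the documents mentioning both."""
    counts = Counter()
    for entities in documents:
        # sorted(set(...)) gives each pair one canonical order per document.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def related_entities(counts, entity):
    """Entities that co-occur with `entity`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == entity:
            related[b] += n
        elif b == entity:
            related[a] += n
    return related.most_common()


# Hypothetical entity lists, one per document.
docs = [["Alice", "Acme", "Paris"], ["Alice", "Acme"], ["Bob", "Paris"]]
counts = cooccurrence_counts(docs)
print(related_entities(counts, "Alice"))
```

An index keyed by entity pair, as in `counts`, is what lets a faceted interface jump from one entity to its most strongly associated neighbours without rescanning the corpus.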
NLP-PIER: A Scalable Natural Language Processing, Indexing, and Searching Architecture for Clinical Notes

PubMed Central

McEwan, Reed; Melton, Genevieve B.; Knoll, Benjamin C.; Wang, Yan; Hultman, Gretchen; Dale, Justin L.; Meyer, Tim; Pakhomov, Serguei V.

2016-01-01

Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking at providing this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota. PMID:27570663
Performance analysis of Supply Chain Management with Supply Chain Operation reference model

NASA Astrophysics Data System (ADS)

Hasibuan, Abdurrozzaq; Arfah, Mahrani; Parinduri, Luthfi; Hernawati, Tri; Suliawati; Harahap, Bonar; Rahmah Sibuea, Siti; Krianto Sulaiman, Oris; purwadi, Adi

2018-04-01

This research was conducted at PT. Shamrock Manufacturing Corpora. The company is required to think creatively and to compete by producing goods and services of higher quality at lower cost, and to optimize its production output to meet export quality standards. It is therefore necessary to measure the performance of its Supply Chain Management in order to improve competitiveness. The research begins by defining initial dimensions based on the Supply Chain Management processes (Plan, Source, Make, Deliver, and Return), with a hierarchy based on the Supply Chain Operations Reference attributes of Reliability, Responsiveness, Agility, Cost, and Asset. The identified Key Performance Indicators serve as the benchmark for performance measurement, while Snorm De Boer normalization puts the Key Performance Indicator values on a common scale. The Analytical Hierarchy Process is used to help determine the priority of the criteria. In the performance measurement at PT. Shamrock Manufacturing Corpora, Supply Chain Responsiveness (0.649) receives a higher weight (priority) than the other alternatives. The performance analysis using the Supply Chain Operations Reference model indicates that Supply Chain Management performance at PT. Shamrock Manufacturing Corpora is good, because the monitoring system classifies scores between 50 and 100 as good.
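The normalization and weighting steps can be made concrete with a short sketch. It assumes the commonly cited form of the Snorm De Boer formula, a min-max scaling of each Key Performance Indicator onto 0 to 100, followed by a weighted sum with AHP-style weights. The KPI values and weights below are invented for illustration and are not the study's data; the function names are hypothetical.

```python
def snorm_de_boer(value, worst, best):
    """Scale a KPI onto 0-100 so that `worst` maps to 0 and `best` to 100.

    For larger-is-better KPIs pass worst=minimum, best=maximum;
    for lower-is-better KPIs pass worst=maximum, best=minimum.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    return (value - worst) / (best - worst) * 100


def weighted_score(normalized_scores, weights):
    """Combine normalized KPI scores using weights that sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(s * w for s, w in zip(normalized_scores, weights))


# Hypothetical KPIs: on-time delivery in percent (larger is better)
# and order lead time in days (lower is better).
on_time = snorm_de_boer(80, worst=50, best=100)
lead_time = snorm_de_boer(3, worst=10, best=2)
print(on_time, lead_time)
print(weighted_score([on_time, lead_time], [0.6, 0.4]))
```

Because every KPI ends up on the same 0 to 100 scale, the weighted total can be read directly against the monitoring bands mentioned in the abstract.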
The undead in culture and science

PubMed Central

Nugent, Connie; Berdine, Gilbert; Nugent, Kenneth

2018-01-01

The undead have a significant role in mythology, religion, folklore, and literature. In the 1800s, the word zombie was used to describe reanimated corpses in the Caribbean who often worked on plantations doing long, arduous field work. The movie White Zombie was released in 1932 and exploited this folklore, but it ignored the fact that zombies represent one outcome in Vodou religious beliefs regarding death and the migration of spirits following death. The interest in zombies eventually led to sociological and medical investigations into zombification. Wade Davis reported that powders used by malevolent priests (bokors) contained tetrodotoxin, which could cause the neurologic changes underlying the zombie phenotype. Recent clinical studies have indicated that synthetic cannabinoids and synthetic cathinones can cause bizarre zombie-like behavior. According to Haitian folklore, zombies can develop when bokors reanimate someone who suddenly died from an acute illness or who was purposely poisoned. Recent studies in molecular biology suggest that the sequence of programmed cell death can be reversed when the stressor is removed and that cells, tissues, and bodies (at least in Drosophila flies) can recover. These scientific studies would support the remote possibility that the near dead might recover under certain circumstances but have residual neuropsychological dysfunction. Alternatively, the bokors could maintain control of their victims using drugs with properties similar to those of synthetic cannabinoids. The concept of zombification needs to be considered in the context of culture, religion, and science. PMID:29706835
Injuries in students of three different dance techniques.

PubMed

Echegoyen, Soledad; Acuña, Eugenia; Rodríguez, Cristina

2010-06-01

As with any athlete, the dancer has a high risk for injury. Most studies carried out relate to classical and modern dance; however, there is a lack of reports on injuries involving other dance techniques. This study is an attempt to determine the differences in the incidence, the exposure-related rates, and the kind of injuries in three different dance techniques. A prospective study about dance injuries was carried out between 2004 and 2007 on students of modern, Mexican folkloric, and Spanish dance at the Escuela Nacional de Danza. A total of 1,168 injuries were registered in 444 students; the injury rate was 4 injuries/student for modern dance and 2 injuries/student for Mexican folkloric and Spanish dance. The rate per training hours was 4 for modern, 1.8 for Mexican folkloric, and 1.5 injuries/1,000 hr of training for Spanish dance. The lower extremity is the most frequent structure injured (70.47%), and overuse injuries comprised 29% of the total. The most frequent injuries were strain, sprain, back pain, and patellofemoral pain. This study has a consistent medical diagnosis of the injuries and is the first attempt in Mexico to compare the incidence of injuries in different dance techniques. To decrease the frequency of student injury, it is important to incorporate prevention programs into dance program curricula. More studies are necessary to define causes and mechanisms of injury, as well as an analysis of training methodology, to decrease the incidence of the muscle imbalances resulting in injury.
Exploiting Language Models to Classify Events from Twitter

PubMed Central

Vo, Duc-Thuan; Hai, Vo Thuan; Ock, Cheol-Young

2015-01-01

Classifying events is challenging in Twitter because tweet texts contain a large amount of temporal data with a lot of noise and various kinds of topics. In this paper, we propose a method to classify events from Twitter. We first find the distinguishing terms between tweets in events and measure their similarities with learning language models such as ConceptNet and a latent Dirichlet allocation method for selectional preferences (LDA-SP), which have been widely studied based on large text corpora within computational linguistic relations. The relationships between term words in tweets are discovered by checking them under each model. We then propose a method to compute the similarity between tweets based on the tweets' features, including common term words and relationships among their distinguishing term words. The resulting similarity is explicit and convenient to apply in k-nearest neighbor techniques for classification. We ran experiments on the Edinburgh Twitter Corpus to show that our method achieves competitive results for classifying events. PMID:26451139
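To show how a tweet-to-tweet similarity plugs into k-nearest neighbor classification, the sketch below uses only the "common term words" part of the similarity, computed as word-set overlap. The relationships between distinguishing terms that the paper derives from ConceptNet and LDA-SP are not modeled, so this is a simplified stand-in. The example tweets, labels, and function names are invented.

```python
from collections import Counter


def term_overlap(a, b):
    """Shared distinct words divided by all distinct words in two tweets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def knn_classify(tweet, labeled_tweets, k=3):
    """Majority label among the k labeled tweets most similar to `tweet`."""
    ranked = sorted(
        labeled_tweets,
        key=lambda item: term_overlap(tweet, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled tweets.
labeled = [
    ("earthquake hits city center", "disaster"),
    ("strong earthquake shakes coast", "disaster"),
    ("team wins cup final", "sports"),
    ("striker scores in final", "sports"),
    ("flood warning for coast", "disaster"),
]
print(knn_classify("earthquake near coast", labeled))
print(knn_classify("cup final tonight", labeled))
```

Swapping `term_overlap` for a richer similarity function leaves `knn_classify` unchanged, which is the sense in which an explicit similarity is convenient for k-nearest neighbor methods.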
Bengali-English Relevant Cross Lingual Information Access Using Finite Automata

NASA Astrophysics Data System (ADS)

Banerjee, Avishek; Bhattacharyya, Swapan; Hazra, Simanta; Mondal, Shatabdi

2010-10-01

CLIR techniques search unrestricted texts and typically extract terms and relationships from bilingual electronic dictionaries or bilingual text collections, then use them to translate query and/or document representations into a compatible set of representations with a common feature set. In this paper, we focus on a dictionary-based approach that uses a bilingual data dictionary in combination with statistics-based methods to avoid the problem of ambiguity; the paper also addresses the development of the human-computer interface aspects of NLP (natural language processing). Intelligent web search in a regional language such as Bengali depends on two major aspects: CLIA (cross-language information access) and NLP. In our previous work with IIT, KGP, we developed content-based CLIA, in which content-based searching is trained on Bengali corpora with the help of a Bengali data dictionary. Here we introduce intelligent search, which aims to recognize the sense of a sentence and offers a more realistic approach to human-computer interaction.
Marigold (Calendula officinalis L.): an evidence-based systematic review by the Natural Standard Research Collaboration.

PubMed

Basch, Ethan; Bent, Steve; Foppa, Ivo; Haskmi, Sadaf; Kroll, David; Mele, Michelle; Szapary, Philippe; Ulbricht, Catherine; Vora, Mamta; Yong, Sophanna

2006-01-01

An evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology and dosing.
An evidence-based systematic review of saffron (Crocus sativus) by the Natural Standard Research Collaboration.

PubMed

Ulbricht, Catherine; Conquer, Julie; Costa, Dawn; Hollands, Whitney; Iannuzzi, Carmen; Isaac, Richard; Jordan, Joseph K; Ledesma, Natalie; Ostroff, Cathy; Serrano, Jill M Grimes; Shaffer, Michael D; Varghese, Minney

2011-03-01

An evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.
Evidence-based systematic review of saw palmetto by the Natural Standard Research Collaboration.

PubMed

Ulbricht, Catherine; Basch, Ethan; Bent, Steve; Boon, Heather; Corrado, Michelle; Foppa, Ivo; Hashmi, Sadaf; Hammerness, Paul; Kingsbury, Eileen; Smith, Michael; Szapary, Philippe; Vora, Mamta; Weissner, Wendy

2006-01-01

Here presented is an evidence-based systematic review including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.
New Folklore about Water.

ERIC Educational Resources Information Center

LeMaire, Peter; Waiveris, Charles

1995-01-01

Describes experiments designed to investigate the cooling rate of microwave-boiled water as compared to that of stove-boiled water. Concludes that within experimental limits, microwave-boiled water and stove-boiled water cool at the same rate. (JRH)
BioC: a minimalist approach to interoperability for biomedical text processing

PubMed Central

Comeau, Donald C.; Islamaj Doğan, Rezarta; Ciccarese, Paolo; Cohen, Kevin Bretonnel; Krallinger, Martin; Leitner, Florian; Lu, Zhiyong; Peng, Yifan; Rinaldi, Fabio; Torii, Manabu; Valencia, Alfonso; Verspoor, Karin; Wiegers, Thomas C.; Wu, Cathy H.; Wilbur, W. John

2013-01-01

A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/. Database URL: http://bioc.sourceforge.net/ PMID:24048470
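As a rough illustration of the kind of format the abstract describes (a document with text plus stand-off annotations, written to and read back from XML), here is a minimal sketch using only the Python standard library. The element and attribute names below are assumptions chosen for illustration; they are not taken from the published BioC DTD, which should be consulted for the actual schema.

```python
import xml.etree.ElementTree as ET


def build_document(doc_id, text, annotations):
    """Build a document element holding one passage and its stand-off annotations.

    annotations: list of (annotation id, type, character offset, length) tuples.
    """
    doc = ET.Element("document")
    ET.SubElement(doc, "id").text = doc_id
    passage = ET.SubElement(doc, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    for ann_id, ann_type, offset, length in annotations:
        ann = ET.SubElement(passage, "annotation", id=ann_id)
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        ET.SubElement(ann, "text").text = text[offset:offset + length]
    return doc


def read_annotations(doc):
    """Return (type, covered text) pairs for every annotation in the document."""
    return [(ann.find("infon").text, ann.find("text").text) for ann in doc.iter("annotation")]


text = "BRCA1 mutations raise breast cancer risk."
doc = build_document("d1", text, [("a1", "gene", 0, 5), ("a2", "disease", 22, 13)])

# Serialize to XML and parse it back, as a second tool in a pipeline would.
xml_bytes = ET.tostring(doc, encoding="utf-8")
roundtrip = ET.fromstring(xml_bytes)
print(read_annotations(roundtrip))
```

Keeping annotations as offsets into the unmodified passage text is what lets independent tools add their own layers without rewriting each other's output.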
Ishmael Reed and the Politics of Aesthetics, or Shake Hands and Come Out Conjuring

ERIC Educational Resources Information Center

Fontenot, Chester J.

1978-01-01

Discusses the ways in which Ishmael Reed uses black American folklore, black American language, traditional African religion, and African myths as poetic materials from which he develops artistic forms. (GW)
Automated extraction and semantic analysis of mutation impacts from the biomedical literature

PubMed Central

2012-01-01

Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. 
Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648
Circumferential Peyronie's disease involving both the corpora cavernosa.

PubMed

Narita, T; Kudo, H; Matsumoto, K

1995-05-01

An extraordinary form of Peyronie's disease is reported. The patient was a 52-year-old male, who died of a malignant thymoma with multiple bone metastases, extensive pleural carcinomatosis of the left lung and some metastatic nodules in the liver and the mesenterium. At autopsy, the proximal and middle portions of the penis were very hard. Macroscopically, the entire tunica albuginea of both the corpora cavernosa was markedly thickened (2-4 mm) and calcified. Microscopically, the tunica albuginea showed extensive hyaline degeneration, calcification and ossifying foci with osteoblasts and osteoclasts. Inflammatory cells were frequently found beneath the thickened tunica albuginea. In the corpus cavernosum, cavernous arteries showed marked intimal thickening and medial muscular degeneration with a few inflammatory cells. Smooth muscles of the stroma were extensively atrophic and degenerative, and some of them were infiltrated with a few inflammatory cells. In the corpus spongiosum, the tunica albuginea was not thickened, but the smooth muscle in the stroma was atrophic and degenerative and a few inflammatory cells were also found. Surprisingly, there was no Littré's gland around the urethra. In Peyronie's disease, the dorsal part of the penis is usually involved, and less frequently lateral or ventral sites are involved. The circumferential involvement of both the corpora cavernosa has not been reported until now, as far as the authors know.
Subtropical acarien profile by topography, seasons and change of house furnishings: 80's blueprint to the future.

PubMed

Massey, Douglas G; Hope, Bradley E; Furumizo, Roy T

2010-04-01

In Hawai'i, mortality and morbidity from asthma are significant. In the 80's, there had been no local studies of topography folklore. There had been only one report of seasonal variation in house dust mite (HDM) density in Hawai'i, and this showed no significant variation in O'ahu's Manoa Valley but a definite variation in Waikiki. There were no studies of complete replacement of furnishings. In this pilot study, homes at valley, coastal, and plain sites were investigated for 12 months in 2 homes on O'ahu. A 3rd home was studied prior to and after arrival of furnishings from Denver, Colorado. Of the 3 homes, #1 was in Palolo Valley Honolulu, #2 coastal at Pearl Harbor and #3 on the plain at Mililani. House dust samples were taken from 4-5 sites in 2 rooms every 5 weeks. Sampling and determination of density and species followed the methods of Furumizo. The results did not support the topography and seasonal variation folklore. Density surged in the 3rd home to >12,000 mites/gram of dust within 10-15 weeks with the complete change of low density HDM furnishings. D. pteronyssinus (Dp) was dominant in each home year-round. Minor species of mites made up to 1/3 of total mites in 2 homes. The folklore relating improvement in asthma to geography was not supported. 2 of the 3 homes showed minimal seasonal variation in HDM density. Local mites heavily colonized furniture from high altitude Colorado in a surge within 10-15 weeks.
The Fairy-Folk Tale in Media Art: Reflections of Disney and Duvall.

ERIC Educational Resources Information Center

Molloy, Toni

1988-01-01

Focuses on Walt Disney and Shelley Duvall, mass media producers who furnish children with fairy-folklore. Compares and contrasts what Disney and Duvall do and do not convey through their fairy-folk tales. (MS)

1991 Annual Selected Bibliography.

ERIC Educational Resources Information Center

Omatsu, Glenn, Ed.

1991-01-01

Presents a bibliography containing 1,269 Asian-American studies in the following categories: research issues, bibliographies, and methodology; contemporary politics and social movements; culture, literature, and folklore; demography and geography; education; family relations; health and medicine; historical studies; identity and assimilation;…
A universal multilingual weightless neural network tagger via quantitative linguistics.

PubMed

Carneiro, Hugo C C; Pedreira, Carlos E; França, Felipe M G; Lima, Priscila M V

2017-07-01

In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided. Copyright © 2017 Elsevier Ltd. All rights reserved.
Desiderata for ontologies to be used in semantic annotation of biomedical documents.

PubMed

Bada, Michael; Hunter, Lawrence

2011-02-01

A wealth of knowledge valuable to the translational research scientist is contained within the vast biomedical literature, but this knowledge is typically in the form of natural language. Sophisticated natural-language-processing systems are needed to translate text into unambiguous formal representations grounded in high-quality consensus ontologies, and these systems in turn rely on gold-standard corpora of annotated documents for training and testing. To this end, we are constructing the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles that are being manually annotated with the entire sets of terms from select vocabularies, predominantly from the Open Biomedical Ontologies (OBO) library. Our efforts in building this corpus have illuminated infelicities of these ontologies with respect to the semantic annotation of biomedical documents, and we propose desiderata whose implementation could substantially improve their utility in this task; these include the integration of overlapping terms across OBOs, the resolution of OBO-specific ambiguities, the integration of the BFO with the OBOs and the use of mid-level ontologies, the inclusion of noncanonical instances, and the expansion of relations and realizable entities. Copyright © 2010 Elsevier Inc. All rights reserved.
Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

PubMed Central

Amancio, Diego R.; Altmann, Eduardo G.; Rybski, Diego; Oliveira, Osvaldo N.; Costa, Luciano da F.

2013-01-01

While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications. PMID:23844002
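One of the three measurement families mentioned above treats a text as a time series. A minimal sketch of that idea follows, assuming intermittency is taken as the coefficient of variation (standard deviation divided by mean) of the gaps between successive occurrences of a word; the paper's exact definition may differ, and the token lists are synthetic toy data. A word that appears in bursts scores high, an evenly spaced word scores zero, and shuffling the text destroys the bursts.

```python
import random
import statistics


def recurrence_intervals(tokens, word):
    """Gaps (in tokens) between successive occurrences of a word."""
    positions = [i for i, t in enumerate(tokens) if t == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def intermittency(tokens, word):
    """Coefficient of variation (std / mean) of the recurrence intervals of a word."""
    gaps = recurrence_intervals(tokens, word)
    if len(gaps) < 2:
        return None  # too few occurrences to measure
    return statistics.pstdev(gaps) / statistics.mean(gaps)


# Synthetic texts: "key" occurs in two bursts, or once every 10 tokens.
bursty = ["key"] * 4 + ["x"] * 96 + ["key"] * 4
regular = (["key"] + ["x"] * 9) * 8

# Shuffled version of the bursty text, used as the random baseline.
shuffled = bursty[:]
random.Random(0).shuffle(shuffled)

for name, tokens in [("bursty", bursty), ("regular", regular), ("shuffled", shuffled)]:
    print(name, intermittency(tokens, "key"))
```

Comparing each word's score in the real text against its score in shuffled copies is one way to pick out words whose placement depends on the structure of the text.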
Earthquake Prediction is Coming

ERIC Educational Resources Information Center

MOSAIC, 1977

1977-01-01

Describes (1) several methods used in earthquake research, including P:S ratio velocity studies, dilatancy models; and (2) techniques for gathering base-line data for prediction using seismographs, tiltmeters, laser beams, magnetic field changes, folklore, animal behavior. The mysterious Palmdale (California) bulge is discussed. (CS)
A General Interview Guide.

ERIC Educational Resources Information Center

Ives, Edward D.

This guide is divided into 11 sections, each containing a number of questions and suggestions for conducting successful folklore and oral history interviews. Section 1, "Settlement and Dwellings," deals with the physical environment, local inhabitants, houses and outbuildings, and public buildings. Section 2, "Livelihood and…
Libros y mas libros: Recommended Children's Books in Spanish.

ERIC Educational Resources Information Center

Schon, Isabel

2000-01-01

Provides an annotated bibliography of recommended Spanish-language children's books. A total of 15 books are grouped in the following categories: (1) books for the very young; (2) fiction; (3) folklore; (4) literature; (5) poetry; and (6) history. (EV)
Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition.

PubMed

Herdağdelen, Amaç; Marelli, Marco

2017-05-01

Corpus-based word frequencies are one of the most important predictors in language processing tasks. Frequencies based on conversational corpora (such as movie subtitles) are shown to better capture the variance in lexical decision tasks compared to traditional corpora. In this study, we show that frequencies computed from social media are currently the best frequency-based estimators of lexical decision reaction times (up to 3.6% increase in explained variance). The results are robust (observed for Twitter- and Facebook-based frequencies on American English and British English datasets) and are still substantial when we control for corpus size. © 2016 The Authors. Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.
Toward a complete dataset of drug-drug interaction information from publicly available sources.

PubMed

Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

2015-06-01

Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) Corpora, and 5 Bioinformatics/Pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
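The overlap analysis mentioned above can be pictured with a small sketch. It assumes each source is reduced to a list of drug-name pairs and that a pair is unordered and case-insensitive; the study itself mapped sources through shared identifiers, which is not reproduced here. The drug names are placeholders, not real interaction data.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same interaction and ignore letter case."""
    return {frozenset((a.lower(), b.lower())) for a, b in pairs}


def overlap_fraction(source_a, source_b):
    """Share of source_a's interaction pairs that also appear in source_b."""
    a, b = normalize(source_a), normalize(source_b)
    return len(a & b) / len(a) if a else 0.0


# Placeholder lists standing in for two PDDI sources.
source_x = [("drugA", "drugB"), ("drugC", "drugD"), ("drugE", "drugF"), ("drugA", "drugG")]
source_y = [("DrugB", "drugA"), ("drugH", "drugI")]

print(overlap_fraction(source_x, source_y))  # share of source_x found in source_y
print(overlap_fraction(source_y, source_x))  # share of source_y found in source_x
```

The measure is asymmetric on purpose: a small list can be half covered by a large one while covering only a small part of it in return.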
Normalization of relative and incomplete temporal expressions in clinical narratives.

PubMed

Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem

2015-09-01

To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives. We analyzed the RI-TIMEXes in temporally annotated corpora and propose two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier, and a rule-based RI-TIMEX text span parser. We experimented with different feature sets and performed an error analysis for each system component. The annotation confirmed the hypotheses that we can simplify the RI-TIMEXes normalization task using two multi-label classifiers. Our system achieves anchor point classification, anchor relation classification, and rule-based parsing accuracy of 74.68%, 87.71%, and 57.2% (82.09% under relaxed matching criteria), respectively, on the held-out test set of the 2012 i2b2 temporal relation challenge. Experiments with feature sets reveal some interesting findings, such as: the verbal tense feature does not inform the anchor relation classification in clinical narratives as much as the tokens near the RI-TIMEX. Error analysis showed that underrepresented anchor point and anchor relation classes are difficult to detect. We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. Considering only RI-TIMEX extraction and normalization, the system achieves statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
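The three-part design described above (an anchor point, an anchor relation, and a parsed text span) can be sketched as follows. This is only an illustration: the two classifier outputs are passed in by hand, the anchor names ("admission", "discharge") and relation labels ("before", "after") are assumptions, and the span parser handles just day and week counts.

```python
import re
from datetime import date, timedelta

UNIT_DAYS = {"day": 1, "week": 7}


def parse_span(expression):
    """Rule-based parse of expressions like '2 days' or '1 week' into a day count."""
    match = re.fullmatch(r"(\d+)\s+(day|week)s?", expression.strip().lower())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    return int(match.group(1)) * UNIT_DAYS[match.group(2)]


def normalize(expression, anchor_point, anchor_relation, anchors):
    """Resolve a relative expression to a calendar date.

    anchor_point and anchor_relation stand in for the outputs of the two
    classifiers; here they are simply supplied by the caller.
    """
    offset = timedelta(days=parse_span(expression))
    base = anchors[anchor_point]
    return base - offset if anchor_relation == "before" else base + offset


# Hypothetical anchor dates taken from a record's header.
anchors = {"admission": date(2012, 3, 10), "discharge": date(2012, 3, 17)}

print(normalize("2 days", "admission", "before", anchors))
print(normalize("1 week", "discharge", "after", anchors))
```

Splitting the problem this way means an error can be traced to one component: a wrong anchor, a wrong direction, or a mis-parsed quantity.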
Text mining, a race against time? An attempt to quantify possible variations in text corpora of medical publications throughout the years.

PubMed

Wagner, Mathias; Vicinus, Benjamin; Muthra, Sherieda T; Richards, Tereza A; Linder, Roland; Frick, Vilma Oliveira; Groh, Andreas; Rubie, Claudia; Weichert, Frank

2016-06-01

The continuous growth of medical sciences literature indicates the need for automated text analysis. Scientific writing which is neither unitary, transcending social situation nor defined by a timeless idea is subject to constant change as it develops in response to evolving knowledge, aims at different goals, and embodies different assumptions about nature and communication. The objective of this study was to evaluate whether publication dates should be considered when performing text mining. A search of PUBMED for combined references to chemokine identifiers and particular cancer related terms was conducted to detect changes over the past 36 years. Text analyses were performed using freeware available from the World Wide Web. TOEFL Scores of territories hosting institutional affiliations as well as various readability indices were investigated. Further assessment was conducted using Principal Component Analysis. Laboratory examination was performed to evaluate the quality of attempts to extract content from the examined linguistic features. The PUBMED search yielded a total of 14,420 abstracts (3,190,219 words). The range of findings in laboratory experimentation were coherent with the variability of the results described in the analyzed body of literature. Increased concurrence of chemokine identifiers together with cancer related terms was found at the abstract and sentence level, whereas complexity of sentences remained fairly stable. The findings of the present study indicate that concurrent references to chemokines and cancer increased over time whereas text complexity remained stable. Copyright © 2016 Elsevier Ltd. All rights reserved.
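The abstract does not say which readability indices were used. As one example of what such an index computes, the sketch below implements the Automated Readability Index, 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43, with deliberately crude tokenization (sentences split on ., ! and ?; words are runs of letters and digits). The two sample sentences are invented.

```python
import re


def automated_readability_index(text):
    """Automated Readability Index with simple regex-based tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    characters = sum(len(w) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


simple = "The cat sat. The dog ran."
dense = "Chemokine receptor expression modulates metastatic dissemination in colorectal carcinoma."

print(round(automated_readability_index(simple), 2))
print(round(automated_readability_index(dense), 2))
```

Computing such a score per abstract and grouping by publication year is one way to check whether text complexity drifts over time.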
Gut Feel: Developing Intuition in Army Junior Officers

DTIC Science & Technology

2007-03-15

folklore that summer and fall. Yon’s postings include an August entry that describes Colonel Kurilla’s apparent extrasensory ability to spot...boldness, perception, and character. This approach focuses on assessment of the situation vice comparison of multiple options.”30 Army operational
An evidence-based systematic review of gymnema (Gymnema sylvestre R. Br.) by the Natural Standard Research Collaboration.

PubMed

Ulbricht, Catherine; Abrams, Tracee Rae; Basch, Ethan; Davies-Heerema, Theresa; Foppa, Ivo; Hammerness, Paul; Rusie, Erica; Tanguay-Colucci, Shaina; Taylor, Sarah; Ulbricht, Catherine; Varghese, Minney; Weissner, Wendy; Woods, Jen

2011-09-01

An evidence-based systematic review of gymnema (Gymnema sylvestre R. Br.), including written and statistical analysis of scientific literature, expert opinion, folkloric precedent, history, pharmacology, kinetics/dynamics, interactions, adverse effects, toxicology, and dosing.
Medical uses of Carthamus tinctorius L. (Safflower): a comprehensive review from Traditional Medicine to Modern Medicine

PubMed Central

Delshad, Elahe; Yousefi, Mahdi; Sasannezhad, Payam; Rakhshandeh, Hasan

2018-01-01

Background Carthamus tinctorius L., known as Kafesheh (Persian) and safflower (English) is vastly utilized in Traditional Medicine for various medical conditions, namely dysmenorrhea, amenorrhea, postpartum abdominal pain and mass, trauma and pain of joints. It is largely used for flavoring and coloring purposes among the local population. Recent reviews have addressed the uses of the plant in various ethnomedical systems. Objective This review was an update to provide a summary on the botanical features, uses in Iranian folklore and modern medical applications of safflower. Methods A main database containing important early published texts written in Persian, together with electronic papers was established on ethnopharmacology and modern pharmacology of C. tinctorius. Literature review was performed on the years from 1937 to 2016 in Web of Science, PubMed, Scientific Information Database, Google Scholar, and Scopus for the terms “Kafesheh”, “safflower”, “Carthamus tinctorius”, and so forth. Results Safflower is an indispensable element of Iranian folklore medicine, with a variety of applications due to laxative effects. Also, it was recommended as treatment for rheumatism and paralysis, vitiligo and black spots, psoriasis, mouth ulcers, phlegm humor, poisoning, numb limbs, melancholy humor, and the like. According to the modern pharmacological and clinical examinations, safflower provides promising opportunities for the amelioration of myocardial ischemia, coagulation, thrombosis, inflammation, toxicity, cancer, and so forth. However, there have been some reports on its undesirable effects on male and female fertility. Most of these beneficial therapeutic effects were correlated to hydroxysafflor yellow A. Conclusion More attention should be drawn to the lack of a thorough phytochemical investigation. 
The potential implications of safflower based on Persian traditional medicine, such as the treatment of rheumatism and paralysis, vitiligo and black spots, psoriasis, mouth ulcers, phlegm humor, poisoning, numb limbs, and melancholy humor warrant further consideration.
Penile embryology and anatomy.

PubMed

Yiee, Jenny H; Baskin, Laurence S

2010-06-29

Knowledge of penile embryology and anatomy is essential to any pediatric urologist in order to fully understand and treat congenital anomalies. Sex differentiation of the external genitalia occurs between the 7th and 17th weeks of gestation. The Y chromosome initiates male differentiation through the SRY gene, which triggers testicular development. Under the influence of androgens produced by the testes, external genitalia then develop into the penis and scrotum. Dorsal nerves supply penile skin sensation and lie within Buck's fascia. These nerves are notably absent at the 12 o'clock position. Perineal nerves supply skin sensation to the ventral shaft skin and frenulum. Cavernosal nerves lie within the corpora cavernosa and are responsible for sexual function. Paired cavernosal, dorsal, and bulbourethral arteries have extensive anastomotic connections. During erection, the cavernosal artery causes engorgement of the cavernosa, while the deep dorsal artery leads to glans enlargement. The majority of venous drainage occurs through a single, deep dorsal vein into which multiple emissary veins from the corpora and circumflex veins from the spongiosum drain. The corpora cavernosa and spongiosum are all made of spongy erectile tissue. Buck's fascia circumferentially envelops all three structures, splitting into two leaves ventrally at the spongiosum. The male urethra is composed of six parts: bladder neck, prostatic, membranous, bulbous, penile, and fossa navicularis. The urethra receives its blood supply from both proximal and distal directions.
Penile Embryology and Anatomy

PubMed Central

Yiee, Jenny H.; Baskin, Laurence S.

2010-01-01

Knowledge of penile embryology and anatomy is essential to any pediatric urologist in order to fully understand and treat congenital anomalies. Sex differentiation of the external genitalia occurs between the 7th and 17th weeks of gestation. The Y chromosome initiates male differentiation through the SRY gene, which triggers testicular development. Under the influence of androgens produced by the testes, external genitalia then develop into the penis and scrotum. Dorsal nerves supply penile skin sensation and lie within Buck's fascia. These nerves are notably absent at the 12 o'clock position. Perineal nerves supply skin sensation to the ventral shaft skin and frenulum. Cavernosal nerves lie within the corpora cavernosa and are responsible for sexual function. Paired cavernosal, dorsal, and bulbourethral arteries have extensive anastomotic connections. During erection, the cavernosal artery causes engorgement of the cavernosa, while the deep dorsal artery leads to glans enlargement. The majority of venous drainage occurs through a single, deep dorsal vein into which multiple emissary veins from the corpora and circumflex veins from the spongiosum drain. The corpora cavernosa and spongiosum are all made of spongy erectile tissue. Buck's fascia circumferentially envelops all three structures, splitting into two leaves ventrally at the spongiosum. The male urethra is composed of six parts: bladder neck, prostatic, membranous, bulbous, penile, and fossa navicularis. The urethra receives its blood supply from both proximal and distal directions. PMID:20602076
Juvenile Hormone Biosynthesis Gene Expression in the corpora allata of Honey Bee (Apis mellifera L.) Female Castes

PubMed Central

Rosa, Gustavo Conrado Couto; Moda, Livia Maria; Martins, Juliana Ramos; Bitondi, Márcia Maria Gentile; Hartfelder, Klaus; Simões, Zilá Luz Paulino

2014-01-01

Juvenile hormone (JH) controls key events in the honey bee life cycle, viz. caste development and age polyethism. We quantified transcript abundance of 24 genes involved in the JH biosynthetic pathway in the corpora allata-corpora cardiaca (CA-CC) complex. The expression of six of these genes showing relatively high transcript abundance was contrasted with CA size, hemolymph JH titer, as well as JH degradation rates and JH esterase (jhe) transcript levels. Gene expression did not match the contrasting JH titers in queen and worker fourth instar larvae, but jhe transcript abundance and JH degradation rates were significantly lower in queen larvae. Consequently, transcriptional control of JHE is of importance in regulating larval JH titers and caste development. In contrast, the same analyses applied to adult worker bees allowed us to infer that the high JH levels in foragers are due to increased JH synthesis. Upon RNAi-mediated silencing of the methyl farnesoate epoxidase gene (mfe) encoding the enzyme that catalyzes methyl farnesoate-to-JH conversion, the JH titer was decreased, thus corroborating that JH titer regulation in adult honey bees depends on this final JH biosynthesis step. The molecular pathway differences underlying JH titer regulation in larval caste development versus adult age polyethism lead us to propose that mfe and jhe genes be assayed when addressing questions on the role(s) of JH in social evolution. PMID:24489805
Evaluating the state of the art in coreference resolution for electronic medical records

PubMed Central

Bodnari, Andreea; Shen, Shuying; Forbush, Tyler; Pestian, John; South, Brett R

2012-01-01

Background The fifth i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records conducted a systematic review on resolution of noun phrase coreference in medical records. Informatics for Integrating Biology and the Bedside (i2b2) and the Veterans Affairs (VA) Consortium for Healthcare Informatics Research (CHIR) partnered to organize the coreference challenge. They provided the research community with two corpora of medical records for the development and evaluation of the coreference resolution systems. These corpora contained various record types (ie, discharge summaries, pathology reports) from multiple institutions. Methods The coreference challenge provided the community with two annotated ground truth corpora and evaluated systems on coreference resolution in two ways: first, it evaluated systems for their ability to identify mentions of concepts and to link together those mentions. Second, it evaluated the ability of the systems to link together ground truth mentions that refer to the same entity. Twenty teams representing 29 organizations and nine countries participated in the coreference challenge. Results The teams' system submissions showed that machine-learning and rule-based approaches worked best when augmented with external knowledge sources and coreference clues extracted from document structure. The systems performed better in coreference resolution when provided with ground truth mentions. Overall, the systems struggled in solving coreference resolution for cases that required domain knowledge. PMID:22366294
Unsupervised learning of natural languages

PubMed Central

Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

2005-01-01

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885

Unsupervised learning of natural languages.

PubMed

Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

2005-08-16

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Tailoring vocabularies for NLP in sub-domains: a method to detect unused word sense.

PubMed

Figueroa, Rosa L; Zeng-Treitler, Qing; Goryachev, Sergey; Wiechmann, Eduardo P

2009-11-14

We developed a method to help tailor a comprehensive vocabulary system (e.g. the UMLS) for a sub-domain (e.g. clinical reports) in support of natural language processing (NLP). The method detects unused sense in a sub-domain by comparing the relational neighborhood of a word/term in the vocabulary with the semantic neighborhood of the word/term in the sub-domain. The semantic neighborhood of the word/term in the sub-domain is determined using latent semantic analysis (LSA). We trained and tested the unused sense detection on two clinical text corpora: one contains discharge summaries and the other outpatient visit notes. We were able to detect unused senses with precision from 79% to 87%, recall from 48% to 74%, and an area under receiver operation curve (AUC) of 72% to 87%.
Chemical Entity Recognition and Resolution to ChEBI

PubMed Central

Grego, Tiago; Pesquita, Catia; Bastos, Hugo P.; Couto, Francisco M.

2012-01-01

Chemical entities are ubiquitous through the biomedical literature and the development of text-mining systems that can efficiently identify those entities is required. Due to the lack of available corpora and data resources, the community has focused its efforts on the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks. PMID:25937941
Challenges for automatically extracting molecular interactions from full-text articles.

PubMed

McIntosh, Tara; Curran, James R

2009-09-24

The increasing availability of full-text biomedical articles will allow more biomedical knowledge to be extracted automatically with greater reliability. However, most Information Retrieval (IR) and Extraction (IE) tools currently process only abstracts. The lack of corpora has limited the development of tools that are capable of exploiting the knowledge in full-text articles. As a result, there has been little investigation into the advantages of full-text document structure, and the challenges developers will face in processing full-text articles. We manually annotated passages from full-text articles that describe interactions summarised in a Molecular Interaction Map (MIM). Our corpus tracks the process of identifying facts to form the MIM summaries and captures any factual dependencies that must be resolved to extract the fact completely. For example, a fact in the results section may require a synonym defined in the introduction. The passages are also annotated with negated and coreference expressions that must be resolved. We describe the guidelines for identifying relevant passages and possible dependencies. The corpus includes 2162 sentences from 78 full-text articles. Our corpus analysis demonstrates the necessity of full-text processing; identifies the article sections where interactions are most commonly stated; and quantifies the proportion of interaction statements requiring coherent dependencies. Further, it allows us to report on the relative importance of identifying synonyms and resolving negated expressions. We also experiment with an oracle sentence retrieval system using the corpus as a gold-standard evaluation set. We introduce the MIM corpus, a unique resource that maps interaction facts in a MIM to annotated passages within full-text articles. It is an invaluable case study providing guidance to developers of biomedical IR and IE systems, and can be used as a gold-standard evaluation set for full-text IR tasks.
Go West, Young Man (and Woman)!

ERIC Educational Resources Information Center

Byerly, Greg; Brodie, Carolyn S.

1998-01-01

Provides an annotated bibliography of Web sites on the American West, the Oregon Trail, Santa Fe Trail, Gold Rush, Donner Party, mountain men and the fur trade, Native Americans, cowboys, and western folklore. Lists related books, activity books, and CD-ROMs on cowboys, migration, and settling. (PEN)
Beyond the Coke Ovens: Women's Literacy in Whitney Pier, Nova Scotia.

ERIC Educational Resources Information Center

Kozar, Seana

2001-01-01

In partnership with a Nova Scotia community museum, local women used folklore and culture centered on crafts, food customs, and beliefs to engage in learning. Their efforts enriched local historical knowledge as well as their own self-confidence and literacy. (SK)
Themes of Suicide in the Kalevala.

ERIC Educational Resources Information Center

Achte, Kalle; And Others

1988-01-01

The Kalevala, Finland's national epic, is a crucial element of Finnish cultural identity and important to Finnish culture. Violence, death, and suicide are often repeated themes in Finnish folklore. The Kalevala provides insight into past attitudes toward death. Traditions passed through generations have influenced people's attitudes toward…
Storytelling Figures: A Pueblo Tradition.

ERIC Educational Resources Information Center

Kraus, Nancy

1997-01-01

In a collaborative unit on pueblo storytelling figures involving art, music, language arts, and physical education, a teacher describes how she helped second graders understand the Pueblo pottery tradition by reading aloud literature covering the past and present. Lists folklore, fiction, poetry, nonfiction, professional resources, videos, CDs,…
From "Teo" to "Harry Potter": Books in Spanish for Children and Adolescents.

ERIC Educational Resources Information Center

Schon, Isabel

2001-01-01

Presents a listing of Spanish-language books for children and adolescents recently published in Mexico, Venezuela, Spain, Colombia, and Argentina. The books are categorized as biography, history, historical fiction, folklore, poetry, religion, fiction, and literature for the very young. (SM)
The Mythology of Information Overload.

ERIC Educational Resources Information Center

Tidline, Tonyia J.

1999-01-01

Combines ideas from mythology, folklore, and library and information science to conclude that information overload is a myth of modern culture. Reports results of a pilot project intended to describe information overload experienced by a particular folk group composed of future library and information professionals. (Author/LRW)
The search for novel anticancer agents: a differentiation-based assay and analysis of a folklore product.

PubMed

Dinnen, R D; Ebisuzaki, K

1997-01-01

One alternative approach to the current use of cytotoxic anticancer drugs involves the use of differentiation-inducing agents. However, a wider application of this strategy would require the development of assays to search for new differentiation-inducing agents. In this report we describe an in vitro assay using the murine erythroleukemia (clone 3-1) cells. Tests for the efficacy of this assay for the analysis of antineoplastic activity in natural products led to studies on pau d'arco, a South American folklore product used in the treatment of cancer. Purification of the activity in aqueous extracts by solvent partition and thin layer chromatography (TLC) indicated the presence of two activities, one of which was identified as lapachol. The activity in the pau d'arco extracts and of lapachol was inhibited by vitamin K1. As a vitamin K antagonist, lapachol might target such vitamin K-dependent reactions as the activation of a ligand for the Axl receptor tyrosine kinase.
The folklore medicinal orchids of Sikkim

PubMed Central

Panda, Ashok Kumar; Mandal, Debasis

2013-01-01

Background: Orchids are better known for their decorative and aromatic value than for their medicinal properties. Jīvantī, Jīvaka, Ṛṣabhaka, Rāsnā, Mānakanda, and Pañcagula, which are used in Ayurveda, are said to be orchids. There are 50 species of orchids in medicine. Sikkim has identified 523 species of wild orchids so far. Aim: The aim of this study is to determine the folklore medicinal use of orchids in Sikkim. Materials and Methods: To assess the traditional medicinal uses of orchid species, close contacts were made with native people, particularly traditional healers, religious leaders, nursery growers and villagers of Sikkim. The information was gathered with the help of the questionnaire and personal interviews with various knowledgeable respondents during the field visit between August 2009 and December 2011. Results and Conclusion: We found that 36 species of orchids are used as medicines for various health purposes. The botanical and ayurvedic names, phenology, parts used and medicinal uses of the 36 orchids are presented in this paper along with their local distribution. PMID:25284941
The folklore medicinal orchids of Sikkim.

PubMed

Panda, Ashok Kumar; Mandal, Debasis

2013-10-01

Orchids are better known for their decorative and aromatic value than for their medicinal properties. Jīvantī, Jīvaka, Ṛṣabhaka, Rāsnā, Mānakanda, and Pañcagula, which are used in Ayurveda, are said to be orchids. There are 50 species of orchids in medicine. Sikkim has identified 523 species of wild orchids so far. The aim of this study is to determine the folklore medicinal use of orchids in Sikkim. To assess the traditional medicinal uses of orchid species, close contacts were made with native people, particularly traditional healers, religious leaders, nursery growers and villagers of Sikkim. The information was gathered with the help of the questionnaire and personal interviews with various knowledgeable respondents during the field visit between August 2009 and December 2011. We found that 36 species of orchids are used as medicines for various health purposes. The botanical and ayurvedic names, phenology, parts used and medicinal uses of the 36 orchids are presented in this paper along with their local distribution.
Science and Sanity in Special Education.

ERIC Educational Resources Information Center

Dammann, James E.; Vaughn, Sharon

2001-01-01

This article describes the usefulness of a scientific approach to improving knowledge and practice in special education. Of four approaches to knowledge (superstition, folklore, craft, and science), craft and science are supported and implications for special education drawn including the need to bridge the gulf between research knowledge and…
Multicultural Bibliography: Kindergarten-Grade 8 Library Books.

ERIC Educational Resources Information Center

San Diego County Office of Education, CA.

This annotated bibliography includes approximately 375 elementary-level books on history, biography, folklore, fiction, poetry, arts and crafts, and contemporary life of Blacks, Native Americans, Pan Asian Americans, Puerto Ricans, and other ethnic groups. The books cited are deemed to be non-stereotyped and appropriate for developing a…
Multiculturalism: Beyond Food, Festival, Folklore, and Fashion

ERIC Educational Resources Information Center

Meyer, Calvin F.; Rhoades, Elizabeth Kelley

2006-01-01

Despite overall trends toward increasing student diversity, geographic areas in the United States vary widely in their ethnic composition. In areas where the population is predominately European American, grasping a realistic meaning of "multiculturalism" can be difficult. Often, interpretations of the concept result in a mix of…
Traditional Tales and Contemporary Art to Promote Multiple Literacies

ERIC Educational Resources Information Center

Blackrose, Morgan Schatz; Schatz, Roman W.

2010-01-01

Storytelling-based arts projects offer a universal and inclusive pedagogy; challenging prejudices, celebrating diversity and promoting tolerance and resilience in participants. In addition they assist in the development of receptive and expressive language skills, provide a credible basis for understanding folklore, cultural traditions and social…
Preserving American Folk Heritage through Story and Song.

ERIC Educational Resources Information Center

Jalongo, Mary Renck

Underscoring folklore's appropriateness to multicultural classroom settings are its connection with past and present cultures, its constancy and change, and its potential for oral transmission of human values. Most importantly, folktales and songs enable children to participate in the history of universal human emotions. To effectively include…
American Weather Stories.

ERIC Educational Resources Information Center

Hughes, Patrick

Weather has shaped United States' culture, national character and folklore; at times it has changed the course of history. The seven accounts compiled in this publication highlight some of the nation's weather experiences from the hurricanes that threatened Christopher Columbus to the peculiar run of bad weather that has plagued American…
Healing, health, and horticulture: introduction to the workshop

USDA-ARS's Scientific Manuscript database

The present-day emphasis of horticulture and health is an extension of ancient and medieval traditions. The relationship of healing and the horticultural arts predates written history and relates to ancient wisdom, custom, and folklore. Plants and health have been of great concern for humankind cons...

Chicano Studies: A Multidisciplinary Approach.

ERIC Educational Resources Information Center

Garcia, Eugene E., Ed.; And Others

One in a series on bilingual education, this book contains 15 chapters organized under the following subject headings: Chicano studies; Chicano history, social structure, and politics; literature and folklore; and education. Carlos Munoz, Jr., traces the history of Chicano studies and its impact on access to higher education. Albert Camarillo…
Teaching Unit: Japan.

ERIC Educational Resources Information Center

Evans, Dina

The cultural diversity of Japan can provide a rewarding learning experience for children of all grade levels. This teaching unit includes resources and ideas for the study of Japanese society, art, folklore, and poetry. Included among the instructional objectives are: (1) children will compare U.S. lifestyles with Japanese lifestyles by reading…
Cranberry xyloglucan structure and inhibition of Escherichia coli adhesion to epithelial cells

USDA-ARS's Scientific Manuscript database

Cranberry juice has been used to treat urinary tract infections based on scientific reports of proanthocyanidin anti-adhesion activity for Escherichia coli as well as folklore. Xyloglucan oligosaccharides were also detected in cranberry juice and the pulp remaining following commercial juice extract...
Bibliography for Professional Development.

ERIC Educational Resources Information Center

Fehr, Helen, Comp.

Information published between 1953 and 1970 on the American Indian is included in this annotated bibliography. The bibliography is designed to aid professional development in the field of education and attempts to categorize and separate fields of interest. Major topics are culture, education, ethnology, folklore, art, housing, history, language,…
Developing a corpus of clinical notes manually annotated for part-of-speech.

PubMed

Pakhomov, Serguei V; Coden, Anni; Chute, Christopher G

2006-06-01

This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech (POS) information. We describe and discuss the process of training three domain experts to perform linguistic annotation. Three domain experts were trained to perform manual annotation of a corpus of clinical notes. A part of this corpus was combined with the Penn Treebank corpus of general purpose English text and another part was set aside for testing. The corpora were then used for training and testing statistical part-of-speech taggers. We list some of the challenges as well as encouraging results pertaining to inter-rater agreement and consistency of annotation. We used the Trigrams'n'Tags (TnT) [T. Brants, TnT-a statistical part-of-speech tagger, In: Proceedings of NAACL/ANLP-2000 Symposium, 2000] tagger trained on general English data to achieve 89.79% correctness. The same tagger trained on a portion of the medical data annotated for this project improved the performance to 94.69%. Furthermore, we find that discriminating between different types of discourse represented by different sections of clinical text may be very beneficial to improve correctness of POS tagging. Our preliminary experimental results indicate the necessity for adapting state-of-the-art POS taggers to the sublanguage domain of clinical text.
SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering.

PubMed

Cao, Jie; Wu, Zhiang; Wu, Junjie; Xiong, Hui

2013-04-01

Information-theoretic clustering aims to exploit information-theoretic measures as the clustering criteria. A common practice on this topic is the so-called Info-Kmeans, which performs K-means clustering with KL-divergence as the proximity function. While expert efforts on Info-Kmeans have shown promising results, a remaining challenge is to deal with high-dimensional sparse data such as text corpora. Indeed, it is possible that the centroids contain many zero-value features for high-dimensional text vectors, which leads to infinite KL-divergence values and creates a dilemma in assigning objects to centroids during the iteration process of Info-Kmeans. To meet this challenge, in this paper, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of KL-divergence by the incremental computation of Shannon entropy. This can avoid the zero-feature dilemma caused by the use of KL-divergence. To improve the clustering quality, we further introduce the variable neighborhood search scheme and propose the V-SAIL algorithm, which is then accelerated by a multithreaded scheme in PV-SAIL. Our experimental results on various real-world text collections have shown that, with SAIL as a booster, the clustering performance of Info-Kmeans can be significantly improved. Also, V-SAIL and PV-SAIL indeed help improve the clustering quality at a lower cost of computation.
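The zero-feature dilemma described in the abstract above can be illustrated with a small sketch. This is not the SAIL algorithm itself; the toy documents, the function names, and the uniform weights are all assumptions made for illustration. It shows that the KL-divergence from a document to a centroid becomes infinite when the centroid has a zero where the document does not, and that for the documents already in a cluster the weighted sum of KL-divergences to their own centroid equals an expression in Shannon entropies only, which is the kind of equivalent objective the abstract refers to.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; infinite if q is 0 where p > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            return math.inf  # zero-value centroid feature: divergence blows up
        total += pi * math.log(pi / qi)
    return total

def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def centroid(points, weights):
    """Weighted arithmetic mean of the distributions in a cluster."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * x[j] for x, w in zip(points, weights)) / total
            for j in range(dim)]

# Hypothetical sparse "documents" as normalized term distributions.
doc_a = [0.5, 0.5, 0.0, 0.0]
doc_b = [0.0, 0.5, 0.5, 0.0]
doc_c = [0.0, 0.0, 0.5, 0.5]

cluster = [doc_a, doc_b]
weights = [1.0, 1.0]  # assumed uniform document weights
m = centroid(cluster, weights)

# Assigning doc_c to this centroid requires KL(doc_c || m), which is
# infinite because the centroid has no mass on the last term.
kl_to_centroid = kl_divergence(doc_c, m)

# Within the cluster, the KL-based objective equals an entropy-based one:
# sum_i w_i * KL(x_i || m) == W * H(m) - sum_i w_i * H(x_i)
kl_objective = sum(w * kl_divergence(x, m) for x, w in zip(cluster, weights))
entropy_objective = (sum(weights) * entropy(m)
                     - sum(w * entropy(x) for x, w in zip(cluster, weights)))
```

Because the entropy form never divides by a centroid entry, it stays finite for sparse vectors; how SAIL updates it incrementally is described in the paper and is not reproduced here.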
Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach.

PubMed

Rinaldi, Fabio; Schneider, Gerold; Kaljurand, Kaarel; Hess, Michael; Andronis, Christos; Konstandi, Ourania; Persidis, Andreas

2007-02-01

The amount of new discoveries (as published in the scientific literature) in the biomedical area is growing at an exponential rate. This growth makes it very difficult to filter the most relevant results, and thus the extraction of the core information becomes very expensive. Therefore, there is a growing interest in text processing approaches that can deliver selected information from scientific publications, which can limit the amount of human intervention normally needed to gather those results. This paper presents and evaluates an approach aimed at automating the process of extracting functional relations (e.g. interactions between genes and proteins) from scientific literature in the biomedical domain. The approach, using a novel dependency-based parser, is based on a complete syntactic analysis of the corpus. We have implemented a state-of-the-art text mining system for biomedical literature, based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated arabidopsis thaliana circadian rhythms (ATCR) corpus. We show how a deep-linguistic approach (contrary to common belief) can be used in a real world text mining application, offering high-precision relation extraction, while at the same time retaining a sufficient recall.
Dealing with extreme data diversity: extraction and fusion from the growing types of document formats

NASA Astrophysics Data System (ADS)

David, Peter; Hansen, Nichole; Nolan, James J.; Alcocer, Pedro

2015-05-01

The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1 score than a baseline random classifier.
"Solid All the Way Through": Margaret Mahy's Ordinary Witches

ERIC Educational Resources Information Center

Waller, Alison

2004-01-01

In "The Haunting," "The Changeover," and "The Tricksters," Margaret Mahy fuses supernatural iconography of witchcraft and magic with images of ordinary and domestic adolescence. This article argues that Mahy's "fantastic realism" illuminates aspects of female teenage experience through a blend of myth, fairy tale, folklore and history, as well as…
Paperback Books for Children.

ERIC Educational Resources Information Center

Simmons, Beatrice, Ed.

Nearly 700 titles are included in "Paperback Books for Children," a guide for librarians and teachers published by the American Association of School Librarians. It is divided into the following sections: Picture Books; Fiction; Nonfiction; Myths, Folklore, and Fairy Tales; and Poetry, Rhymes, Riddles and Jokes. The book also contains an adult…
Around the World Through Stories. An Annotated Bibliography of Folk Literature.

ERIC Educational Resources Information Center

Folk, Judith A., Ed.

Part of a series entitled "Traditional Literature and Folklore in Library and Storytelling Programs," this annotated bibliography was produced by graduate students in the Traditional Literature and Oral Narration class at the University of Hawaii at Manoa. The bibliography is designed to provide librarians and teachers with information…
Stories That Must Not Die. Volume Three.

ERIC Educational Resources Information Center

Sauvageau, Juan

Local history and legends of spirits appear often in this bilingual Spanish and English collection of 10 Southwest traditional tales, intended to promote interest in bilingual/bicultural programs and to preserve the colorful folklore of the area. Black and white drawings accompany the stories which deal with animals ("A Parrot for…
Popcorn from the Sky.

ERIC Educational Resources Information Center

Kelin, Daniel A., II

2001-01-01

Describes how one Hawaiian elementary teacher uses drama and folklore to teach students about historic events and human rights, guiding students through an active, hands-on reconstruction of a dramatic historic event (nuclear testing in the Bikini Atoll) and providing just enough information at each step of the story to elicit accurate, emotional…
Fiesta, Spanish-S: 7509.73.

ERIC Educational Resources Information Center

Miyar, Olga C.

This course is intended to teach the student about Spanish and Latin American culture in order to broaden his capacity for understanding and appreciating those cultures. The subject matter in this course is the folklore, customs, types of food, and music of a culture as seen through holidays, festivals, styles, and special occasions. Learning…
Secondary Lessons from Indiana's Underground Railroad Institute (July 22-27, 2001).

ERIC Educational Resources Information Center

Indiana Univ.-Purdue Univ., Indianapolis. Geography Educators' Network of Indiana.

The Geography Educators' Network of Indiana's 2001 Exploring and Teaching Institute series led 23 educators from around the state on a six-day traveling adventure. Participants explored art, literature/folklore, historical sites and archives, physical environments, architecture, economics, politics, and cultures associated with the Underground…
ENVIRONMENTAL PERCEPTIONS AND TRADITIONAL ENVIRONMENTAL KNOWLEDGE AMONG ETHNIC GROUPS OF ALTAI MOUNTAINS OF RUSSIA AND MONGOLIA

EPA Science Inventory

The intellectual merit of the proposed research lies in:
1. an enhanced understanding of people-environment relations in cultures with strong folklore and shamanic traditions;
2. identifying environmental/climatic change in the region from local observa...
Tall Tales of North America.

ERIC Educational Resources Information Center

Fresno City Unified School District, CA.

Designed for use in junior high school language arts classes, this learning activity packet introduces students to North American folklore. Selected readings cover Indian tales, real folk heroes (Davy Crockett and John Henry), imaginary folk heroes (Paul Bunyan and Pecos Bill), Black folk stories (Brer Rabbit), and tales of Washington Irving. Each…
February Folklore: You'll Love It in Your Curriculum.

ERIC Educational Resources Information Center

Gale, David

1980-01-01

Folk tales which are particularly suited to February cover legends about George Washington, the Chinese zodiac, and Groundhog Day. Also included is a calendar of activities for February. These activities are appropriate for Groundhog Day, Valentine's Day, Chinese New Year, President's (Washington and Lincoln) Day, and Leap Year. (KC)
Serving PE Teachers' Professional Learning Experiences in Social Circus

ERIC Educational Resources Information Center

Li, Chung

2010-01-01

Background: Social circus has long been a part of folklore in Chinese culture. Recently, initiatives have been under way to introduce it into the school physical education curriculum in Hong Kong. Aims: This article reports a study on 38 PE teachers' professional learning experiences while attending two 2-day workshops respectively concerning…
Arkansas Reference Sources. Bibliographic Series No. 26.

ERIC Educational Resources Information Center

Ahrens, Joan; Roberts, Joan

Varied sources for information on Arkansas held by the Arkansas University library are listed. Bibliographies and indexes of Arkansas publications are included, as well as materials dealing with the state's folklore and literature, arts and humanities, government and law, business and economics, social conditions, labor, history and biography,…
The Visual Narrative: Kids, Comic Books, and Creativity.

ERIC Educational Resources Information Center

Hoff, Gary R.

1982-01-01

Discusses why junior high school students like comic books and examines how comic book art and visual narrative can be used in education. Copying comic book art can teach students several useful art techniques. Suggestions for using visual narratives to study science fiction, literature, folklore, and art history are included. (AM)
Integrating Anthropology in Elementary Social Studies.

ERIC Educational Resources Information Center

Zachlod, Michelle

2000-01-01

Discusses how anthropology can be integrated into the social studies classroom focusing on second and fifth grade levels. Demonstrates how different subject areas can be integrated with anthropology, such as history, geography, science, mathematics, and art. Covers topics such as foods, American Indian folklore, moonsticks, and myths and legends.…
Discovering Folklore Through Community Resources.

ERIC Educational Resources Information Center

Sumpter, Magdalena Benavides, Ed.

The folkways and cultural heritage of the Mexican Americans of South Texas are explored in this volume which is designed to provide the student with the opportunity for cultural enrichment, oral language development, and vocabulary expansion. The first chapter deals with "Creencias" which are common beliefs handed down from generation to…
Folklore and the Liberal Arts

ERIC Educational Resources Information Center

Brodie, Ian

2012-01-01

In this article, the author argues that the content of what folklorists study pervades all avenues of human interaction, from the food court to the lecture hall, and adverting to the seemingly negligible, interstitial, and ephemeral moments of informal communication is not only fruitful but a necessary complement to the liberal arts. These four…
Using Folklore Research to Improve Undergraduate Writing Skills.

ERIC Educational Resources Information Center

McClenon, James M.

1991-01-01

As a means of improving their writing skills, mostly African-American students from Elizabeth City State University gathered reports from African Americans in 16 northeastern North Carolina counties about extrasensory perception, contact with the dead, and other anomalous experiences and compared them to reports from Chinese students and students…
Owls On Silent Wings. The Wonder Series.

ERIC Educational Resources Information Center

Cooper, Ann C.

This curriculum guide is all about owls and provides information on the folklore related to owls, present populations, explanations of physical characteristics, exploring owl pellets, burrowing owls, snowy owls, and great horned owls. Included are eight activities using owl cards, owl pellets, puzzles, and origami. This guide aims to increase…
Czech Basic Course: Folklore.

ERIC Educational Resources Information Center

Defense Language Inst., Washington, DC.

This booklet is designed for use in the advanced phase of the Defense Language Institute's "Basic Course" in Czech. It is used in the advanced phase as a part of cultural background information. Reading selections, with vocabulary lists, include: (1) ethnography; (2) incantations and spells; (3) proverbs, sayings, and weather lore; (4) fairy tales…
Elementary Lessons from Indiana's Underground Railroad Institute (July 22-27, 2001).

ERIC Educational Resources Information Center

Indiana Univ.-Purdue Univ., Indianapolis. Geography Educators' Network of Indiana.

The Geography Educators' Network of Indiana's 2001 Exploring and Teaching Institute led 23 educators from around the state on a six-day traveling adventure. Participants explored art, literature/folklore, historical sites and archives, physical environments, architecture, economics, politics, and cultures associated with the Underground Railroad…
Mel White's Readers Theatre Anthology: Twenty-Eight All-Occasion Readings for Storytellers.

ERIC Educational Resources Information Center

White, Melvin R.

This anthology of literature contains selections that can be performed in classrooms, workshops, and speech and theater conventions as reader's theater, defined as a dramatic approach to literature. Divided into six categories--comedy, mystery/suspense, Christmas specials, folklore, children's classics, and the human spirit--the anthology features…
Africanisms in Gullah Oral Tradition.

ERIC Educational Resources Information Center

Holloway, Joseph E.

1989-01-01

The Sea Islands off the coast of South Carolina, Georgia, and Northern Florida retain almost every element of African culture, including language, oral tradition, folklore, and aesthetics. Examines the African influence in the lifestyle of the Gullah people of the Sea Islands, especially in terms of their concept of time. (AF)
The Art of Mexico.

ERIC Educational Resources Information Center

Saccardi, Marianne

1997-01-01

Provides an annotated bibliography of books for grades K and up which explores the folklore, poetry, fiction, and art of Mexico, and focuses on the Mayans and Aztecs and Diego Rivera and Frida Kahlo. Also suggests various research, reading, drama, music, social studies, physical education, and art activities and lists related videos and Internet…
Data Bases and Other Computer Tools in the Humanities.

ERIC Educational Resources Information Center

Collegiate Microcomputer, 1990

1990-01-01

Describes 38 database projects sponsored by the National Endowment for the Humanities (NEH). Information on hardware, software, and access and dissemination is given for projects in the areas of art and architectural history; folklore; history; medicinal plants; interdisciplinary topics; language and linguistics; literature; and music and music…
Distilling Wisdom from Practice: Finding Meaning in PDS Stories

ERIC Educational Resources Information Center

Breault, Rick A.

2010-01-01

Much of what has been written about the Professional Development School (PDS) experience consists of recounting personal experiences. However, these accounts often offer little to readers since they are neither good research nor good storytelling. In this article I draw on mythology, folklore, psychology and literature to suggest that effective…
The Manager's Job: Folklore and Fact

ERIC Educational Resources Information Center

Mintzberg, Henry

1975-01-01

Contrasts popular myths about managers' duties with the facts, as indicated by various studies of managers and how they function. The author argues that managers often have distorted views of their role, and that they must first recognize what their job really involves in order to perform it effectively. (JG)
Yupik Eskimo Folklore and Children's Play Activities.

ERIC Educational Resources Information Center

Suskind, Diane; Phillip, Anna

Yaaruilta stories are told by children of all ages in Yupik-speaking Eskimo villages in Alaska. These stories are illustrated by figures sketched in mud with a ceremonial knife. The sustained involvement and effort of the children engaged in Yaaruilta may aid cognitive development by encouraging the learning of culturally related geometrical…
Hydrogen peroxide production from fibrous pectic cellulose analogs and effect on dermal fibroblasts

USDA-ARS's Scientific Manuscript database

Naturally derived products with folklore remedies have in recent years been reconsidered for their benefit to wound healing i.e., honey’s application to chronic wound dressing products. Similarly, we have undertaken an evaluation of Fibrous pectin-cellulose (FPC) (cellulose blended with primary cel...
The presence of English and Spanish dyslexia in the Web

NASA Astrophysics Data System (ADS)

Rello, Luz; Baeza-Yates, Ricardo

2012-09-01

In this study we present a lower bound of the prevalence of dyslexia in the Web for English and Spanish. On the basis of analysis of corpora written by dyslexic people, we propose a classification of the different kinds of dyslexic errors. A representative data set of dyslexic words is used to calculate this lower bound in web pages containing English and Spanish dyslexic errors. We also present an analysis of dyslexic errors in major Internet domains, social media sites, and throughout English- and Spanish-speaking countries. To show the independence of our estimations from the presence of other kinds of errors, we compare them with the overall lexical quality of the Web and with the error rate of noncorrected corpora. The presence of dyslexic errors in the Web motivates work in web accessibility for dyslexic users.
Permanent alterations in catecholamine concentrations in discrete areas of brain in the offspring of rats treated with methylamphetamine and chlorpromazine

PubMed Central

Tonge, Sally R.

1973-01-01

Methylamphetamine hydrochloride (80 mg/l.) and/or chlorpromazine hydrochloride (200 mg/l.) have been administered in the drinking water of female Wistar rats during pregnancy and suckling. The offspring were weaned at 21 days and thereafter received no drugs. Nine months later, male offspring were killed and noradrenaline and normetanephrine concentrations were determined in eight discrete areas of the brains: neocortex, hippocampus, striatum, thalamus, hypothalamus, corpora quadrigemina, pons/medulla, and amygdala region. Both drugs appeared to have permanently altered catecholamine concentrations in several areas of the brain. There was evidence of antagonism between the effects of the two drugs in the hippocampus, striatum, thalamus, and corpora quadrigemina, where the individual drugs produced altered noradrenaline concentrations but a combination of the two had no effect. PMID:4722052
The Hebrew CHILDES corpus: transcription and morphological analysis

PubMed Central

Albert, Aviad; MacWhinney, Brian; Nir, Bracha

2014-01-01

We present a corpus of transcribed spoken Hebrew that reflects spoken interactions between children and adults. The corpus is an integral part of the CHILDES database, which distributes similar corpora for over 25 languages. We introduce a dedicated transcription scheme for the spoken Hebrew data that is sensitive to both the phonology and the standard orthography of the language. We also introduce a morphological analyzer that was specifically developed for this corpus. The analyzer adequately covers the entire corpus, producing detailed correct analyses for all tokens. Evaluation on a new corpus reveals high coverage as well. Finally, we describe a morphological disambiguation module that selects the correct analysis of each token in context. The result is a high-quality morphologically-annotated CHILDES corpus of Hebrew, along with a set of tools that can be applied to new corpora. PMID:25419199
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

PubMed Central

Kim, Jae-Hoon; Kwon, Hong-Seok; Seo, Hyeong-Won

2015-01-01

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. To show the validity and usability of the pivot-based approach, we evaluate it with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word association between source words (resp., target words) and pivot words, and the other estimates them from two parallel corpora based on word alignment tools for statistical machine translation. Empirical results on two language pairs (Korean-Spanish and Korean-French) show that the pivot-based approach is very promising for resource-poor languages and support its validity and usability. Furthermore, our method also performs well for low-frequency words. PMID:25983745

Morphosyntactic annotation of CHILDES transcripts

PubMed Central

Sagae, Kenji; Davis, Eric; Lavie, Alon; MacWhinney, Brian; Wintner, Shuly

2014-01-01

Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes. PMID:20334720
SyllabO+: A new tool to study sublexical phenomena in spoken Quebec French.

PubMed

Bédard, Pascale; Audet, Anne-Marie; Drouin, Patrick; Roy, Johanna-Pascale; Rivard, Julie; Tremblay, Pascale

2017-10-01

Sublexical phonotactic regularities in language have a major impact on language development, as well as on speech processing and production throughout the entire lifespan. To understand the impact of phonotactic regularities on speech and language functions at the behavioral and neural levels, it is essential to have access to oral language corpora to study these complex phenomena in different languages. Yet, probably because of their complexity, oral language corpora remain less common than written language corpora. This article presents the first corpus and database of spoken Quebec French syllables and phones: SyllabO+. This corpus contains phonetic transcriptions of over 300,000 syllables (over 690,000 phones) extracted from recordings of 184 healthy adult native Quebec French speakers, ranging in age from 20 to 97 years. To ensure the representativeness of the corpus, these recordings were made in both formal and familiar communication contexts. Phonotactic distributional statistics (e.g., syllable and co-occurrence frequencies, percentages, percentile ranks, transition probabilities, and pointwise mutual information) were computed from the corpus. An open-access online application to search the database was developed, and is available at www.speechneurolab.ca/syllabo. In this article, we present a brief overview of the corpus, as well as the syllable and phone databases, and we discuss their practical applications in various fields of research, including cognitive neuroscience, psycholinguistics, neurolinguistics, experimental psychology, phonetics, and phonology. Nonacademic practical applications are also discussed, including uses in speech-language pathology.
Making adjustments to event annotations for improved biological event extraction.

PubMed

Baek, Seung-Cheol; Park, Jong C

2016-09-16

Current state-of-the-art approaches to biological event extraction train statistical models in a supervised manner on corpora annotated with event triggers and event-argument relations. Inspecting such corpora, we observe that there is ambiguity in the span of event triggers (e.g., "transcriptional activity" vs. "transcriptional"), leading to inconsistencies across event trigger annotations. Such inconsistencies make it quite likely that similar phrases are annotated with different spans of event triggers, suggesting the possibility that a statistical learning algorithm misses an opportunity for generalizing from such event triggers. We anticipate that adjustments to the span of event triggers to reduce these inconsistencies would meaningfully improve the present performance of event extraction systems. In this study, we look into this possibility with the corpora provided by the 2009 BioNLP shared task as a proof of concept. We propose an Informed Expectation-Maximization (EM) algorithm, which trains models using the EM algorithm with a posterior regularization technique, which consults the gold-standard event trigger annotations in the form of constraints. We further propose four constraints on the possible event trigger annotations to be explored by the EM algorithm. The algorithm is shown to outperform the state-of-the-art algorithm on the development corpus in a statistically significant manner and on the test corpus by a narrow margin. The analysis of the annotations generated by the algorithm shows that there are various types of ambiguity in event annotations, even though they could be small in number.
Biomechanically Preferred Consonant-Vowel Combinations Fail to Appear in Adult Spoken Corpora

PubMed Central

Whalen, D. H.; Giulivi, Sara; Nam, Hosung; Levitt, Andrea G.; Hallé, Pierre; Goldstein, Louis M.

2012-01-01

Certain consonant/vowel (CV) combinations are more frequent than would be expected from the individual C and V frequencies alone, both in babbling and, to a lesser extent, in adult language, based on dictionary counts: Labial consonants co-occur with central vowels more often than chance would dictate; coronals co-occur with front vowels, and velars with back vowels (Davis & MacNeilage, 1994). Plausible biomechanical explanations have been proposed, but it is also possible that infants are mirroring the frequency of the CVs that they hear. As noted, previous assessments of adult language were based on dictionaries; these “type” counts are incommensurate with the babbling measures, which are necessarily “token” counts. We analyzed the tokens in two spoken corpora for English, two for French and one for Mandarin. We found that the adult spoken CV preferences correlated with the type counts for Mandarin and French, not for English. Correlations between the adult spoken corpora and the babbling results had all three possible outcomes: significantly positive (French), uncorrelated (Mandarin), and significantly negative (English). There were no correlations of the dictionary data with the babbling results when we consider all nine combinations of consonants and vowels. The results indicate that spoken frequencies of CV combinations can differ from dictionary (type) counts and that the CV preferences apparent in babbling are biomechanically driven and can ignore the frequencies of CVs in the ambient spoken language. PMID:23420980
NegBio: a high-performance tool for negation and uncertainty detection in radiology reports.

PubMed

Peng, Yifan; Wang, Xiaosong; Lu, Le; Bagheri, Mohammadhadi; Summers, Ronald; Lu, Zhiyong

2018-01-01

Negative and uncertain medical findings are frequent in radiology reports, but discriminating them from positive findings remains challenging for information extraction. Here, we propose a new algorithm, NegBio, to detect negative and uncertain findings in radiology reports. Unlike previous rule-based methods, NegBio utilizes patterns on universal dependencies to identify the scope of triggers that are indicative of negation or uncertainty. We evaluated NegBio on four datasets, including two public benchmarking corpora of radiology reports, a new radiology corpus that we annotated for this work, and a public corpus of general clinical texts. Evaluation on these datasets demonstrates that NegBio is highly accurate for detecting negative and uncertain findings and compares favorably to a widely-used state-of-the-art system NegEx (an average of 9.5% improvement in precision and 5.1% in F1-score). https://github.com/ncbi-nlp/NegBio.
Vietnamese Document Representation and Classification

NASA Astrophysics Data System (ADS)

Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.
hSMR3A as a Marker for Patients With Erectile Dysfunction

PubMed Central

Tong, Yuehong; Tar, Moses; Monrose, Val; DiSanto, Michael; Melman, Arnold; Davies, Kelvin P.

2007-01-01

Purpose: We recently reported that Vcsa1 is one of the most down-regulated genes in the corpora of rats in 3 distinct models of erectile dysfunction. Since gene transfer of plasmids expressing Vcsa1 or intracorporeal injection of its mature peptide product sialorphin into the corpora of aging rats was shown to restore erectile function, we proposed that the Vcsa1 gene has a direct role in erectile function. To determine if similar changes in gene expression occur in the corpora of human subjects with erectile dysfunction we identified a human homologue of Vcsa1 (hSMR3A) and determined the level of expression of hSMR3A in patients. Materials and Methods: hSMR3A was identified as a homologue of Vcsa1 by searching protein databases for proteins with similarity. hSMR3A cDNA was generated and subcloned into the plasmid pVAX to generate pVAX-hSMR3A. pVAX-hSMR3A (25 or 100 μg) was intracorporeally injected into aging rats. The effect on erectile physiology was compared histologically and by measuring intracorporeal pressure/blood pressure with controls treated with the empty plasmid pVAX. Total RNA was extracted from human corporeal tissue obtained from patients undergoing previously scheduled penile surgery. Patients were grouped according to normal erectile function (3), erectile dysfunction and diabetes (5) and patients without diabetes but with erectile dysfunction (5). Quantitative reverse-transcriptase polymerase chain reaction was used to determine the hSMR3A expression level. Results: Intracorporeal injection of 25 μg pVAX-hSMR3A was able to significantly increase the intracorporeal pressure-to-blood pressure ratio in aging rats compared to age matched controls. Higher amounts (100 μg) of gene transfer of the plasmid caused less of an improvement in the intracorporeal pressure-to-blood pressure ratio compared to controls, although there was histological and visual evidence that the animals were post-priapitic. These physiological effects were similar to previously reported effects of intracorporeal injection of pVAX-Vcsa1 into the corpora of aging rats, establishing hSMR3A as a functional homologue of Vcsa1. More than 10-fold down-regulation in hSMR3A transcript expression was observed in the corpora of patients with vs without erectile dysfunction. In patients with diabetes associated and nondiabetes associated erectile dysfunction hSMR3A expression was found to be down-regulated. Conclusions: These results suggest that hSMR3A can act as a marker for erectile dysfunction associated with diabetic and nondiabetic etiologies. Given that our previous studies demonstrated that gene transfer of the Vcsa1 gene and intracorporeal injection of its protein product in rats can restore erectile function, these results suggest that therapies that increase the hSMR3A gene and product expression could potentially have a positive impact on erectile function. PMID:17512016
hSMR3A as a marker for patients with erectile dysfunction.

PubMed

Tong, Yuehong; Tar, Moses; Monrose, Val; DiSanto, Michael; Melman, Arnold; Davies, Kelvin P

2007-07-01

We recently reported that Vcsa1 is one of the most down-regulated genes in the corpora of rats in 3 distinct models of erectile dysfunction. Since gene transfer of plasmids expressing Vcsa1 or intracorporeal injection of its mature peptide product sialorphin into the corpora of aging rats was shown to restore erectile function, we proposed that the Vcsa1 gene has a direct role in erectile function. To determine if similar changes in gene expression occur in the corpora of human subjects with erectile dysfunction we identified a human homologue of Vcsa1 (hSMR3A) and determined the level of expression of hSMR3A in patients. hSMR3A was identified as a homologue of Vcsa1 by searching protein databases for proteins with similarity. hSMR3A cDNA was generated and subcloned into the plasmid pVAX to generate pVAX-hSMR3A. pVAX-hSMR3A (25 or 100 μg) was intracorporeally injected into aging rats. The effect on erectile physiology was compared histologically and by measuring intracorporeal pressure/blood pressure with controls treated with the empty plasmid pVAX. Total RNA was extracted from human corporeal tissue obtained from patients undergoing previously scheduled penile surgery. Patients were grouped according to normal erectile function (3), erectile dysfunction and diabetes (5) and patients without diabetes but with erectile dysfunction (5). Quantitative reverse-transcriptase polymerase chain reaction was used to determine the hSMR3A expression level. Intracorporeal injection of 25 μg pVAX-hSMR3A was able to significantly increase the intracorporeal pressure-to-blood pressure ratio in aging rats compared to age matched controls. Higher amounts (100 μg) of gene transfer of the plasmid caused less of an improvement in the intracorporeal pressure-to-blood pressure ratio compared to controls, although there was histological and visual evidence that the animals were post-priapitic. These physiological effects were similar to previously reported effects of intracorporeal injection of pVAX-Vcsa1 into the corpora of aging rats, establishing hSMR3A as a functional homologue of Vcsa1. More than 10-fold down-regulation in hSMR3A transcript expression was observed in the corpora of patients with vs without erectile dysfunction. In patients with diabetes associated and nondiabetes associated erectile dysfunction hSMR3A expression was found to be down-regulated. These results suggest that hSMR3A can act as a marker for erectile dysfunction associated with diabetic and nondiabetic etiologies. Given that our previous studies demonstrated that gene transfer of the Vcsa1 gene and intracorporeal injection of its protein product in rats can restore erectile function, these results suggest that therapies that increase the hSMR3A gene and product expression could potentially have a positive impact on erectile function.
Knowledge based word-concept model estimation and refinement for biomedical text mining.

PubMed

Jimeno Yepes, Antonio; Berlanga, Rafael

2015-02-01

Text mining of scientific literature has been essential for setting up large public biomedical databases, which are being widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KB) has enabled a myriad of machine learning methods for different text mining related tasks. Unfortunately, KBs have not been devised for text mining tasks but for human interpretation, thus performance of KB-based methods is usually lower when compared to supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method not only takes into account the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
Folklore around the World: An Annotated Bibliography of Folk Literature.

ERIC Educational Resources Information Center

Eastman, Kristen Paletti, Ed.; Omura, Grace Inokuchi, Ed.

Fourth in a series, the annotated bibliographies in this collection were compiled by students in the Traditional Literature and Oral Narration class at the School of Library and Information Studies, University of Hawaii at Manoa. These bibliographies are designed to make information about specific topics in traditional literature easily accessible…
Indiana Underground Railroad Folklore: Western Route and Daviess County.

ERIC Educational Resources Information Center

Shelton, Lois G.

Materials for teaching a unit about the Underground Railroad (the system set up to assist fleeing, runaway slaves heading north) in Indiana are presented. Specifically, the Western Route that passed through Daviess County in Indiana is examined. The materials provide background on the Underground Railroad and the Western Route, plans for teaching…
Malay Digital Folklore: Using Multimedia to Educate Children through Storytelling

ERIC Educational Resources Information Center

Abidin, Mohd Izani Zainal; Razak, Aishah Abd.

2003-01-01

In the early centuries of human evolution, the information to express cultures, social contents, ideas, values, and the society itself were primarily developed by means of expression. This information was represented in the form of classical, signs, figures, traditional manuscripts and performing arts. On the other hand, it becomes less important…
Indigenous Minority Languages of Russia: A Bibliographical Guide.

ERIC Educational Resources Information Center

Matsumura, Kazuto, Ed.

This publication is a printed version of 54 Web documents as they were at the end of March 2002. It includes selected lists of school textbooks, dictionaries, grammars, grammatical descriptions, and folklore collections in and on 54 indigenous minority languages of Russia, many of which are endangered. The 54 languages are arranged in the…
Bats: Swift Shadows in the Twilight. The Wonder Series.

ERIC Educational Resources Information Center

Cooper, Ann C.

This curriculum guide is all about bats and provides information through the telling of stories about bats and their history and folklore. The activities contained in this guide employ an interdisciplinary approach and use mazes, puzzles, model-building, and board games to interest and inform students. Topics covered include the physical…
Carpentier, Collecting, and "Lo Barroco Americano"

ERIC Educational Resources Information Center

Rogers, Charlotte

2011-01-01

Throughout his life, Alejo Carpentier was a tireless collector of paintings, sculpture, musical recordings, and folklore objects. In light of Carpentier's Swiss birth and many years of residence outside of Cuba, the act of collecting plays a crucial role in defining the relationship between the author and Latin American culture in his life and…
Thematic Units in Teaching English and the Humanities.

ERIC Educational Resources Information Center

Spann, Sylvia, Ed.; Culp, Mary Beth, Ed.

This book is dedicated to the use of a humanistic, thematic approach to the teaching of English. The chapters deal with such topics as teaching poetry, teaching American folklore and tradition, and helping students achieve greater self-knowledge and self-understanding through using the "speaking voice" in oral and written communication.…
The Inner World: A Psycho-analytic Study of Childhood and Society in India.

ERIC Educational Resources Information Center

Kakar, Sudhir

This book explores the developmental significance of Hindu infancy and childhood, and its influence on Indian identity formation. Drawing upon anthropological evidence, life-historical and clinical data, mythology and folklore, the investigation encompasses collective fantasy as well as the daily worlds of Hindu social organization in search of…
Lithuanian Astronomy

NASA Astrophysics Data System (ADS)

Sudzius, J.; Murdin, P.

2002-01-01

Lithuanian folklore, archaic calendars and terminology show that Lithuanians were interested in astronomy from ancient times. A lot of celestial bodies have names of Lithuanian origin that are not related to widely accepted ancient Greek mythology. For example, the Milky Way is named `Pauksciu Takas' (literally the way of birds), the constellation of the Great Bear `Didieji Grizulo Ratai' (literal...
Folklife Annual, 1987.

ERIC Educational Resources Information Center

Jabbour, Alan, Ed.; Hardin, James, Ed.

This annual publication is intended to promote the documentation and study of the folklife of the United States, to share the traditions, values, and activities of U.S. folk culture, and to serve as a national forum for the discussion of ideas and issues in folklore and folklife. The articles in this collection are: (1) "Eating in the Belly…
Supernatural Themes in Selected Children's Stories of Isaac Bashevis Singer.

ERIC Educational Resources Information Center

Schlessinger, June H.; Vanderryst, June D.

1989-01-01

Discusses the impact of the traditional folklore theme of good versus evil on children's development and analyzes the development of this theme using magical and supernatural situations in the work of Isaac Bashevis Singer. A selected bibliography of work by and literary criticisms of Singer's writings is provided. (five references) (CLB)

American Folk Music and Folklore Recordings 1984: A Selected List.

ERIC Educational Resources Information Center

Library of Congress, Washington, DC. American Folklife Center.

In an effort to encourage appreciation of the rich folk heritage of the United States, the American Folklife Center of the Library of Congress presents this annual list of 30 recordings selected by a panel of distinguished experts from nearly two hundred titles submitted by producers, suggested by folklorists and ethnomusicologists, and proposed…
Wejkwapniaq (Coming of the Dawn).

ERIC Educational Resources Information Center

Christmas, Peter

Just as the Micmac word "wejkwapniaq" may be interpreted in several ways, the title of this compilation of fact, folklore, and history of the Micmac people can be interpreted to mean "The Dawn of Nova Scotia History" or "The Coming of New and Bad Things for the Micmac". This teacher's guide provides information on the…
African Diaspora Movement Arts in Philadelphia: A Beginning Resource List. Philadelphia Folklore Project Working Papers #10.

ERIC Educational Resources Information Center

Brown-Danquah, Benita Binta

This guide provides history, format, contact names, addresses, and phone numbers of some African dance and African American marching units in Philadelphia (Pennsylvania). The working papers are divided into two categories. "Part One: Movements of African Dance in Philadelphia" begins with a sensitive, detailed explanation of the…
Latin America: An Annotated List of Materials for Children.

ERIC Educational Resources Information Center

United Nations Children's Fund, New York, NY. United States Committee.

This annotated bibliography of materials on Latin America is intended for children to age 14. South and Central America, Mexico, and the French, English, and Spanish speaking areas of the Caribbean are covered. Listings are by country and include history books, geography books, fiction, nonfiction, poetry, and folklore books. Some works in Spanish…
Once upon a Tale. 1995 Florida Library Youth Program.

ERIC Educational Resources Information Center

Abramoff, Carolann Palm, Comp.; And Others

The Florida Library Youth Program is an extension of the Florida Summer Library Program. Many libraries have wanted to provide programs for school-age children at times other than the traditional summer vacation, and this guide responds to their needs. The theme, "Once Upon a Tale," focuses on folklore, stories, and storytelling. The…
Spike Lee and Commentaries on His Work. Occasional Papers Series 2, No. 1.

ERIC Educational Resources Information Center

Hudson, Herman C., Ed.

This monograph presents a critical essay and a comprehensive 454-item bibliography on the contemporary African-American filmmaker, Spike Lee. The essay, entitled "African-American Folklore and Cultural History in the Films of Spike Lee" (Gloria J. Gibson-Hudson), analyzes Lee's filmmaking approach from a cultural and historical…
Juegos y Diversiones. (Games Collected and Adapted to Teach Spanish to Children.)

ERIC Educational Resources Information Center

Marquez, Nancy, Ed.; And Others

Games, both from the folklore heritage of children in Spanish-speaking countries and those created in the classroom, are excellent ways to teach language to children because they accomplish their language goals while entertaining and involving the children, often physically. Most games, because they are rigidly patterned and repetitious, are…
Complexities of Vietnamese Femininities: A Resource for Rethinking Women's University Leadership Practices

ERIC Educational Resources Information Center

Do, Van Hanh Thi; Brennan, Marie

2015-01-01

This paper develops a dialogical encounter between northern-inspired theorisations of gender and Vietnam's historical and cultural differentiation identified through the presence of matriarchy in ancient societies and its popularity in folklore and contemporary politics. The article draws on interviews with 12 senior women from 8 universities in…
African American Physical Education Folklore Surrounding School Transition

ERIC Educational Resources Information Center

Woodruff, Elizabeth A.; Curtner-Smith, Matthew D.

2015-01-01

Transferring from elementary to secondary school can be difficult for many children, and students making this transition often suffer from anxiety and stress. One source of stress can be found in the scary stories transitioning pupils hear about their new schools, particularly those about physical education and sport. The purpose of this study was…
El Espiritu Siempre Eterno Del Mexico Americano (The Always Eternal Spirit of the Mexican American).

ERIC Educational Resources Information Center

Quintanilla, Guadalupe C.; Silman, James B.

Twenty stories and essays suitable for intermediate and secondary grades illustrate the enduring spirit of Mexican American life, legend, custom, and culture. The Spanish language book describes the ceremonies of baptism, engagement, marriage, and the "quinceanera" (a girl's 15th birthday). Folklore (magic spells, superstitions, "cuentos" or…
Description of an injury in a human caused by a false tocandira (Dinoponera gigantea, Perty, 1833) with a revision on folkloric, pharmacological and clinical aspects of the giant ants of the genera Paraponera and Dinoponera (sub-family Ponerinae).

PubMed

Haddad Junior, Vidal; Cardoso, João Luiz Costa; Moraes, Roberto Henrique Pinto

2005-01-01

The authors observed an injury caused by the sting of a false tocandira ant on the hand of an amateur fisherman, and they describe the clinical findings and the evolution of the envenoming, which presented with acute and violent pain, cold sweating, nausea, an episode of vomiting, malaise, tachycardia and left axillary lymphadenopathy. About three hours after the accident, still feeling intense pain at the site of the sting, he passed a large amount of blood in the feces, with no history of digestive, hematological or vascular problems. The intense pain decreased after eight hours, but the site stayed moderately painful for about 24 hours. At that time, he presented mild local edema and erythema. The authors also present the folkloric, pharmacological and clinical aspects related to the stings of tocandiras, a very interesting family of ants that includes the largest and most venomous ants in the world.
Buddhism, the status of women and the spread of HIV/AIDS in Thailand.

PubMed

Klunklin, Areewan; Greenwood, Jennifer

2005-01-01

The common-sense construction of Buddhism is that of a general power for good; the less positive aspects of Buddhism's power, especially when reinforced by folklore and ancient superstition, are infrequently recognised. In this article we make explicit Buddhism's less positive power, particularly as it relates to the status of women and, by implication, its role in the human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS) epidemic in Thailand. The Buddhist, folklore, and superstitious bases of Thai misogyny are explored, together with its expression in the differential gender roles of women and men. In addition, the attitudes of both women and men to commercial sex workers (CSWs) and condom use are discussed. The implications of these attitudinal analyses for the epidemiology of HIV/AIDS in Thailand are outlined. We argue that the current spread of HIV/AIDS in Thailand is primarily a function of the inferior status of women, which, in turn, is a function of Buddhism and Thai cultural beliefs. In light of this, some realistic strategies to address the problem also are discussed.
Compressed Natural Gas Safety in Transit Operations

DOT National Transportation Integrated Search

1995-09-14

This report examines the safety issues relating to the use of Compressed Natural Gas (CNG) in transit service. The safety issues were determined by on-site surveys performed by Battelle of Columbus, Ohio and Science Applications International Corpora...
WARCProcessor: An Integrative Tool for Building and Management of Web Spam Corpora.

PubMed

Callón, Miguel; Fdez-Glez, Jorge; Ruano-Ordás, David; Laza, Rosalía; Pavón, Reyes; Fdez-Riverola, Florentino; Méndez, Jose Ramón

2017-12-22

In this work we present the design and implementation of WARCProcessor, a novel multiplatform integrative tool for building scientific datasets to facilitate experimentation in web spam research. The developed application allows the user to specify multiple criteria that change the way in which new corpora are generated whilst reducing the number of repetitive and error-prone tasks related to existing corpus maintenance. To this end, WARCProcessor supports up to six commonly used data sources for web spam research and can store the output corpus in standard WARC format together with complementary metadata files. Additionally, the application facilitates the automatic and concurrent download of web sites from the Internet, with configurable link-following depth and configurable behaviour when redirected URLs appear. WARCProcessor offers both an interactive GUI and a command-line utility that can run in the background.
Birth of the cool: a two-centuries decline in emotional expression in Anglophone fiction.

PubMed

Morin, Olivier; Acerbi, Alberto

2017-12-01

The presence of emotional words and content in stories has been shown to enhance a story's memorability, and its cultural success. Yet, recent cultural trends run in the opposite direction. Using the Google Books corpus, coupled with two metadata-rich corpora of Anglophone fiction books, we show a decrease in emotionality in English-speaking literature starting plausibly in the nineteenth century. We show that this decrease cannot be explained by changes unrelated to emotionality (such as demographic dynamics concerning age or gender balance, changes in vocabulary richness, or changes in the prevalence of literary genres), and that, in our three corpora, the decrease is driven almost entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words shows little if any decline. Consistently with previous studies, we also find a link between ageing and negative emotionality at the individual level.
Emergence of linguistic laws in human voice

PubMed Central

Torre, Iván González; Luque, Bartolo; Lacasa, Lucas; Luque, Jordi; Hernández-Fernández, Antoni

2017-01-01

Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, which virtually precludes comparative studies between human voice and other animal communication systems. Here we bridge this gap by proposing a method that measures such patterns in acoustic signals of arbitrary origin without requiring access to the underlying language corpus. The method has been applied to sixteen different human languages, successfully recovering some well-known laws of human communication at timescales even below the phoneme and finding yet another link between complexity and criticality in a biological system. These methods further pave the way for new comparative studies in animal communication or the analysis of signals of unknown code. PMID:28272418
Emergence of linguistic laws in human voice

NASA Astrophysics Data System (ADS)

Torre, Iván González; Luque, Bartolo; Lacasa, Lucas; Luque, Jordi; Hernández-Fernández, Antoni

2017-03-01

Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of statistical patterns of language in acoustics are biased by the arbitrary, language-dependent segmentation of the signal, which virtually precludes comparative studies between human voice and other animal communication systems. Here we bridge this gap by proposing a method that measures such patterns in acoustic signals of arbitrary origin without requiring access to the underlying language corpus. The method has been applied to sixteen different human languages, successfully recovering some well-known laws of human communication at timescales even below the phoneme and finding yet another link between complexity and criticality in a biological system. These methods further pave the way for new comparative studies in animal communication or the analysis of signals of unknown code.
A set of high quality colour images with Spanish norms for seven relevant psycholinguistic variables: the Nombela naming test.

PubMed

Moreno-Martinez, Francisco Javier; Montoro, Pedro R; Laws, Keith R

2011-05-01

This paper presents a new corpus of 140 high quality colour images belonging to 14 subcategories and covering a range of naming difficulty. One hundred and six Spanish speakers named the items and provided data for several psycholinguistic variables: age of acquisition, familiarity, manipulability, name agreement, typicality and visual complexity. Furthermore, we also present lexical frequency data derived from internet search hits. Apart from the large number of variables evaluated, these stimuli present an important advantage with respect to other comparable image corpora in so far as naming performance in healthy individuals is less prone to ceiling effect problems. Reliability and validity indexes showed that our items display similar psycholinguistic characteristics to those of other corpora. In sum, this set of ecologically valid stimuli provides a useful tool for scientists engaged in cognitive and neuroscience-based research.
Subtitle-Based Word Frequencies as the Best Estimate of Reading Behavior: The Case of Greek

PubMed Central

Dimitropoulou, Maria; Duñabeitia, Jon Andoni; Avilés, Alberto; Corral, José; Carreiras, Manuel

2010-01-01

Previous evidence has shown that word frequencies calculated from corpora based on film and television subtitles can readily account for reading performance, since the language used in subtitles greatly approximates everyday language. The present study examines this issue in a society with increased exposure to subtitle reading. We compiled SUBTLEX-GR, a subtitle-based corpus consisting of more than 27 million Modern Greek words, and tested to what extent subtitle-based frequency estimates and those taken from a written corpus of Modern Greek account for the lexical decision performance of young Greek adults who are exposed to subtitle reading on a daily basis. Results showed that SUBTLEX-GR frequency estimates effectively accounted for participants’ reading performance in two different visual word recognition experiments. More importantly, different analyses showed that frequencies estimated from a subtitle corpus explained the obtained results significantly better than traditional frequencies derived from written corpora. PMID:21833273
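As an illustration of the kind of estimate the record above describes, the following minimal sketch computes per-million word frequencies from a tokenized corpus. It is not the SUBTLEX-GR pipeline: the function names, the toy English tokens, and the log transform with +1 smoothing are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def frequency_per_million(tokens):
    """Return {word: occurrences per million tokens} for a tokenized corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n * 1_000_000 / total for word, n in counts.items()}


def log_frequency(per_million):
    """Log10 of (frequency per million + 1); the +1 smoothing is an assumption."""
    return {word: math.log10(fpm + 1) for word, fpm in per_million.items()}


# Toy stand-in for a subtitle corpus (hypothetical data, not Greek subtitles).
toy_tokens = "the cat saw the dog and the dog saw the cat".split()
fpm = frequency_per_million(toy_tokens)
log_fpm = log_frequency(fpm)
```

Normalizing to a per-million scale is what makes estimates from corpora of different sizes (for example, a subtitle corpus and a written corpus) directly comparable.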
WARCProcessor: An Integrative Tool for Building and Management of Web Spam Corpora

PubMed Central

Callón, Miguel; Fdez-Glez, Jorge; Ruano-Ordás, David; Laza, Rosalía; Pavón, Reyes; Méndez, Jose Ramón

2017-01-01

In this work we present the design and implementation of WARCProcessor, a novel multiplatform integrative tool for building scientific datasets to facilitate experimentation in web spam research. The developed application allows the user to specify multiple criteria that change the way in which new corpora are generated whilst reducing the number of repetitive and error-prone tasks related to existing corpus maintenance. To this end, WARCProcessor supports up to six commonly used data sources for web spam research and can store the output corpus in standard WARC format together with complementary metadata files. Additionally, the application facilitates the automatic and concurrent download of web sites from the Internet, with configurable link-following depth and configurable behaviour when redirected URLs appear. WARCProcessor offers both an interactive GUI and a command-line utility that can run in the background. PMID:29271913

Tradition in treating taboo: Folkloric medicinal wisdom of the aboriginals of Purulia district, West Bengal, India against sexual, gynaecological and related disorders.

PubMed

Modak, Biplob Kumar; Gorai, Partha; Dhan, Raghunath; Mukherjee, Anuradha; Dey, Abhijit

2015-07-01

In order to explore the traditional medicine practised by the ethnic communities residing in the topographically and climatically challenged Purulia, an underprivileged district of West Bengal, India, a quantitative ethnobiological approach was adopted to document the folkloric use of ethnomedicinals against different sexual, gynaecological and related ailments. Ethnobiological surveys were conducted during 2012-2015 by interviewing 82 informants or traditional healers with the help of a semi-structured questionnaire. The survey included questions on botanical and non-botanical ingredients and additives mixed with monoherbal and polyherbal formulations, vernacular names of the plants and animals, methods of preparation and administration and restrictions during medications. Additional quantitative indices such as use value, informant's consensus factor and fidelity level were used for data analysis. Twenty eight sexual and gynaecological disorders were found to be treated with 18 monoherbal and 31 polyherbal formulations consisting of a total number of 96 plant species from 86 genera and 47 families and four animal species. A variety of additives, either botanicals or non-botanicals, were used with the formulations for higher efficacy and taste enhancement. Fabaceae (16 species) was found to be the most common family of medicinal plants whereas herbs (42.7%) and roots (32%) were the most common habit type and plant part used respectively. Use value, informant's consensus factor and fidelity level indicate frequency and coherence of citations. Age-old belief in traditional medicine prevails in the studied area due to its efficacy, inexpensive price and the remoteness of tribal villages from conventional medical centres. Traditional healers had detailed knowledge of preparations, doses, methods of administration, restrictions during medications, safety and efficacy of using folkloric therapeutics against sexual and gynaecological disorders. Possible synergistic interactions among phytochemicals and additives were indicated to explain enhanced therapeutic efficacy of mixed herbal formulations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
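The record above names three quantitative indices without defining them. The sketch below uses the definitions commonly given in the quantitative ethnobotany literature, which is an assumption, since the abstract does not state which variants the authors applied: use value UV = ΣU_i / N, informant consensus factor ICF = (N_ur − N_t) / (N_ur − 1), and fidelity level FL = (I_p / I_u) × 100. All function names and input numbers are hypothetical.

```python
def use_value(uses_per_informant, n_informants):
    """UV = sum(U_i) / N: uses cited for one species, averaged over all informants."""
    return sum(uses_per_informant) / n_informants


def informant_consensus_factor(n_use_reports, n_taxa):
    """ICF = (N_ur - N_t) / (N_ur - 1) for one ailment category."""
    if n_use_reports <= 1:
        raise ValueError("ICF is undefined for fewer than two use reports")
    return (n_use_reports - n_taxa) / (n_use_reports - 1)


def fidelity_level(n_citing_for_ailment, n_citing_any_use):
    """FL (%) = I_p / I_u * 100 for one species and one ailment."""
    return 100.0 * n_citing_for_ailment / n_citing_any_use


# Hypothetical survey numbers, not taken from the study.
uv = use_value([2, 1, 3, 0], n_informants=4)
icf = informant_consensus_factor(n_use_reports=21, n_taxa=5)
fl = fidelity_level(n_citing_for_ailment=9, n_citing_any_use=12)
```

Under these definitions, an ICF close to 1 means informants agree on a small set of species for an ailment category, and a high FL means a species is cited mostly for one particular ailment.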
Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

PubMed Central

Wu, Stephen; Miller, Timothy; Masanz, James; Coarr, Matt; Halgrim, Scott; Carrell, David; Clark, Cheryl

2014-01-01

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP. PMID:25393544
Event-based text mining for biology and functional genomics

PubMed Central

Thompson, Paul; Nawaz, Raheel; McNaught, John; Kell, Douglas B.

2015-01-01

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of ‘events’, i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research. PMID:24907365
Multimodal Word Meaning Induction From Minimal Exposure to Natural Text.

PubMed

Lazaridou, Angeliki; Marelli, Marco; Baroni, Marco

2017-04-01

By the time they reach early adulthood, English speakers are familiar with the meaning of thousands of words. In the last decades, computational simulations known as distributional semantic models (DSMs) have demonstrated that it is possible to induce word meaning representations solely from word co-occurrence statistics extracted from a large amount of text. However, while these models learn in batch mode from large corpora, human word learning proceeds incrementally after minimal exposure to new words. In this study, we run a set of experiments investigating whether minimal distributional evidence from very short passages suffices to trigger successful word learning in subjects, testing their linguistic and visual intuitions about the concepts associated with new words. After confirming that subjects are indeed very efficient distributional learners even from small amounts of evidence, we test a DSM on the same multimodal task, finding that it behaves in a remarkable human-like way. We conclude that DSMs provide a convincing computational account of word learning even at the early stages in which a word is first encountered, and the way they build meaning representations can offer new insights into human language acquisition. Copyright © 2017 Cognitive Science Society, Inc.
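To make the idea of inducing word meaning from co-occurrence statistics concrete, here is a minimal count-based sketch. It is not the model used in the study above: the window size, the toy corpus, and the function names are assumptions, and real DSMs are trained on far larger corpora, usually with weighting or learned embeddings.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Tiny hypothetical corpus: "cat" and "dog" share more contexts than "cat" and "car".
toy_corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]
vectors = cooccurrence_vectors(toy_corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
```

Even with one sentence per word, "cat" ends up closer to "dog" than to "car" because they share the context word "drinks", which mirrors the minimal-exposure setting the abstract discusses.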
Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method.

PubMed

Liu, H; Lussier, Y A; Friedman, C

2001-08-01

With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
Blood transfusion and resuscitation using penile corpora: an experimental study.

PubMed

Abolyosr, Ahmad; Sayed, M A; Elanany, Fathy; Smeika, M A; Shaker, S E

2005-10-01

To test the feasibility of using the penile corpora cavernosa for blood transfusion and resuscitation purposes. Three male donkeys were used for autologous blood transfusion into the corpus cavernosum during three sessions with a 1-week interval between each. Two blood units (450 mL each) were transfused per session to each donkey. Moreover, three dogs were bled up until a state of shock was produced. The mean arterial blood pressure decreased to 60 mm Hg. The withdrawn blood (mean volume 396.3 mL) was transfused back into their corpora cavernosa under 150 mm Hg pressure. Different transfusion parameters were assessed. The Assiut faculty of medicine ethical committee approved the study before its initiation. For the donkey model, the mean time of blood collection was 12 minutes. The mean time needed to establish corporal access was 22 seconds. The mean time of blood transfusion was 14.2 minutes. The mean rate of blood transfusion was 31.7 mL/min. Mild penile elongation with or without mild penile tumescence was observed on four occasions. All penile shafts returned spontaneously to their pretransfusion state at a maximum of 5 minutes after cessation of blood transfusion. No extravasation, hematoma formation, or color changes occurred. Regarding the dog model, the mean rate of transfusion was 35.2 mL/min. All dogs were resuscitated at the end of the transfusion. The corpus cavernosum is a feasible, simple, rapid, and effective alternative route for blood transfusion and venous access. It can be resorted to whenever necessary. It is a reliable means for volume replacement and resuscitation in males.
Two-stage repair with long channel technique for primary severe hypospadias.

PubMed

Yang, Tianyou; Xie, Qigen; Liang, Qifeng; Xu, Yeqing; Su, Cheng

2014-07-01

To introduce a 2-stage repair with long channel technique for primary severe hypospadias. Between March 2010 and November 2013, 16 children with primary severe hypospadias underwent 2-stage repair with long channel technique. The technique applied in the first stage was almost the same as Bracka 2-stage repair. The second stage was usually performed 6 months later. A small transverse skin incision, distal to the meatal opening and about 1 cm in length, was made. Dissection was carried out deep into the surface of corpora cavernosa and a plane between the subcutaneous tissue and corpora cavernosa was reached. A long channel between the subcutaneous tissue and corpora cavernosa was created from the para-meatus incision to the apex of glans. A rectangle, pedicle scrotal septal skin flap was elevated and tubularized into neourethra around a stenting tube. The neourethra was delivered through the subcutaneous channel and fixed at the apex of glans. The mean operation time of the first and second stages was 65 and 55 minutes, respectively. The mean age at the first and second operation was 28 and 36 months, respectively. The mean follow-up was 10 months. No fistula, glans dehiscence, urethral stricture, and meatal stenosis were recorded. One scrotal surgical wound infection occurred after second stage and healed successfully with antibiotics treatment. The overall cosmetic and functional outcomes after second stage were excellent. Two-stage repair with long channel technique was applicable for primary severe hypospadias, with excellent short-term outcomes. Copyright © 2014 Elsevier Inc. All rights reserved.
Information Extraction from Unstructured Text for the Biodefense Knowledge Center

DOE Office of Scientific and Technical Information (OSTI.GOV)

Samatova, N F; Park, B; Krishnamurthy, R

2005-04-29

The Bio-Encyclopedia at the Biodefense Knowledge Center (BKC) is being constructed to allow early detection of emerging biological threats to homeland security. It requires highly structured information extracted from a variety of data sources. However, the quantity of new and vital information available from everyday sources cannot be assimilated by hand, and therefore reliable high-throughput information extraction techniques are much anticipated. In support of the BKC, Lawrence Livermore National Laboratory and Oak Ridge National Laboratory, together with the University of Utah, are developing an information extraction system built around the bioterrorism domain. This paper reports two important pieces of our effort integrated in the system: key phrase extraction and semantic tagging. Whereas two key phrase extraction technologies developed during the course of the project help identify relevant texts, our state-of-the-art semantic tagging system can pinpoint phrases related to emerging biological threats. Also we are enhancing and tailoring the Bio-Encyclopedia by augmenting semantic dictionaries and extracting details of important events, such as suspected disease outbreaks. Some of these technologies have already been applied to large corpora of free text sources vital to the BKC mission, including ProMED-mail, PubMed abstracts, and the DHS's Information Analysis and Infrastructure Protection (IAIP) news clippings. In order to address the challenges involved in incorporating such large amounts of unstructured text, the overall system is focused on precise extraction of the most relevant information for inclusion in the BKC.
Corpora and Data Preparation for Information Extraction

DTIC Science & Technology

1993-09-01

technical publications in fields such as communications, airline transportation, rubber & plastics, and food marketing. The Japanese-language...types in the U.S., for example, avocado farms, electric popcorn popper sales, management consulting. The template-filling task required that products
Effects of ageing and streptozotocin-induced diabetes on connexin43 and P2 purinoceptor expression in the rat corpora cavernosa and urinary bladder.

PubMed

Suadicani, Sylvia O; Urban-Maldonado, Marcia; Tar, Moses T; Melman, Arnold; Spray, David C

2009-06-01

To investigate whether ageing and diabetes alter the expression of the gap junction protein connexin43 (Cx43) and of particular purinoceptor (P2R) subtypes in the corpus cavernosum and urinary bladder, and determine whether changes in expression of these proteins correlate with development of erectile and bladder dysfunction in diabetic and ageing rats. Erectile and bladder function of streptozotocin (STZ)-induced diabetic, insulin-treated and age-matched control Fischer-344 rats were evaluated 2, 4 and 8 months after diabetes induction by in vivo cystometry and cavernosometry. Corporal and bladder tissue were then isolated at each of these sample times and protein expression levels of Cx43 and of various P2R subtypes were determined by Western blotting. In the corpora of control rats ageing was accompanied by a significant decrease in Cx43 and P2X(1)R, and increase in P2X(7)R expression. There was decreased Cx43 and increased P2Y(4)R expression in the ageing control rat bladder. There was a significant negative correlation between erectile capacity and P2X(1)R expression levels, and a positive correlation between bladder spontaneous activity and P2Y(4)R expression levels. There was already development of erectile dysfunction and bladder overactivity at 2 months after inducing diabetes, the earliest sample measured in the study. The development of these urogenital complications was accompanied by significant decreases in Cx43, P2Y(2)R, P2X(4)R and increase in P2X(1)R expression in the corpora, and by a doubling in Cx43 and P2Y(2)R, and significant increase in P2Y(4)R expression in the bladder. Changes in Cx43 and P2R expression were largely prevented by insulin therapy. Ageing and diabetes mellitus markedly altered the expression of the gap junction protein Cx43 and of particular P2R subtypes in the rat penile corpora and urinary bladder. 
These changes in Cx43 and P2R expression provide the molecular substrate for altered gap junction and purinergic signalling in these tissues, and thus probably contribute to the early development of erectile dysfunction and higher detrusor activity in ageing and in diabetic rats.
The Middle East: An Annotated Bibliography of Literature for Children.

ERIC Educational Resources Information Center

Maehr, Jane

This is an annotated bibliography of folklore, fiction and nonfiction about the Middle East, written in English for children aged 5 and older. There are eleven chapters - one which focuses on the entire Middle Eastern region, and ten which deal with individual countries: Iran, Iraq, Israel, Jordan, Kuwait, Lebanon, Saudi Arabia, Syria, Turkey, and…
Exploring a Common Past: Researching and Interpreting the Underground Railroad.

ERIC Educational Resources Information Center

National Park Service (Dept. of Interior), Washington, DC.

Although the Underground Railroad has been an integral part of U.S. history and folklore for well over 150 years, the recent past has seen an increased public interest in the identification of historic sites associated with the experiences of fugitive slaves. This booklet is part of a National Park Service initiative to design research methods…
Child Bilingualism in an Immigrant Society: Implications of Borrowing in the Hebrew 'Language of Games.'

ERIC Educational Resources Information Center

Bar-Adon, Aaron

The first waves of immigrants arriving in Palestine were faced with the problem of forming a new culture and creating a new language, actually, reviving Hebrew, an ancient language. The children were faced with creating their own traditions, games, and folklore; in so doing, through straight borrowing, spontaneous translation (loan translation),…
Chinese-Russian Study Center. Bibliography of Materials (with Supplement Number 1).

ERIC Educational Resources Information Center

McIndoe, Sara S.

The major bibliographic emphasis in this work is on Russia and China, although some of the sub-headings and entries also focus on India and Japan. Entries are listed under the following categories: 1) Bibliographies; 2) Art, Music, Theater, and Dance; 3) Civilization; 4) Communism, Marxism, and Socialism; 5) Customs and Folklore; 6) Economy and…
Being Black in America, K-12. A Multimedia Listing of the 70's.

ERIC Educational Resources Information Center

Dean, Frances C., Comp.

This catalog lists over 600 sources, including books, records, kits, and filmstrips covering both black American and African history, folklore, literature, and present day life. It is designed to assist personnel in the selection of media for schools. The contents are organized according to the Dewey Decimal Classification System: Generalities;…
Beyond the Folklore: A Strategy for Identifying Quality Undergraduate Colleges

ERIC Educational Resources Information Center

Conrad, Clinton F.

2012-01-01

College and university quality--what it is and how to identify it--is a preoccupation of many prospective college students and their parents, high school counselors, and college admission personnel. Regardless of class, race, and gender, it is no longer enough for a growing number of individuals simply to attend college: matriculating at an…
Historical Dictionary of Children's Literature. Historical Dictionaries of Literature and the Arts

ERIC Educational Resources Information Center

O'Sullivan, Emer

2010-01-01

Children's literature comes from a number of different sources--folklore (folk- and fairy tales), books originally for adults and subsequently adapted for children, and material authored specifically for them--and its audience ranges from infants through middle graders to young adults (readers from about 12 to 18 years old). Its forms include…
Toward a Theoretical Approach to Teaching Folk Art: A Definition.

ERIC Educational Resources Information Center

Congdon, Kristin G.

1987-01-01

Proposes a definition for folk art based on analyzing and sorting the descriptors and identifiers used in the disciplines of art history, folklore, anthropology, and antique and folk art collection. The proposed definition is not meant to specify an undeniable category of art, but rather to suggest specific aspects which should be identified in…
"Work and Leisure in Country Schools in Wyoming." Country School Legacy: Humanities on the Frontier.

ERIC Educational Resources Information Center

Gulliford, Andrew; And Others

The country school legacy of Wyoming is rich in history, folklore, and tradition. Materials (many anecdotal) gathered from school records, oral histories, autobiographies, and memoirs provide glimpses into the diverse and demanding role of frontier teachers (who were mostly female and, by contract requirement, usually single) and the work and…
Mistletoes on Hardwoods in the United States (FIDL)

Treesearch

Robert F. Scharpf; Frank G. Hawksworth

1974-01-01

The traditional use of mistletoes during holiday seasons, their involvement in folklore and legend, their consumption by domestic and wild animals, and their use for medicinal purposes make mistletoes of widespread interest to the public. The fact that these plants are parasites that injure and eventually kill trees, both conifers and hardwoods, is not well known. Two...

Folktale Themes and Activities for Children. Volume 2: Trickster and Transformation Tales. Learning through Folklore Series.

ERIC Educational Resources Information Center

Kraus, Anne Marie

This companion volume to "Folktale Themes Volume 1: Pourquoi Tales," shows educators how to use folktales to provide meaningful, educational experiences for children. This book provides a complete package using folktales in the classroom--activity pages, teaching ideas, story themes, and an annotated bibliography of further reading for a…
American Folk Music and Folklore Recordings 1983: A Selected List.

ERIC Educational Resources Information Center

Library of Congress, Washington, DC. American Folklife Center.

Recognizing the need to inform the public about newly issued folk recordings and audio tapes, the American Folklife Center of the Library of Congress initiates this annual list of selected titles, chosen by a panel of distinguished experts from a compilation of 1983 releases prepared by the Center staff. Although not a comprehensive list, it is…
The Wonder of Wolves: A Story & Activities. Revised Edition. The Wonder Series.

ERIC Educational Resources Information Center

Robinson, Sandra Chisholm

This curriculum guide is all about wolves and provides information through the telling of a story about wolves and their history and folklore. The 14 activities contained in this guide employ an interdisciplinary approach and use mazes, puzzles, model-building, and board games to motivate students. Activity topics include building a wolf,…
White Willow in Russian Literature: Folklore "Roots" of Image

ERIC Educational Resources Information Center

Dudareva, Marianna A.; Goeva, Nina P.

2017-01-01

The article deals with a complicated archetypal tree complex in Russian literature. The object chosen here is "white willow" (vetla) as one of the species of willow in its different variations--daphne willow (verba) and goat willow (rakita), and willow itself. In the 19th century Russian literature we can find the image of white willow…
Hispanic Folk Arts and the Environment: An Interdisciplinary Curriculum Guide. A New Mexican Perspective.

ERIC Educational Resources Information Center

Lopez, Alejandro

This interdisciplinary, bilingual curriculum resource contains a 29-minute videotape program, 20 colorplate posters, and a curriculum guide. The resource presents an examination of the folklife and folklore expressions of the Hispanic people of New Mexico. The focus of the curriculum is the relationship of survival-based folk activities to the…
Children of Deh Koh: Young Life in an Iranian Village.

ERIC Educational Resources Information Center

Friedl, Erika

This book is based on ethnographic research carried out between 1965 and 1994 during eight visits to a tribal region in southwest Iran. The book weaves together local practices, cognitive categories, folklore, and anecdotes concerning all aspects of growing up to illuminate the world of children in the village of Deh Koh. The book describes how…
Of Bugs and Beasts: Fact, Folklore, and Activities.

ERIC Educational Resources Information Center

Livo, Lauren J.; And Others

In an effort to increase respect for certain creatures, this book profiles animals with reputations out of proportion to the actual potential harm they do. An introduction reviews the results of a survey conducted to determine which animals people generally favor or disfavor and the common beliefs held towards animals. The remainder of the book is…
Complexity, Diversity and Management: Some Reflections on Folklore and Learning Leadership in Education

ERIC Educational Resources Information Center

Rayner, Stephen G.

2008-01-01

This article seeks to challenge a perceived mythology previously touched upon which is now widely established in the English educational system and is associated with what the author has elsewhere called the establishment model of educational policy. This establishment model is grounded in a "state learning theory." It reflects a set of…
Rainsticks: Integrating Culture, Folklore, and the Physics of Sound

ERIC Educational Resources Information Center

Moseley, Christine; Fies, Carmen

2007-01-01

The purpose of this activity is for students to build a rainstick out of materials in their own environment and imitate the sound of rain while investigating the physical principles of sound. Students will be able to relate the sound produced by an instrument to the type and quantity of materials used in its construction.
Transitioning from Elementary to Secondary School: American Pupils' Scary Stories and Physical Education Folklore

ERIC Educational Resources Information Center

Woodruff, Elizabeth A.; Curtner-Smith, Matthew D.

2007-01-01

The purpose of this study was to examine scary stories that young American adults recalled being told about physical education as they transferred from elementary school to secondary school. Participants were 70 undergraduate students. They were required to write about any scary stories concerning (a) secondary schooling in general, and (b)…
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.

PubMed

Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis

2014-11-15

The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Complex Event Extraction using DRUM

DTIC Science & Technology

2015-10-01

towards tackling these challenges. Figure 9. Evaluation results for eleven teams. The diamond ◆ represents the results of our system. The two topmost...Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000). The UniProt
Selective Arterial Embolization of Idiopathic Priapism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cohen, Gary S.; Braunstein, Larry; Ball, David S.

1996-11-15

We report a case of idiopathic priapism that was only identified as high-flow or arterial priapism after drainage of the corpora cavernosa. Following failure of conservative and surgical treatment attempts, two consecutive embolizations of a unilateral penile artery were performed with Gelfoam particles.
TNF-alpha infusion impairs corpora cavernosa reactivity.

PubMed

Carneiro, Fernando S; Zemse, Saiprazad; Giachini, Fernanda R C; Carneiro, Zidonia N; Lima, Victor V; Webb, R Clinton; Tostes, Rita C

2009-03-01

Erectile dysfunction (ED), as well as cardiovascular diseases (CVDs), is associated with endothelial dysfunction and increased levels of proinflammatory cytokines, such as tumor necrosis factor-alpha (TNF-alpha). We hypothesized that increased TNF-alpha levels impair cavernosal function. In vitro organ bath studies were used to measure cavernosal reactivity in mice infused with vehicle or TNF-alpha (220 ng/kg/min) for 14 days. Gene expression of nitric oxide synthase isoforms was evaluated by real-time polymerase chain reaction. Corpora cavernosa from TNF-alpha-infused mice exhibited decreased nitric oxide (NO)-dependent relaxation, which was associated with decreased endothelial nitric oxide synthase (eNOS) and neuronal nitric oxide synthase (nNOS) cavernosal expression. Cavernosal strips from the TNF-alpha-infused mice displayed decreased nonadrenergic-noncholinergic (NANC)-induced relaxation (59.4 +/- 6.2 vs. control: 76.2 +/- 4.7; 16 Hz) compared with the control animals. These responses were associated with decreased gene expression of eNOS and nNOS (P < 0.05). Sympathetic-mediated, as well as phenylephrine (PE)-induced, contractile responses (PE-induced contraction; 1.32 +/- 0.06 vs. control: 0.9 +/- 0.09, mN) were increased in cavernosal strips from TNF-alpha-infused mice. Additionally, infusion of TNF-alpha increased cavernosal responses to endothelin-1 and endothelin receptor A subtype (ET(A)) receptor expression (P < 0.05) and slightly decreased tumor necrosis factor-alpha receptor 1 (TNFR1) expression (P = 0.063). Corpora cavernosa from TNF-alpha-infused mice display increased contractile responses and decreased NANC nerve-mediated relaxation associated with decreased eNOS and nNOS gene expression. These changes may trigger ED and indicate that TNF-alpha plays a detrimental role in erectile function. 
Blockade of TNF-alpha actions may represent an alternative therapeutic approach for ED, especially in pathologic conditions associated with increased levels of this cytokine.
Response of prepubertal ewes primed with monensin or progesterone to administration of FSH.

PubMed

Sumbung, F P; Williamson, P; Carson, R S

1987-11-01

Prepubertal ewe lambs were treated with FSH after progesterone priming for 12 days (Group P), monensin supplementation for 14 days (Group M) or a standard diet (Group C). Serial blood samples were taken for LH and progesterone assay, and ovariectomy was performed on half of each group 38-52 h after start of treatment to assess ovarian function, follicular steroid production in vitro and the concentration of gonadotrophin binding sites in follicles. The remaining ewe lambs were ovariectomized 8 days after FSH treatment to determine whether functional corpora lutea were present. FSH treatment was followed by a preovulatory LH surge which occurred significantly later (P less than 0.05) and was better synchronized in ewes in Groups P and M than in those in Group C. At 13-15 h after the LH surge significantly more large follicles were present on ovaries from Group P and M ewes than in Group C. Follicles greater than 5 mm diameter from ewes in Groups P and M produced significantly less oestrogen and testosterone and more dihydrotestosterone, and had significantly more hCG binding sites, than did similar-sized follicles from Group C animals. Ovariectomy on Day 8 after the completion of FSH treatment showed that ewes in Groups P and M had significantly greater numbers of functional corpora lutea. These results indicate that, in prepubertal ewes, progesterone priming and monensin supplementation may delay the preovulatory LH surge, allowing follicles developing after FSH treatment more time to mature before ovulation. This may result in better luteinization of ruptured follicles in these ewes, with the formation of functional corpora lutea.(ABSTRACT TRUNCATED AT 250 WORDS)
Assessing the readability of ClinicalTrials.gov

PubMed Central

Wu, Danny TY; Hanauer, David A; Mei, Qiaozhu; Clark, Patricia M; An, Lawrence C; Proulx, Joshua; Zeng, Qing T; Vydiswaran, VG Vinod; Collins-Thompson, Kevyn

2016-01-01

Objective: ClinicalTrials.gov serves critical functions of disseminating trial information to the public and helping the trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. Materials and Methods: The analysis included all 165 988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100 000 clinician notes retrieved from an electronic health records system intended for conveying internal communication among medical professionals. The authors characterized each of the corpora using 4 surface metrics, and then applied 5 different scoring algorithms to assess their readability. The authors hypothesized that clinician notes would be most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. Results: Trial descriptions have the longest average sentence length (26.1 words) across all corpora; 65% of the words they use are not covered by a basic medical English dictionary. In comparison, average sentence length of MedlinePlus Health Topics articles is 61% shorter, vocabulary size is 95% smaller, and dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions the most difficult corpus to read, even harder than clinician notes. On average, it requires 18 years of education to properly understand these trial descriptions according to the results generated by the readability assessment algorithms. Discussion and Conclusion: Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov’s goal of facilitating information dissemination and subject recruitment. PMID:26269536
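The abstract above names two of its surface metrics, average sentence length and dictionary coverage, without giving the tooling. A minimal sketch of how such metrics could be computed follows. The function name `surface_metrics`, the regex-based sentence and word splitting, and the toy dictionary are all assumptions made for illustration, not the authors' method or data.

```python
import re

def surface_metrics(text, dictionary):
    """Return (average sentence length in words,
    fraction of word tokens found in `dictionary`)."""
    # Naive sentence split after ., ! or ? (an assumption; the study's
    # actual tokenizer is not described in the abstract).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lower-cased alphabetic tokens, pooled across all sentences.
    words = [w.lower() for s in sentences for w in re.findall(r"[A-Za-z']+", s)]
    if not sentences or not words:
        return 0.0, 0.0
    avg_sentence_length = len(words) / len(sentences)
    coverage = sum(w in dictionary for w in words) / len(words)
    return avg_sentence_length, coverage

# Hypothetical toy inputs, not data from the study.
toy_dictionary = {"the", "trial", "tests", "a", "new", "drug",
                  "patients", "take", "it", "daily"}
toy_text = "The trial tests a new drug. Patients take it daily."
avg_len, coverage = surface_metrics(toy_text, toy_dictionary)
```

On this reading, the reported "65% of words not covered" would correspond to a coverage value of about 0.35 for the trial-description corpus, though the abstract does not say whether coverage was counted over tokens or over distinct words.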
Abnormal morphology of the penis in male rats exposed neonatally to diethylstilbestrol is associated with altered profile of estrogen receptor-alpha protein, but not of androgen receptor protein: a developmental and immunocytochemical study.

PubMed

Goyal, H O; Braden, T D; Williams, C S; Dalvi, P; Mansour, M M; Mansour, M; Williams, J W; Bartol, F F; Wiley, A A; Birch, L; Prins, G S

2004-05-01

Objectives of the study were to determine developmental changes in morphology and expression of androgen receptor (AR) and estrogen receptor (ER)alpha in the body of the rat penis exposed neonatally to diethylstilbestrol (DES). Male pups received DES at a dose of 10 microg per rat on alternate days from Postnatal Day 2 to Postnatal Day 12. Controls received olive oil vehicle only. Tissue samples were collected on Days 18 (prepuberty), 41 (puberty), and 120 (adult) of age. DES-induced abnormalities were evident at 18 days of age and included smaller, lighter, and thinner penis, loss of cavernous spaces and associated smooth muscle cells, and increased deposition of fat cells in the corpora cavernosa penis. Fat cells virtually filled the entire area of the corpora cavernosa at puberty and adulthood. Plasma testosterone (T) was reduced to an undetectable level, while LH was unaltered in all treated groups. AR-positive cells were ubiquitous and their profile (incidence and staining intensity) did not differ between control and treated rats of the respective age groups. Conversely, ERalpha-positive cells were limited to the stroma of corpus spongiosus in all age groups of both control and treated rats, but the expression in treated rats at 18 days was up-regulated in stromal cells of corpora cavernosa, coincident with the presence of morphological abnormalities. Hence, this study reports for the first time DES-induced developmental, morphological abnormalities in the body of the penis and suggests that these abnormalities may have resulted from decreased T and/or overexpression of ERalpha.
Timing of mating and ovarian response in llamas (Lama glama) treated with pFSH.

PubMed

Ratto, M H; Gatica, R; Correa, J E

1997-08-01

The effect of the timing of mating on ovarian response in llamas was evaluated using 20 adult llamas weighing 90-120 kg which had been in oestrus for 5 days and were treated with 20 mg pFSH every 12 h for the following 5 days (total dose: 200 mg of FSH-NIH-P1). They were randomly allocated to Group A (n = 10) and mated immediately at the end of pFSH treatment or to Group B (n = 10) and mated 36 h after the end of pFSH treatment. Llamas of both groups were given hCG (750 iu, i.m.) immediately after mating. A second mating was allowed 12 h later. Ova and embryos were recovered by non-surgical uterine flushing 7 days after the first mating. Ovarian response was evaluated immediately afterwards via laparoscopy. The mean ovulation rate of 4.5 corpora lutea for Group A was significantly lower (P < 0.01) than the mean of 13.8 observed for Group B. The total ovarian response (number of corpora lutea + follicles > 10 mm) was also significantly higher (P < 0.01) in Group B than in Group A. Twenty-seven ova were recovered in each group, corresponding to 60% and 20% (P < 0.01) of the corpora lutea observed in Groups A and B, respectively; however, no significant difference (P > 0.05) in fertilisation rate was observed. The results show that pFSH induces superovulation in llamas treated during oestrus and that a 36-h interval between the end of FSH treatment and mating increases ovulation rate and the total ovarian response but does not affect the number of ova/embryos recovered.
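The recovery percentages quoted in the abstract above can be checked against the reported means. The sketch below assumes that every one of the 10 llamas in each group contributed to the mean ovulation rate, so that the total number of corpora lutea is the mean times 10. The abstract does not state this explicitly, and the helper `recovery_rate` is hypothetical.

```python
def recovery_rate(ova_recovered, mean_corpora_lutea, n_animals):
    """Ova recovered as a fraction of all corpora lutea in the group."""
    # Assumption: total corpora lutea = group mean x number of animals.
    total_corpora_lutea = mean_corpora_lutea * n_animals
    return ova_recovered / total_corpora_lutea

rate_a = recovery_rate(27, 4.5, 10)    # 27 ova / 45 corpora lutea
rate_b = recovery_rate(27, 13.8, 10)   # 27 ova / 138 corpora lutea
print(f"Group A: {rate_a:.0%}, Group B: {rate_b:.0%}")
```

Under that assumption the figures come out at 60% for Group A and about 19.6% for Group B, which matches the reported 60% and 20% after rounding.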
Losartan, an Angiotensin type I receptor, restores erectile function by downregulation of cavernous renin-angiotensin system in streptozocin-induced diabetic rats.

PubMed

Yang, Rong; Yang, Bin; Wen, Yanting; Fang, Feng; Cui, Souxi; Lin, Guiting; Sun, Zeyu; Wang, Run; Dai, Yutian

2009-03-01

The high incidence of erectile dysfunction (ED) in diabetes highlights the need for good treatment strategies. Recent evidence indicates that blockade of the angiotensin type I receptor (AT1) may reverse ED from various diseases. To explore the role of cavernous renin-angiotensin system (RAS) in the pathogenesis of diabetic ED and the role of losartan in the treatment of diabetic ED. The AT1 blocker (ARB) losartan (30 mg/kg/d) was administered to rats with streptozocin (65 mg/kg)-induced diabetes. Erectile function, cavernous structure, and tissue gene and protein expression of RAS in the corpora cavernosa were studied. We sought to determine the changes of cavernous RAS in the condition of diabetes and after treatment with losartan. RAS components (angiotensinogen, [pro]renin receptor, angiotensin-converting enzyme [ACE], and AT1) were expressed in cavernosal tissue. In diabetic rats, RAS components were upregulated, resulting in the increased concentration of angiotensin II (Ang II) in the corpora. A positive feedback loop for Ang II formation in cavernosum was also identified, which could contribute to overactivity of cavernous RAS in diabetic rats. Administration of losartan blocked the effect of Ang II, downregulated the expression of AT1 and Ang II generated locally, and partially restored erectile function (the losartan-treated group showed an improved intracavernous pressure/mean systemic arterial pressure ratio compared with the diabetic group: 0.480 +/- 0.031 vs. 0.329 +/- 0.020, P < 0.01). However, losartan could not elevate the reduced smooth muscle/collagen ratio in diabetic rats. The cavernous RAS plays a role in modulating erectile function in corpora cavernosa and is involved in the pathogenesis of diabetic ED. ARB can restore diabetic ED through downregulating cavernous RAS.
Morphological and histological characters of penile organization in eleven species of molossid bats.

PubMed

Comelis, Manuela T; Bueno, Larissa M; Góes, Rejane M; Taboga, S R; Morielle-Versute, Eliana

2018-04-01

The penis is the reproductive organ that ensures efficient copulation and success of internal fertilization in all species of mammals, with special challenges for bats, where copulation can occur during flight. Comparative anatomical analyses of different species of bats can contribute to a better understanding of morphological diversity of this organ, concerning organization and function. In this study, we describe the external morphology and histomorphology of the penis and baculum in eleven species of molossid bats. The present study showed that penile organization in these species displayed the basic vascular mammalian pattern and had a similar pattern concerning the presence of the tissues constituting the penis, exhibiting three types of erectile tissue (the corpus cavernosum, accessory cavernous tissue, and corpus spongiosum) around the urethra. However, certain features varied among the species, demonstrating that most species are distinguishable by glans and baculum morphology and glans histological organization. Major variations in glans morphology were genus-specific, and the greatest similarities were shared by Eumops species and N. laticaudatus. The greatest interspecific similarities occurred between M. molossus and M. rufus and between Eumops species. Save for M. molossus and M. rufus, morphology of the baculum was species-specific; and in E. perotis, it did not occur in all specimens, indicating that it is probably under selection. In the histological organization, the most evident differences were number of septa and localization of the corpora cavernosa. In species with a baculum (Molossus, Eumops and Nyctinomops species), the corpora cavernosa predominantly occupied the dorsal region of the penile glans and were associated with the proximal (basal) portion of the baculum. In species that do not have a baculum (Cynomops, Molossops and Neoplatymops species), the corpora cavernosa predominantly occupied the ventro-lateral region of the glans.
Copyright © 2018 Elsevier GmbH. All rights reserved.

Building an ontology of pulmonary diseases with natural language processing tools using textual corpora.

PubMed

Baneyx, Audrey; Charlet, Jean; Jaulent, Marie-Christine

2007-01-01

Pathologies and acts are classified in thesauri to help physicians to code their activity. In practice, the use of thesauri is not sufficient to reduce variability in coding and thesauri are not suitable for computer processing. We think the automation of the coding task requires a conceptual modeling of medical items: an ontology. Our task is to help lung specialists code acts and diagnoses with software that represents medical knowledge of this concerned specialty by an ontology. The objective of the reported work was to build an ontology of pulmonary diseases dedicated to the coding process. To carry out this objective, we develop a precise methodological process for the knowledge engineer in order to build various types of medical ontologies. This process is based on the need to express precisely in natural language the meaning of each concept using differential semantics principles. A differential ontology is a hierarchy of concepts and relationships organized according to their similarities and differences. Our main research hypothesis is to apply natural language processing tools to corpora to develop the resources needed to build the ontology. We consider two corpora, one composed of patient discharge summaries and the other being a teaching book. We propose to combine two approaches to enrich the ontology building: (i) a method which consists of building terminological resources through distributional analysis and (ii) a method based on the observation of corpus sequences in order to reveal semantic relationships. Our ontology currently includes 1550 concepts and the software implementing the coding process is still under development. Results show that the proposed approach is operational and indicates that the combination of these methods and the comparison of the resulting terminological structures give interesting clues to a knowledge engineer for the building of an ontology.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilson, Andrew T.; Robinson, David Gerald

Most topic modeling algorithms that address the evolution of documents over time use the same number of topics at all times. This obscures the common occurrence in the data where new subjects arise and old ones diminish or disappear entirely. We propose an algorithm to model the birth and death of topics within an LDA-like framework. The user selects an initial number of topics, after which new topics are created and retired without further supervision. Our approach also accommodates many of the acceleration and parallelization schemes developed in recent years for standard LDA. In recent years, topic modeling algorithms such as latent semantic analysis (LSA)[17], latent Dirichlet allocation (LDA)[10] and their descendants have offered a powerful way to explore and interrogate corpora far too large for any human to grasp without assistance. Using such algorithms we are able to search for similar documents, model and track the volume of topics over time, search for correlated topics or model them with a hierarchy. Most of these algorithms are intended for use with static corpora where the number of documents and the size of the vocabulary are known in advance. Moreover, almost all current topic modeling algorithms fix the number of topics as one of the input parameters and keep it fixed across the entire corpus. While this is appropriate for static corpora, it becomes a serious handicap when analyzing time-varying data sets where topics come and go as a matter of course. This is doubly true for online algorithms that may not have the option of revising earlier results in light of new data. To be sure, these algorithms will account for changing data one way or another, but without the ability to adapt to structural changes such as entirely new topics they may do so in counterintuitive ways.
Collaborative work on evaluation of ovarian toxicity. 13) Two- or four-week repeated dose studies and fertility study of PPAR alpha/gamma dual agonist in female rats.

PubMed

Sato, Norihiro; Uchida, Keisuke; Nakajima, Mikio; Watanabe, Atsushi; Kohira, Terutomo

2009-01-01

The main focus of this study was to determine the optimal dosing period in a repeated dose toxicity study based on toxic effects as assessed by ovarian morphological changes. To assess morphological and functional changes induced in the ovary by a peroxisome proliferator-activated receptor (PPAR) alpha/gamma dual agonist, the compound was administered to female rats at dose levels of 0, 4, 20, and 100 mg/kg/day in a repeated dose toxicity study for 2 or 4 weeks, and from 2 weeks prior to mating to Day 7 of pregnancy in a female fertility study. In the repeated dose toxicity study, an increase in atresia of large follicles, a decrease in corpora lutea, and an increase in stromal cells were observed in the treated groups. In addition, the granulosa cell exfoliations into antrum of large follicles and corpora lutea with retained oocyte are morphological characteristics induced by this compound, and they might be related with abnormal condition of ovulation. In the female fertility study, the pregnancy rate tended to decrease in the 100 mg/kg/day group. At necropsy, decreases in the number of corpora lutea, implantations and live embryos were noted in the 20 and 100 mg/kg/day group. No changes were observed in animals given 4 mg/kg/day. These findings indicated that histopathological changes in the ovary are important endpoints for evaluation of drugs inducing ovarian damage. In conclusion, a 2-week administration period is sufficient to detect ovarian toxicity of this test compound in the repeated dose toxicity study.
The Developmental Pattern of Resistance to Peer Influence in Adolescence: Will the Teenager Ever Be Able to Resist?

ERIC Educational Resources Information Center

Sumter, Sindy R.; Bokhorst, Caroline L.; Steinberg, Laurence; Westenberg, P. Michiel

2009-01-01

Common folklore seems to suggest that adolescents are particularly susceptible to peer influence. However, from the literature the exact age differences in susceptibility to peer influence remain unclear. The current study's main focus was to chart the development of general susceptibility to peer pressure in a community sample of 10-18 year olds…
The Chicano Literary World--1974. The National Symposium on Chicano Literature and Critical Analysis (1st, Las Vegas, New Mexico, November 1974).

ERIC Educational Resources Information Center

Ortego, Felipe, Comp.; Conde, David, Comp.

Over 200 participants from 10 states and 17 universities attended "The First National Symposium on Chicano Literature and Critical Analysis." Five of the papers presented at the symposium are given in this publication. The papers cover Chicano poetry, novel, drama, and popular folklore humor. "National Character vs Universality in…
From Folklore to Molecular Pharmacophores: Cultivating STEM Students among Young, First-Generation Female Mexican-Americans

ERIC Educational Resources Information Center

Gardea, Jessica; Rios, Laura; Pal, Rituraj; Gardea-Torresdey, Jorge L.; Narayan, Mahesh

2011-01-01

The Research and Engineering Apprenticeship Program of the Academy of Applied Science has funded several high school student summer internships to work within the Department of Chemistry at the University of Texas at El Paso. Over the last nine years, young Mexican-American scholars have been recruited into STEM-specific (science, technology,…
The Sky This Week, 2016 February 2 - 9 - Naval Oceanography Portal

Science.gov Websites

... Moon occurs on the 8th at 9:39 am Eastern Daylight Time. Look for Luna about four degrees northwest of ... same time! According to folklore, the lack of a shadow cast by an indigenous rodent in rural ...
Learning from Your Community: Folklore and Video in the Schools. A Classroom Curriculum for Grades 4-8.

ERIC Educational Resources Information Center

Matthews, Gail; Patterson, Don

This curriculum guide describes a unit of study designed to help students learn academic and technical skills necessary for creating a video. Mayesville Elementary School (South Carolina) teachers and their students collaborated with a videographer and folklorist in a series of 3-week school residencies requiring students to develop stories using…
Examples of Practice: An Intercultural Approach to Translate Romanian Children's Folklore into Spanish

ERIC Educational Resources Information Center

Oprica, Dana

2016-01-01

Due to the great number of Romanian pupils in Spanish public school, the local administration organises extra-curricular Romanian courses in order to preserve the Romanian language and culture. This is a way to contribute to build and consolidate a bicultural and bilingual profile of the young people. Besides, it is also an opportunity to align…
"Strange Things Happen to Non-Christian People": Human-Animal Transformation among the Inupiat of Arctic Alaska

ERIC Educational Resources Information Center

Cassady, Joslyn

2008-01-01

Inuit myths, folklore, and material culture are filled with examples of people who turn into animals. Margaret Lantis, a well-known Eskimologist of the mid-twentieth century, once commented that human-animal transformation in Inuit mythology had an "immediacy and a reality" that was unknown in other parts of the world. It is hard to…
Contributions of Playground Singing Games to the Social Inclusion of Refugee and Newly Arrived Immigrant Children in Australia

ERIC Educational Resources Information Center

Marsh, Kathryn; Dieckmann, Samantha

2017-01-01

In recent decades, researchers from the fields of music education, ethnomusicology, folklore and sociology have developed an increasing interest in children's musical play traditions and the ways in which children teach and learn, perform, create and transform playground games and songs. Such repertoire is drawn both from oral traditions and from…
Traditional Ceremonies and Rituals of Kazakh People as the Reflection of the Spiritual Culture in the Kazakh Cinematograph

ERIC Educational Resources Information Center

Tuyakbaeva, Aigul Sh.

2016-01-01

This paper covers the folklore and traditional nature of Kazakh cinematograph. A flexible manner was used for analysis of the specifics of cinematizing of ethnic and cultural values, and also, the cinematization of traditions and customs as the factor of spiritual development was studied. Systematical and scientific analysis of peculiarities and…
Antologia Del Saber Popular: A Selection from Various Genres of Mexican Folklore Across Borders. Monograph No. 2.

ERIC Educational Resources Information Center

Robe, Stanley L., Ed.

A variety of oral folk material from Mexican sources is presented in this anthology. The 114 selections are derived from the various genres available and from traditional as well as newer formations. The selections include folktales, jests and anecdotes, legends and beliefs, beliefs about popular medicine, prayers, verses, children's games and…
Creating Realistic Corpora for Security and Forensic Education

DTIC Science & Technology

2011-05-01

School of Information and Library Science University of North Carolina Chapel Hill, NC kamwoods@email.unc.edu Christopher A. Lee School of...Information and Library Science University of North Carolina Chapel Hill, NC callee@ils.unc.edu Simson Garfinkel Graduate School of Operational and
Translation Ambiguity in and out of Context

ERIC Educational Resources Information Center

Prior, Anat; Wintner, Shuly; MacWhinney, Brian; Lavie, Alon

2011-01-01

We compare translations of single words, made by bilingual speakers in a laboratory setting, with contextualized translation choices of the same items, made by professional translators and extracted from parallel language corpora. The translation choices in both cases show moderate convergence, demonstrating that decontextualized translation…
BAAL/CUP Seminars 2009

ERIC Educational Resources Information Center

Cutting, Joan; Murphy, Brona

2010-01-01

The seminar, organised by Joan Cutting and Brona Murphy, aimed: (1) to bring together researchers involved in both emergent and established academic corpora (written and spoken) as well as linguists, lecturers and teachers researching in education, be it language teaching, language-teacher training or continuing professional development in…
Large Extremity Peripheral Nerve Repair

DTIC Science & Technology

2016-12-01

norbornene-2,3-dicarboxylic anhydride)/DMP-30 [2,4,6-tri(dimethylaminomethyl)phenol] (Tousimis Research Corp., Rockville, Md.); and then baked ... embedded in Epoxy resin (Tousimis Research Corporation, Rockville, MD), and then baked overnight in a 60°C oven. From each proximal and distal
31 CFR 358.7 - Where do I send my bearer corpora and detached bearer coupons to be converted?

Code of Federal Regulations, 2010 CFR

2010-07-01

... detached bearer coupons to be converted to: Bureau of the Public Debt, Division of Customer Service, P. O... Relating to Money and Finance (Continued) FISCAL SERVICE, DEPARTMENT OF THE TREASURY BUREAU OF THE PUBLIC...
Automatic measurement of voice onset time using discriminative structured prediction.

PubMed

Sonderegger, Morgan; Keshet, Joseph

2012-12-01

A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.
Accessory corpora lutea formation in pregnant Hokkaido sika deer (Cervus nippon yesoensis) investigated by examination of ovarian dynamics and steroid hormone concentrations.

PubMed

Yanagawa, Yojiro; Matsuura, Yukiko; Suzuki, Masatsugu; Saga, Shin-Ichi; Okuyama, Hideto; Fukui, Daisuke; Bando, Gen; Nagano, Masashi; Katagiri, Seiji; Takahashi, Yoshiyuki; Tsubota, Toshio

2015-01-01

Generally, sika deer conceive a single fetus, but approximately 80% of pregnant females have two corpora lutea (CLs). The function of the accessory CL (ACL) is unknown; moreover, the process of ACL formation is unclear, and understanding this is necessary to know its role. To elucidate the process of ACL formation, the ovarian dynamics of six adult Hokkaido sika deer females were examined ultrasonographically together with peripheral estradiol-17β and progesterone concentrations. ACLs formed in three females that conceived at the first estrus of the breeding season, but not in those females that conceived at the second estrus. After copulation, postconception ovulation of the dominant follicle of the first wave is induced by an increase in estradiol-17β, which leads to formation of an ACL. A relatively low concentration of progesterone after the first estrus of the breeding season is considered to be responsible for the increase in estradiol-17β after copulation.

Accessory corpora lutea formation in pregnant Hokkaido sika deer (Cervus nippon yesoensis) investigated by examination of ovarian dynamics and steroid hormone concentrations

PubMed Central

YANAGAWA, Yojiro; MATSUURA, Yukiko; SUZUKI, Masatsugu; SAGA, Shin-ichi; OKUYAMA, Hideto; FUKUI, Daisuke; BANDO, Gen; NAGANO, Masashi; KATAGIRI, Seiji; TAKAHASHI, Yoshiyuki; TSUBOTA, Toshio

2014-01-01

Generally, sika deer conceive a single fetus, but approximately 80% of pregnant females have two corpora lutea (CLs). The function of the accessory CL (ACL) is unknown; moreover, the process of ACL formation is unclear, and understanding this is necessary to know its role. To elucidate the process of ACL formation, the ovarian dynamics of six adult Hokkaido sika deer females were examined ultrasonographically together with peripheral estradiol-17β and progesterone concentrations. ACLs formed in three females that conceived at the first estrus of the breeding season, but not in those females that conceived at the second estrus. After copulation, postconception ovulation of the dominant follicle of the first wave is induced by an increase in estradiol-17β, which leads to formation of an ACL. A relatively low concentration of progesterone after the first estrus of the breeding season is considered to be responsible for the increase in estradiol-17β after copulation. PMID:25482110
Rhythm histograms and musical meter: A corpus study of Malian percussion music.

PubMed

London, Justin; Polak, Rainer; Jacoby, Nori

2017-04-01

Studies of musical corpora have given empirical grounding to the various features that characterize particular musical styles and genres. Palmer & Krumhansl (1990) found that in Western classical music the likeliest places for a note to occur are the most strongly accented beats in a measure, and this was also found in subsequent studies using both Western classical and folk music corpora (Huron & Ommen, 2006; Temperley, 2010). We present a rhythmic analysis of a corpus of 15 performances of percussion music from Bamako, Mali. In our corpus, the relative frequency of note onsets in a given metrical position does not correspond to patterns of metrical accent, though there is a stable relationship between onset frequency and metrical position. The implications of this non-congruence between simple statistical likelihood and metrical structure for the ways in which meter and metrical accent may be learned and understood are discussed, along with importance of cross-cultural studies for psychological research.
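The rhythm histogram mentioned in this abstract, the relative frequency of note onsets at each metrical position, can be illustrated with a minimal sketch. It assumes each onset has already been assigned an integer metrical position within the cycle; the function name `onset_histogram` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def onset_histogram(onset_positions, positions_per_cycle):
    """Return the relative frequency of note onsets at each metrical position.

    onset_positions: iterable of integer positions in [0, positions_per_cycle).
    Assumption: onsets were assigned to metrical positions beforehand.
    """
    onsets = list(onset_positions)
    if not onsets:
        raise ValueError("at least one onset is required")
    if any(p < 0 or p >= positions_per_cycle for p in onsets):
        raise ValueError("onset position outside the metrical cycle")
    counts = Counter(onsets)
    total = len(onsets)
    # One bin per metrical position, including positions with no onsets.
    return [counts.get(p, 0) / total for p in range(positions_per_cycle)]


# Toy data (made up): four onsets in a four-position cycle.
histogram = onset_histogram([0, 0, 1, 3], positions_per_cycle=4)
```

Comparing such a histogram with the expected pattern of metrical accent is one way to check whether onset frequency tracks accent, which is the question the abstract raises.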
Experiments in automatic word class and word sense identification for information retrieval

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gauch, S.; Futrelle, R.P.

Automatic identification of related words and automatic detection of word senses are two long-standing goals of researchers in natural language processing. Word class information and word sense identification may enhance the performance of information retrieval systems. Large online corpora and increased computational capabilities make new techniques based on corpus linguistics feasible. Corpus-based analysis is especially needed for corpora from specialized fields for which no electronic dictionaries or thesauri exist. The methods described here use a combination of mutual information and word context to establish word similarities. Then, unsupervised classification is done using clustering in the word space, identifying word classes without pretagging. We also describe an extension of the method to handle the difficult problems of disambiguation and of determining part-of-speech and semantic information for low-frequency words. The method is powerful enough to produce high-quality results on a small corpus of 200,000 words from abstracts in a field of molecular biology.
Association between erythrocyte 2,3-diphosphoglycerate levels and reproduction capacity in Long-Evans rats.

PubMed

Noble, N A; Brewer, G J

1982-03-01

During genetic selection of rats for high and low levels of red cell 2,3-diphosphoglycerate (DPG) the decreased fertility in Low-DPG animals was due to significantly (P < 0.01) fewer offspring born per litter. The rat lines were intercrossed and animals at the tails of the F2 2,3-diphosphoglycerate distribution were mated. Subsequent matings of F3 offspring were monitored. Low-DPG F3 pregnant females killed at 20 days of gestation showed significantly (P < 0.05) fewer corpora lutea than High-DPG F3 females. There were also significantly (P < 0.01) fewer corpora lutea in Low-DPG line rats compared to High-DPG rats. It is concluded that the relationship between 2,3-diphosphoglycerate levels and fertility is not due to inbreeding but to a possible genetic linkage, a shared biochemical determinant or a relationship through the effect of 2,3-diphosphoglycerate levels on oxygen delivery to tissue.
Specification of Drosophila Corpora Cardiaca Neuroendocrine Cells from Mesoderm Is Regulated by Notch Signaling

PubMed Central

Park, Sangbin; Bustamante, Erika L.; Antonova, Julie; McLean, Graeme W.; Kim, Seung K.

2011-01-01

Drosophila neuroendocrine cells comprising the corpora cardiaca (CC) are essential for systemic glucose regulation and represent functional orthologues of vertebrate pancreatic α-cells. Although Drosophila CC cells have been regarded as developmental orthologues of pituitary gland, the genetic regulation of CC development is poorly understood. From a genetic screen, we identified multiple novel regulators of CC development, including Notch signaling factors. Our studies demonstrate that the disruption of Notch signaling can lead to the expansion of CC cells. Live imaging demonstrates localized emergence of extra precursor cells as the basis of CC expansion in Notch mutants. Contrary to a recent report, we unexpectedly found that CC cells originate from head mesoderm. We show that Tinman expression in head mesoderm is regulated by Notch signaling and that the combination of Daughterless and Tinman is sufficient for ectopic CC specification in mesoderm. Understanding the cellular, genetic, signaling, and transcriptional basis of CC cell specification and expansion should accelerate discovery of molecular mechanisms regulating ontogeny of organs that control metabolism. PMID:21901108
Good gibbons and evil macaques: a historical review on cognitive features of non-human primates in Chinese traditional culture.

PubMed

Zhang, Peng

2015-07-01

For several thousand years the ancient Chinese have accumulated rich knowledge, in the form of written literature and folklore, on the non-human primates widely distributed in China. I have used critical text analysis and discourse analysis to clarify when and how ancient Chinese distinguished gibbons from macaques. I divided the progress into four main stages, the Pre-Shang to Shang dynasty (before 1046 BC), the Zhou to Han dynasty (1046 BC-220 AD), the six dynasties to Song dynasty (220-1279 AD), and the Yuan to Qing dynasties (1279-1840 AD). I found that China's traditional cognition of gibbons and macaques emphasized the appearance of animals, organoleptic performance, or even whether or not their behavior was "moral". They described them as human-like animals by ethical standards but ignored the species itself. This kind of cognitive style actually embodies the "pursuit of goodness", which is the feature of Chinese traditional culture. This study presents some original views on Chinese traditional knowledge of non-human primates.
Cell line name recognition in support of the identification of synthetic lethality in cancer from text

PubMed Central

Kaewphan, Suwisa; Van Landeghem, Sofie; Ohta, Tomoko; Van de Peer, Yves; Ginter, Filip; Pyysalo, Sampo

2016-01-01

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/. Contact: sukaew@utu.fi PMID:26428294
Trends in gel dosimetry: Preliminary bibliometric overview of active growth areas, research trends and hot topics from Gore’s 1984 paper onwards

NASA Astrophysics Data System (ADS)

Baldock, C.

2017-05-01

John Gore’s seminal 1984 paper on gel dosimetry spawned a vibrant research field ranging from fundamental science through to clinical applications. A preliminary bibliometric study was undertaken of the gel dosimetry family of publications inspired by, and resulting from, Gore’s original 1984 paper to determine active growth areas, research trends and hot topics from Gore’s paper up to and including 2016. Themes and trends of the gel dosimetry research field were bibliometrically explored by way of co-occurrence term maps using the titles and abstracts text corpora from the Web of Science database for all relevant papers from 1984 to 2016. Visualisation of similarities was used by way of the VOSviewer visualisation tool to generate cluster maps of gel dosimetry knowledge domains and the associated citation impact of topics within the domains. Heat maps were then generated to assist in the understanding of active growth areas, research trends, and emerging and hot topics in gel dosimetry.
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD.

PubMed

Bullinaria, John A; Levy, Joseph P

2012-09-01

In a previous article, we presented a systematic computational study of the extraction of semantic representations from the word-word co-occurrence statistics of large text corpora. The conclusion was that semantic vectors of pointwise mutual information values from very small co-occurrence windows, together with a cosine distance measure, consistently resulted in the best representations across a range of psychologically relevant semantic tasks. This article extends that study by investigating the use of three further factors--namely, the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD)--that have been used to provide improved performance elsewhere. It also introduces an additional semantic task and explores the advantages of using a much larger corpus. This leads to the discovery and analysis of improved SVD-based methods for generating semantic representations (that provide new state-of-the-art performance on a standard TOEFL task) and the identification and discussion of problems and misleading results that can arise without a full systematic study.
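The pipeline this abstract summarizes (co-occurrence counts from a very small window, pointwise mutual information weights, cosine comparison) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names are hypothetical, negative PMI values are discarded (positive PMI, an assumption not stated in the abstract), and stop-lists, stemming and SVD are left out.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=1):
    """Count context words within `window` positions on either side of each target."""
    counts = defaultdict(Counter)
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Weight counts by PMI = log2(p(w, c) / (p(w) * p(c))), keeping positive values only."""
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    context_totals = Counter()
    for ctx in counts.values():
        context_totals.update(ctx)
    vectors = {}
    for w, ctx in counts.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log2(n * total / (word_totals[w] * context_totals[c]))
            if pmi > 0:  # assumption: drop zero and negative associations
                vec[c] = pmi
        vectors[w] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (made up): "cat" and "dog" occur in identical contexts.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vectors = ppmi_vectors(cooccurrence_counts(tokens, window=1))
similarity = cosine(vectors["cat"], vectors["dog"])
```

On a corpus this small the numbers mean little; the sketch only shows how the three steps fit together.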
A Large-Scale Analysis of Variance in Written Language.

PubMed

Johns, Brendan T; Jamieson, Randall K

2018-01-22

The collection of very large text sources has revolutionized the study of natural language, leading to the development of several models of language learning and distributional semantics that extract sophisticated semantic representations of words based on the statistical redundancies contained within natural language (e.g., Griffiths, Steyvers, & Tenenbaum; Jones & Mewhort; Landauer & Dumais; Mikolov, Sutskever, Chen, Corrado, & Dean). The models treat knowledge as an interaction of processing mechanisms and the structure of language experience. But language experience is often treated agnostically. We report a distributional semantic analysis that shows written language in fiction books varies appreciably between books from the different genres, books from the same genre, and even books written by the same author. Given that current theories assume that word knowledge reflects an interaction between processing mechanisms and the language environment, the analysis shows the need for the field to engage in a more deliberate consideration and curation of the corpora used in computational studies of natural language processing. Copyright © 2018 Cognitive Science Society, Inc.
Ontology construction and application in practice case study of health tourism in Thailand.

PubMed

Chantrapornchai, Chantana; Choksuchat, Chidchanok

2016-01-01

Ontology is one of the key components in semantic webs. It contains the core knowledge for an effective search. However, building an ontology requires carefully collected knowledge that is highly domain-sensitive. In this work, we present the practice of ontology construction for a case study of health tourism in Thailand. The whole process follows the METHONTOLOGY approach, which consists of the following phases: information gathering, corpus study, ontology engineering, evaluation, publishing, and the application construction. Data from different sources, such as structured web documents (e.g., HTML) and other documents, are acquired in the information gathering process. The tourism corpora from various tourism texts and standards are explored. The ontology is evaluated in two ways: by automatic reasoning using Pellet and RacerPro, and by questionnaires completed by two groups of experts, tourism domain experts and ontology experts. The ontology usability is demonstrated via the semantic web application and via example axioms. The developed ontology is actually the first health tourism ontology in Thailand with the published application.
Estimating affective word covariates using word association data.

PubMed

Van Rensbergen, Bram; De Deyne, Simon; Storms, Gert

2016-12-01

Word ratings on affective dimensions are an important tool in psycholinguistic research. Traditionally, they are obtained by asking participants to rate words on each dimension, a time-consuming procedure. As such, there has been some interest in computationally generating norms, by extrapolating words' affective ratings using their semantic similarity to words for which these values are already known. So far, most attempts have derived similarity from word co-occurrence in text corpora. In the current paper, we obtain similarity from word association data. We use these similarity ratings to predict the valence, arousal, and dominance of 14,000 Dutch words with the help of two extrapolation methods: Orientation towards Paradigm Words and k-Nearest Neighbors. The resulting estimates show very high correlations with human ratings when using Orientation towards Paradigm Words, and even higher correlations when using k-Nearest Neighbors. We discuss possible theoretical accounts of our results and compare our findings with previous attempts at computationally generating affective norms.
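The k-Nearest Neighbors extrapolation named in this abstract can be illustrated with a minimal sketch: the rating of an unrated word is estimated from the ratings of the k rated words most similar to it. The unweighted mean, the function name `knn_rating`, and all similarity and rating values below are assumptions made up for illustration; they are not the Dutch norms or association data used in the study.

```python
def knn_rating(word, similarity, known_ratings, k=3):
    """Estimate a rating as the mean rating of the k most similar rated words.

    similarity: function (word_a, word_b) -> float, higher means more similar.
    known_ratings: dict mapping rated words to their human ratings.
    Assumption: neighbours contribute equally (unweighted mean).
    """
    neighbours = [
        (similarity(word, other), other)
        for other in known_ratings
        if other != word
    ]
    if not neighbours:
        raise ValueError("no rated words available")
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    return sum(known_ratings[other] for _, other in top) / len(top)


# Made-up valence ratings and similarities, for illustration only.
toy_valence = {"happy": 8.0, "joy": 9.0, "sad": 2.0, "grief": 1.0}
toy_similarity = {"happy": 0.9, "joy": 0.8, "sad": 0.1, "grief": 0.05}

estimate = knn_rating(
    "glad",
    similarity=lambda a, b: toy_similarity[b],  # similarity of "glad" to b
    known_ratings=toy_valence,
    k=2,
)
```

With k=2 the two nearest rated words are "happy" and "joy", so the estimate is the mean of their ratings. The same function could be applied to arousal or dominance by swapping the ratings dictionary.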
Context-Aware Adaptive Hybrid Semantic Relatedness in Biomedical Science

NASA Astrophysics Data System (ADS)

Emadzadeh, Ehsan

Text mining of biomedical literature and clinical notes is a very active field of research in biomedical science. Semantic analysis is one of the core modules of many Natural Language Processing (NLP) solutions. Methods for calculating the semantic relatedness of two concepts can be very useful in solving problems such as relationship extraction, ontology creation, and question answering [1-6]. Several techniques exist for calculating the semantic relatedness of two concepts. These techniques utilize different knowledge sources and corpora. So far, researchers have attempted to find the best hybrid method for each domain by manually combining semantic relatedness techniques and data sources. This work attempts to eliminate the need for manually combining semantic relatedness methods for each new context or resource by proposing an automated method that searches for the combination of semantic relatedness techniques and resources that achieves the best semantic relatedness score in every context. This may help the research community find the best hybrid method for each context given the available algorithms and resources.
Progressive Learning of Topic Modeling Parameters: A Visual Analytics Framework.

PubMed

El-Assady, Mennatallah; Sevastjanova, Rita; Sperrle, Fabian; Keim, Daniel; Collins, Christopher

2018-01-01

Topic modeling algorithms are widely used to analyze the thematic composition of text corpora but remain difficult to interpret and adjust. Addressing these limitations, we present a modular visual analytics framework, tackling the understandability and adaptability of topic models through a user-driven reinforcement learning process which does not require a deep understanding of the underlying topic modeling algorithms. Given a document corpus, our approach initializes two algorithm configurations based on a parameter space analysis that enhances document separability. We abstract the model complexity in an interactive visual workspace for exploring the automatic matching results of two models, investigating topic summaries, analyzing parameter distributions, and reviewing documents. The main contribution of our work is an iterative decision-making technique in which users provide a document-based relevance feedback that allows the framework to converge to a user-endorsed topic distribution. We also report feedback from a two-stage study which shows that our technique results in topic model quality improvements on two independent measures.
Machine learning with naturally labeled data for identifying abbreviation definitions.

PubMed

Yeganova, Lana; Comeau, Donald C; Wilbur, W John

2011-06-09

The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data. In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data. Positive training examples are naturally occurring potential abbreviation-definition pairs in text. Negative training examples are generated by randomly mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation's full form. This approach does not require manually labeled training data. We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. Our system demonstrated results that compare favourably to the existing Ab3P and BIOADI systems. We achieve an F-measure of 91.36% on the Ab3P corpus and an F-measure of 87.13% on the BIOADI corpus, which are superior to the results reported by the Ab3P and BIOADI systems. Moreover, we outperform these systems in terms of recall, which is one of our goals.
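The construction of naturally labeled training data described in the record above can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not say how the random mixing is performed, so the sampling scheme, the function name `make_training_examples`, and the toy pairs are all hypothetical. Feature extraction and the learner itself are omitted.

```python
import random


def make_training_examples(pairs, seed=0):
    """Build labeled examples from potential (short form, long form) pairs.

    Positives (label 1) are the pairs as they co-occur in text.
    Negatives (label 0) pair each short form with a long form drawn at
    random from a different pair. This sampling scheme is an assumption.
    """
    rng = random.Random(seed)
    long_forms = [long_form for _, long_form in pairs]
    positives = [(short, long_form, 1) for short, long_form in pairs]
    negatives = []
    for short, long_form in pairs:
        unrelated = [other for other in long_forms if other != long_form]
        if unrelated:  # skip when no unrelated long form is available
            negatives.append((short, rng.choice(unrelated), 0))
    return positives + negatives


# Hypothetical candidate pairs, as if harvested from text.
candidate_pairs = [
    ("SVM", "support vector machine"),
    ("NLP", "natural language processing"),
    ("PCR", "polymerase chain reaction"),
]

examples = make_training_examples(candidate_pairs)
for short, long_form, label in examples:
    print(label, short, long_form)
```

No manual annotation enters at any point: both classes are derived from the candidate pairs themselves, which is the property the abstract highlights.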
Discovering body site and severity modifiers in clinical texts

PubMed Central

Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K

2014-01-01

Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES). PMID:24091648
Discovering body site and severity modifiers in clinical texts.

PubMed

Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana K

2014-01-01

To research computational methods for discovering body site and severity modifiers in clinical texts. We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929. Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
The Timing and Construction of Preference: A Quantitative Study

ERIC Educational Resources Information Center

Kendrick, Kobin H.; Torreira, Francisco

2015-01-01

Conversation-analytic research has argued that the timing and construction of preferred responding actions (e.g., acceptances) differ from that of dispreferred responding actions (e.g., rejections), potentially enabling early response prediction by recipients. We examined 195 preferred and dispreferred responding actions in telephone corpora and…
Exploring Business Request Genres: Students' Rhetorical Choices

ERIC Educational Resources Information Center

Nguyen, Hai; Miller, Jennifer

2012-01-01

This article presents selective findings from an ongoing study that investigates rhetorical differences in business letter writing between Vietnamese students taking an English for Specific Purposes course in Vietnam and business professionals. Rhetorical analyses are based on two corpora, namely, scenario (N = 20) and authentic business letters…
U.S. Ratification of the Chemical Weapons Convention

DTIC Science & Technology

2011-12-01

safeguard trade secrets. Leading corporations such as DuPont, Dow, and Monsanto also supported CWC ratification to improve the public image of the...represented large chemical companies such as Dow, DuPont, and Monsanto, was highly effective at contacting senators, putting out useful information, and
