Li, Junfeng; Yang, Lin; Zhang, Jianping; Yan, Yonghong; Hu, Yi; Akagi, Masato; Loizou, Philipos C
2011-05-01
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study presents a comparative evaluation of various single-channel noise-reduction algorithms applied to noisy speech in three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. The intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.
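Of the algorithms evaluated, only Wiener filtering produced measurable intelligibility gains. As background, a minimal sketch of a generic short-time Wiener filter with a decision-directed a priori SNR estimate is given below; it is a textbook illustration, not the specific implementation evaluated in the study, and the frame length, the smoothing factor alpha, and the assumption that the first frames are noise-only are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_frames=10, alpha=0.98):
    """Frame-by-frame Wiener filtering with a decision-directed a priori SNR
    estimate; the first `noise_frames` STFT frames are assumed noise-only."""
    f, t, X = stft(noisy, fs, nperseg=512)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1)
    S_prev = np.zeros(X.shape[0])
    out = np.zeros_like(X)
    for m in range(X.shape[1]):
        post = np.abs(X[:, m]) ** 2 / (noise_psd + 1e-12)       # a posteriori SNR
        prio = alpha * S_prev / (noise_psd + 1e-12) + (1 - alpha) * np.maximum(post - 1, 0)
        gain = prio / (1 + prio)                                 # Wiener gain
        out[:, m] = gain * X[:, m]
        S_prev = np.abs(out[:, m]) ** 2                          # enhanced spectrum for next frame
    return istft(out, fs, nperseg=512)[1]
```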
Dutta, Sayon; Long, William J; Brown, David F M; Reisner, Andrew T
2013-08-01
As use of radiology studies increases, there is a concurrent increase in incidental findings (eg, lung nodules) for which the radiologist issues recommendations for additional imaging for follow-up. Busy emergency physicians may be challenged to carefully communicate recommendations for additional imaging not relevant to the patient's primary evaluation. The emergence of electronic health records and natural language processing algorithms may help address this quality gap. We seek to describe recommendations for additional imaging from our institution and develop and validate an automated natural language processing algorithm to reliably identify recommendations for additional imaging. We developed a natural language processing algorithm to detect recommendations for additional imaging, using 3 iterative cycles of training and validation. The third cycle used 3,235 radiology reports (1,600 for algorithm training and 1,635 for validation) of discharged emergency department (ED) patients from which we determined the incidence of discharge-relevant recommendations for additional imaging and the frequency of appropriate discharge documentation. The test characteristics of the 3 natural language processing algorithm iterations were compared, using blinded chart review as the criterion standard. Discharge-relevant recommendations for additional imaging were found in 4.5% (95% confidence interval [CI] 3.5% to 5.5%) of ED radiology reports, but 51% (95% CI 43% to 59%) of discharge instructions failed to note those findings. The final natural language processing algorithm had 89% (95% CI 82% to 94%) sensitivity and 98% (95% CI 97% to 98%) specificity for detecting recommendations for additional imaging. For discharge-relevant recommendations for additional imaging, sensitivity improved to 97% (95% CI 89% to 100%). Recommendations for additional imaging are common, and failure to document relevant recommendations for additional imaging in ED discharge instructions occurs frequently. The natural language processing algorithm's performance improved with each iteration and offers a promising error-prevention tool. Copyright © 2013 American College of Emergency Physicians. Published by Mosby, Inc. All rights reserved.
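To make the detection task concrete, here is a minimal, hypothetical keyword/regex sketch of how a sentence-level detector for recommendations for additional imaging might be structured; the patterns below are invented for illustration and are far simpler than the iteratively trained and validated algorithm described in the study.

```python
import re

# hypothetical patterns; a real system would be trained and validated as in the study
RAI_PATTERNS = [
    r"\brecommend(?:ed|s)?\b.{0,80}\b(follow[- ]?up|repeat|additional)\b.{0,40}\b(CT|MRI|imaging|ultrasound|radiograph)\b",
    r"\b(follow[- ]?up|repeat)\b.{0,40}\b(CT|MRI|imaging|ultrasound)\b.{0,60}\b(is|are)\s+(recommended|suggested|advised)\b",
]
RAI_RE = [re.compile(p, re.IGNORECASE) for p in RAI_PATTERNS]

def has_recommendation_for_additional_imaging(report_text: str) -> bool:
    """Flag a radiology report if any sentence matches a recommendation pattern."""
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        if any(rx.search(sentence) for rx in RAI_RE):
            return True
    return False

print(has_recommendation_for_additional_imaging(
    "Incidental 6 mm nodule. Recommend follow-up CT in 3 months."))   # -> True
```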
Algorithms and programming tools for image processing on the MPP
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1985-01-01
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
A real time microcomputer implementation of sensor failure detection for turbofan engines
NASA Technical Reports Server (NTRS)
Delaat, John C.; Merrill, Walter C.
1989-01-01
An algorithm was developed which detects, isolates, and accommodates sensor failures using analytical redundancy. The performance of this algorithm was demonstrated on a full-scale F100 turbofan engine. The algorithm was implemented in real-time on a microprocessor-based controls computer which includes parallel processing and high order language programming. Parallel processing was used to achieve the required computational power for the real-time implementation. High order language programming was used in order to reduce the programming and maintenance costs of the algorithm implementation software. The sensor failure algorithm was combined with an existing multivariable control algorithm to give a complete control implementation with sensor analytical redundancy. The real-time microprocessor implementation of the algorithm, which resulted in its successful engine demonstration, is described.
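The core idea of analytical redundancy is to compare each sensor reading against an estimate reconstructed from a model of the engine and the remaining sensors. The fragment below is only a schematic illustration of that detect/accommodate step, not the F100 algorithm itself; the numbers are made up.

```python
def check_sensor(measured, estimated, threshold):
    """Analytical redundancy in miniature: compare a sensor reading with a
    model-based estimate; if the residual is too large, declare a failure
    and accommodate by substituting the estimate (illustrative only)."""
    residual = measured - estimated
    failed = abs(residual) > threshold
    accommodated = estimated if failed else measured
    return failed, accommodated

# e.g., a fan-speed reading vs. an estimate reconstructed from an engine model
failed, value = check_sensor(measured=10250.0, estimated=9800.0, threshold=300.0)
print(failed, value)   # -> True 9800.0 (failure declared, estimate substituted)
```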
ERIC Educational Resources Information Center
Crossley, Scott A.
2013-01-01
This paper provides an agenda for replication studies focusing on second language (L2) writing and the use of natural language processing (NLP) tools and machine learning algorithms. Specifically, it introduces a range of the available NLP tools and machine learning algorithms and demonstrates how these could be used to replicate seminal studies…
Searching Process with Raita Algorithm and its Application
NASA Astrophysics Data System (ADS)
Rahim, Robbi; Saleh Ahmar, Ansari; Abdullah, Dahlan; Hartama, Dedy; Napitupulu, Darmawan; Putera Utama Siahaan, Andysah; Hasan Siregar, Muhammad Noor; Nasution, Nurliana; Sundari, Siti; Sriadhi, S.
2018-04-01
Searching is a common task performed by many computer users, and the Raita algorithm is one string-matching algorithm that can be used to locate information matching an entered pattern. In this work, the Raita algorithm was applied to a file search application written in the Java programming language; testing showed that the application searches files quickly, returns accurate results, and supports many data types.
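For reference, a compact sketch of the Raita string-matching algorithm is given below: it combines Horspool-style bad-character shifts with a quick last/first/middle character pre-test before comparing the rest of the pattern. This is a generic Python illustration rather than the Java application described in the abstract.

```python
def raita_search(text: str, pattern: str):
    """Raita string matching: Horspool bad-character shifts plus a
    last/first/middle character pre-test before full comparison."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # bad-character shift table built from all but the last pattern character
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    last, first, middle = pattern[-1], pattern[0], pattern[m // 2]
    hits, i = [], 0
    while i <= n - m:
        if (text[i + m - 1] == last and text[i] == first
                and text[i + m // 2] == middle
                and text[i + 1:i + m - 1] == pattern[1:-1]):
            hits.append(i)
        i += shift.get(text[i + m - 1], m)   # shift by the character under the last position
    return hits

print(raita_search("the quick brown fox", "brown"))   # -> [10]
```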
Good-enough linguistic representations and online cognitive equilibrium in language processing.
Karimi, Hossein; Ferreira, Fernanda
2016-01-01
We review previous research showing that representations formed during language processing are sometimes just "good enough" for the task at hand and propose the "online cognitive equilibrium" hypothesis as the driving force behind the formation of good-enough representations in language processing. Based on this view, we assume that the language comprehension system by default prefers to achieve as early as possible and remain as long as possible in a state of cognitive equilibrium where linguistic representations are successfully incorporated with existing knowledge structures (i.e., schemata) so that a meaningful and coherent overall representation is formed, and uncertainty is resolved or at least minimized. We also argue that the online equilibrium hypothesis is consistent with current theories of language processing, which maintain that linguistic representations are formed through a complex interplay between simple heuristics and deep syntactic algorithms and also theories that hold that linguistic representations are often incomplete and lacking in detail. We also propose a model of language processing that makes use of both heuristic and algorithmic processing, is sensitive to online cognitive equilibrium, and, we argue, is capable of explaining the formation of underspecified representations. We review previous findings providing evidence for underspecification in relation to this hypothesis and the associated language processing model and argue that most of these findings are compatible with them.
Sign Language Recognition System using Neural Network for Digital Hardware Implementation
NASA Astrophysics Data System (ADS)
Vargas, Lorena P.; Barba, Leiner; Torres, C. O.; Mattos, L.
2011-01-01
This work presents an image pattern recognition system that uses a neural network to identify sign language for deaf people. The system stores several images showing the specific symbols of this kind of language, which are used to train a multilayer neural network with a back-propagation algorithm. The images are first processed to adapt them and to improve the discriminating performance of the network; this preprocessing includes filtering, reduction, and noise-elimination algorithms, as well as edge detection. The system is evaluated using signs whose representation does not include movement.
Applying a visual language for image processing as a graphical teaching tool in medical imaging
NASA Astrophysics Data System (ADS)
Birchman, James J.; Tanimoto, Steven L.; Rowberg, Alan H.; Choi, Hyung-Sik; Kim, Yongmin
1992-05-01
Typical user interaction in image processing is with command line entries, pull-down menus, or text menu selections from a list, and as such is not generally graphical in nature. Although applying these interactive methods to construct more sophisticated algorithms from a series of simple image processing steps may be clear to engineers and programmers, it may not be clear to clinicians. A solution to this problem is to implement a visual programming language using visual representations to express image processing algorithms. Visual representations promote a more natural and rapid understanding of image processing algorithms by providing more visual insight into what the algorithms do than the interactive methods mentioned above can provide. Individuals accustomed to dealing with images will be more likely to understand an algorithm that is represented visually. This is especially true of referring physicians, such as surgeons in an intensive care unit. With the increasing acceptance of picture archiving and communications system (PACS) workstations and the trend toward increasing clinical use of image processing, referring physicians will need to learn more sophisticated concepts than simply image access and display. If the procedures that they perform commonly, such as window width and window level adjustment and image enhancement using unsharp masking, are depicted visually in an interactive environment, it will be easier for them to learn and apply these concepts. The software described in this paper is a visual programming language for image processing which has been implemented on the NeXT computer using NeXTstep user interface development tools and other tools in an object-oriented environment. The concept is based upon the description of a visual language titled 'Visualization of Vision Algorithms' (VIVA). Iconic representations of simple image processing steps are placed into a workbench screen and connected together into a dataflow path by the user. As the user creates and edits a dataflow path, more complex algorithms can be built on the screen. Once the algorithm is built, it can be executed, its results can be reviewed, and operator parameters can be interactively adjusted until an optimized output is produced. The optimized algorithm can then be saved and added to the system as a new operator. This system has been evaluated as a graphical teaching tool for window width and window level adjustment, image enhancement using unsharp masking, and other techniques.
ERIC Educational Resources Information Center
Jarman, Jay
2011-01-01
This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms,…
Danforth, Kim N; Early, Megan I; Ngan, Sharon; Kosco, Anne E; Zheng, Chengyi; Gould, Michael K
2012-08-01
Lung nodules are commonly encountered in clinical practice, yet little is known about their management in community settings. An automated method for identifying patients with lung nodules would greatly facilitate research in this area. Using members of a large, community-based health plan from 2006 to 2010, we developed a method to identify patients with lung nodules, by combining five diagnostic codes, four procedural codes, and a natural language processing algorithm that performed free text searches of radiology transcripts. An experienced pulmonologist reviewed a random sample of 116 radiology transcripts, providing a reference standard for the natural language processing algorithm. With the use of an automated method, we identified 7112 unique members as having one or more incident lung nodules. The mean age of the patients was 65 years (standard deviation 14 years). There were slightly more women (54%) than men, and Hispanics and non-whites comprised 45% of the lung nodule cohort. Thirty-six percent were never smokers whereas 11% were current smokers. Fourteen percent of the patients were subsequently diagnosed with lung cancer. The sensitivity and specificity of the natural language processing algorithm for identifying the presence of lung nodules were 96% and 86%, respectively, compared with clinician review. Among the true positive transcripts in the validation sample, only 35% were solitary and unaccompanied by one or more associated findings, and 56% measured 8 to 30 mm in diameter. A combination of diagnostic codes, procedural codes, and a natural language processing algorithm for free text searching of radiology reports can accurately and efficiently identify patients with incident lung nodules, many of whom are subsequently diagnosed with lung cancer.
NASA Technical Reports Server (NTRS)
Cross, James H., II; Morrison, Kelly I.; May, Charles H., Jr.; Waddel, Kathryn C.
1989-01-01
The first phase of a three-phase effort to develop a new graphically oriented specification language which will facilitate the reverse engineering of Ada source code into graphical representations (GRs) as well as the automatic generation of Ada source code is described. A simplified view of the three phases of Graphical Representations for Algorithms, Structure, and Processes for Ada (GRASP/Ada) with respect to three basic classes of GRs is presented. Phase 1 concentrated on the derivation of an algorithmic diagram, the control structure diagram (CSD) (CRO88a) from Ada source code or Ada PDL. Phase 2 includes the generation of architectural and system level diagrams such as structure charts and data flow diagrams and should result in a requirements specification for a graphically oriented language able to support automatic code generation. Phase 3 will concentrate on the development of a prototype to demonstrate the feasibility of this new specification language.
Language Evolution by Iterated Learning with Bayesian Agents
ERIC Educational Resources Information Center
Griffiths, Thomas L.; Kalish, Michael L.
2007-01-01
Languages are transmitted from person to person and generation to generation via a process of iterated learning: people learn a language from other people who once learned that language themselves. We analyze the consequences of iterated learning for learning algorithms based on the principles of Bayesian inference, assuming that learners compute…
Metaphor Identification in Large Texts Corpora
Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir
2013-01-01
Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora comprising Reuters and New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. The algorithms' performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus. PMID:23658625
Algorithm and program for information processing with the filin apparatus
NASA Technical Reports Server (NTRS)
Gurin, L. S.; Morkrov, V. S.; Moskalenko, Y. I.; Tsoy, K. A.
1979-01-01
The reduction of spectral radiation data from space sources is described. The algorithm and program for identifying segments of information obtained from the Filin telescope-spectrometer on the Salyut-4 are presented. The information segments represent suspected X-ray sources. The proposed algorithm is an algorithm of the lowest level. Following evaluation, information free of uninformative segments is subject to further processing with algorithms of a higher level. The language used is FORTRAN 4.
NASA Astrophysics Data System (ADS)
Cook, Perry R.
This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and the study of perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.).
Hardjojo, Antony; Gunachandran, Arunan; Pang, Long; Abdullah, Mohammed Ridzwan Bin; Wah, Win; Chong, Joash Wen Chen; Goh, Ee Hui; Teo, Sok Huang; Lim, Gilbert; Lee, Mong Li; Hsu, Wynne; Lee, Vernon; Chen, Mark I-Cheng; Wong, Franco; Phang, Jonathan Siung King
2018-06-11
Free-text clinical records provide a source of information that complements traditional disease surveillance. To electronically harness these records, they need to be transformed into codified fields by natural language processing algorithms. The aim of this study was to develop, train, and validate Clinical History Extractor for Syndromic Surveillance (CHESS), a natural language processing algorithm to extract clinical information from free-text primary care records. CHESS is a keyword-based natural language processing algorithm that extracts 48 signs and symptoms suggesting respiratory infections, gastrointestinal infections, constitutional symptoms, as well as other signs and symptoms potentially associated with infectious diseases. The algorithm also captured the assertion status (affirmed, negated, or suspected) and symptom duration. Electronic medical records from the National Healthcare Group Polyclinics, a major public sector primary care provider in Singapore, were randomly extracted and manually reviewed by 2 human reviewers, with a third reviewer as the adjudicator. The algorithm was evaluated based on 1680 notes against the human-coded result as the reference standard, with half of the data used for training and the other half for validation. The symptoms most commonly present within the 1680 clinical records at the episode level were those typically present in respiratory infections such as cough (744/7703, 9.66%), sore throat (591/7703, 7.67%), rhinorrhea (552/7703, 7.17%), and fever (928/7703, 12.04%). At the episode level, CHESS had an overall performance of 96.7% precision and 97.6% recall on the training dataset and 96.0% precision and 93.1% recall on the validation dataset. Symptoms suggesting respiratory and gastrointestinal infections were all detected with more than 90% precision and recall. CHESS correctly assigned the assertion status in 97.3%, 97.9%, and 89.8% of affirmed, negated, and suspected signs and symptoms, respectively (97.6% overall accuracy). Symptom episode duration was correctly identified in 81.2% of records with known duration status. We have developed a natural language processing algorithm dubbed CHESS that achieves good performance in extracting signs and symptoms from primary care free-text clinical records. In addition to the presence of symptoms, our algorithm can also accurately distinguish affirmed, negated, and suspected assertion statuses and extract symptom durations. ©Antony Hardjojo, Arunan Gunachandran, Long Pang, Mohammed Ridzwan Bin Abdullah, Win Wah, Joash Wen Chen Chong, Ee Hui Goh, Sok Huang Teo, Gilbert Lim, Mong Li Lee, Wynne Hsu, Vernon Lee, Mark I-Cheng Chen, Franco Wong, Jonathan Siung King Phang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 11.06.2018.
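To illustrate the keyword-plus-assertion idea in miniature, the sketch below detects a few symptoms and assigns an affirmed/negated/suspected status; the keyword and negation lists are invented for illustration and are far smaller and cruder than CHESS's curated dictionary of 48 signs and symptoms.

```python
import re

# hypothetical keyword and negation lists, for illustration only
SYMPTOMS = {"cough": ["cough"], "fever": ["fever", "febrile"], "sore throat": ["sore throat"]}
NEGATIONS = ["no", "denies", "denied", "without"]
SUSPECTED = ["possible", "query", "?", "suspected"]

def extract_symptoms(note: str):
    """Return {symptom: assertion} found in a free-text note."""
    found = {}
    for phrase in re.split(r"[.;,\n]", note.lower()):
        for symptom, keywords in SYMPTOMS.items():
            if any(k in phrase for k in keywords):
                tokens = phrase.split()
                if any(n in tokens for n in NEGATIONS):
                    found[symptom] = "negated"
                elif any(s in phrase for s in SUSPECTED):
                    found[symptom] = "suspected"
                else:
                    found.setdefault(symptom, "affirmed")
    return found

print(extract_symptoms("Cough for 3 days, no fever; possible sore throat."))
# -> {'cough': 'affirmed', 'fever': 'negated', 'sore throat': 'suspected'}
```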
Combining Machine Learning and Natural Language Processing to Assess Literary Text Comprehension
ERIC Educational Resources Information Center
Balyan, Renu; McCarthy, Kathryn S.; McNamara, Danielle S.
2017-01-01
This study examined how machine learning and natural language processing (NLP) techniques can be leveraged to assess the interpretive behavior that is required for successful literary text comprehension. We compared the accuracy of seven different machine learning classification algorithms in predicting human ratings of student essays about…
Block Architecture Problem with Depth First Search Solution and Its Application
NASA Astrophysics Data System (ADS)
Rahim, Robbi; Abdullah, Dahlan; Simarmata, Janner; Pranolo, Andri; Saleh Ahmar, Ansari; Hidayat, Rahmat; Napitupulu, Darmawan; Nurdiyanto, Heri; Febriadi, Bayu; Zamzami, Z.
2018-01-01
Graphical Language for Data Processing
NASA Technical Reports Server (NTRS)
Alphonso, Keith
2011-01-01
A graphical language for processing data allows processing elements to be connected with virtual wires that represent data flows between processing modules. The processing of complex data, such as lidar data, requires many different algorithms to be applied. The purpose of this innovation is to automate the processing of complex data, such as LIDAR, without the need for complex scripting and programming languages. The system consists of a set of user-interface components that allow the user to drag and drop various algorithmic and processing components onto a process graph. By working graphically, the user can completely visualize the process flow and create complex diagrams. This innovation supports the nesting of graphs, such that a graph can be included in another graph as a single step for processing. In addition to the user interface components, the system includes a set of .NET classes that represent the graph internally. These classes provide the internal system representation of the graphical user interface. The system includes a graph execution component that reads the internal representation of the graph (as described above) and executes that graph. The execution of the graph follows the interpreted model of execution in that each node is traversed and executed from the original internal representation. In addition, there are components that allow external code elements, such as algorithms, to be easily integrated into the system, thus making the system infinitely expandable.
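The execution model described above, with nodes wrapping processing steps, wires mapping inputs to upstream outputs, and an interpreter that traverses the graph, can be sketched in a few lines. The following Python toy is an illustration of that interpreted dataflow model under simplifying assumptions, not the .NET implementation.

```python
# Minimal interpreted dataflow graph: each node wraps a processing function,
# its `inputs` name the upstream nodes, and execution resolves dependencies
# recursively with memoization (no cycle detection, for brevity).
class Node:
    def __init__(self, name, func, inputs=()):
        self.name, self.func, self.inputs = name, func, list(inputs)

def run_graph(nodes):
    by_name = {n.name: n for n in nodes}
    results = {}
    def evaluate(node):
        if node.name not in results:
            args = [evaluate(by_name[dep]) for dep in node.inputs]
            results[node.name] = node.func(*args)
        return results[node.name]
    for n in nodes:
        evaluate(n)
    return results

# toy pipeline: source -> scale -> threshold (stand-ins for lidar algorithms)
graph = [
    Node("source", lambda: [1, 5, 2, 8]),
    Node("scale", lambda xs: [2 * x for x in xs], inputs=["source"]),
    Node("threshold", lambda xs: [x for x in xs if x > 5], inputs=["scale"]),
]
print(run_graph(graph)["threshold"])   # -> [10, 16]
```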
NASA Astrophysics Data System (ADS)
Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.
1996-01-01
A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.
Mandarin Chinese Tone Identification in Cochlear Implants: Predictions from Acoustic Models
Morton, Kenneth D.; Torrione, Peter A.; Throckmorton, Chandra S.; Collins, Leslie M.
2015-01-01
It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study, the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise. PMID:18706497
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
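As an illustration of why such image functions map well onto a vector execution unit, the sketch below expresses a 2-D spatial convolution as whole-array multiply-accumulate operations, the kind of per-element arithmetic that each vector instruction performs concurrently; it is a NumPy stand-in, not the MDAC assembly code.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution written as whole-array multiply-accumulates:
    every loop iteration is one element-wise operation over a full image
    slice, which is exactly the data-parallel work a vector unit executes."""
    kernel = np.flipud(np.fliplr(kernel))        # flip for true convolution
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * image[dy:dy + oh, dx:dx + ow]
    return out

edges = conv2d_valid(np.random.rand(64, 64), np.array([[1., 0., -1.]] * 3))
```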
The riddle of Tasmanian languages
Bowern, Claire
2012-01-01
Recent work which combines methods from linguistics and evolutionary biology has been fruitful in discovering the history of major language families because of similarities in evolutionary processes. Such work opens up new possibilities for language research on previously unsolvable problems, especially in areas where information from other sources may be lacking. I use phylogenetic methods to investigate Tasmanian languages. Existing materials are so fragmentary that scholars have been unable to discover how many languages are represented in the sources. Using a clustering algorithm which identifies admixture, source materials representing more than one language are identified. Using the Neighbor-Net algorithm, 12 languages are identified in five clusters. Bayesian phylogenetic methods reveal that the families are not demonstrably related; an important result, given the importance of Tasmanian Aborigines for information about how societies have responded to population collapse in prehistory. This work provides insight into the societies of prehistoric Tasmania and illustrates a new utility of phylogenetics in reconstructing linguistic history. PMID:23015621
Redman, Joseph S; Natarajan, Yamini; Hou, Jason K; Wang, Jingqi; Hanif, Muzammil; Feng, Hua; Kramer, Jennifer R; Desiderio, Roxanne; Xu, Hua; El-Serag, Hashem B; Kanwal, Fasiha
2017-10-01
Natural language processing is a powerful technique of machine learning capable of maximizing data extraction from complex electronic medical records. We utilized this technique to develop algorithms capable of "reading" full-text radiology reports to accurately identify the presence of fatty liver disease. Abdominal ultrasound, computerized tomography, and magnetic resonance imaging reports were retrieved from the Veterans Affairs Corporate Data Warehouse from a random national sample of 652 patients. Radiographic fatty liver disease was determined by manual review by two physicians and verified with an expert radiologist. A split validation method was utilized for algorithm development. For all three imaging modalities, the algorithms could identify fatty liver disease with >90% recall and precision, with F-measures >90%. These algorithms could be used to rapidly screen patient records to establish a large cohort to facilitate epidemiological and clinical studies and examine the clinical course and outcomes of patients with radiographic hepatic steatosis.
Arnulf, Jan Ketil; Larsen, Kai Rune; Martinsen, Øyvind Lund; Bong, Chih How
2014-01-01
Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60–86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain. PMID:25184672
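The core measurement in this study is item-to-item semantic similarity. As a simplified, hypothetical stand-in for the language algorithms the authors used (such as latent semantic analysis), the snippet below computes a similarity matrix over a few invented survey items using TF-IDF vectors and cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical survey items; the study used semantic algorithms such as LSA,
# not this simple TF-IDF stand-in
items = [
    "My leader communicates a clear and positive vision of the future",
    "My leader encourages me to see problems in new ways",
    "I am satisfied with my job",
    "I intend to look for a new job next year",
]
vectors = TfidfVectorizer().fit_transform(items)
semantic_similarity = cosine_similarity(vectors)   # item-by-item similarity matrix
print(semantic_similarity.round(2))
```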
The MINERVA Software Development Process
NASA Technical Reports Server (NTRS)
Narkawicz, Anthony; Munoz, Cesar A.; Dutle, Aaron M.
2017-01-01
This paper presents a software development process for safety-critical software components of cyber-physical systems. The process is called MINERVA, which stands for Mirrored Implementation Numerically Evaluated against Rigorously Verified Algorithms. The process relies on formal methods for rigorously validating code against its requirements. The software development process uses: (1) a formal specification language for describing the algorithms and their functional requirements, (2) an interactive theorem prover for formally verifying the correctness of the algorithms, (3) test cases that stress the code, and (4) numerical evaluation on these test cases of both the algorithm specifications and their implementations in code. The MINERVA process is illustrated in this paper with an application to geo-containment algorithms for unmanned aircraft systems. These algorithms ensure that the position of an aircraft never leaves a predetermined polygon region and provide recovery maneuvers when the region is inadvertently exited.
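For readers unfamiliar with geo-containment, the toy below shows the kind of point-in-polygon predicate such algorithms are built around, using standard ray casting; it is an unverified illustration, not the formally verified MINERVA algorithms.

```python
def inside_polygon(point, polygon):
    """Ray-casting point-in-polygon test: count crossings of a horizontal ray.
    Illustrative sketch only, not the verified geo-containment code."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(inside_polygon((5, 5), square), inside_polygon((12, 5), square))  # -> True False
```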
Indonesian Sign Language Number Recognition using SIFT Algorithm
NASA Astrophysics Data System (ADS)
Mahfudi, Isa; Sarosa, Moechammad; Andrie Asmara, Rosa; Azrino Gustalika, M.
2018-04-01
Indonesian sign language (ISL) is generally used by deaf individuals for everyday communication. They use sign language as their primary language, which consists of two types of action: signs and finger spelling. However, not all people understand their sign language, so it is difficult for them to communicate with hearing people, and this barrier also contributes to their isolation from social life. A solution is needed that can help them interact with hearing people. Much research offers a variety of image-processing methods for solving the problem of sign language recognition. The SIFT (Scale Invariant Feature Transform) algorithm is one method that can be used to identify an object, and it is claimed to be very resistant to scaling, rotation, illumination changes, and noise. Using the SIFT algorithm for Indonesian sign language number recognition achieved a recognition rate of 82% on a dataset of 100 sample images, consisting of 50 images for training and 50 images for testing. Changing the threshold value affects the recognition result; the best threshold value is 0.45, with a recognition rate of 94%.
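A minimal sketch of SIFT-based matching with Lowe's ratio test is shown below, assuming OpenCV 4.4 or later (where cv2.SIFT_create is available); the image file names and the scoring rule are placeholders, and only the 0.45 ratio threshold is taken from the abstract.

```python
import cv2

sift = cv2.SIFT_create()
query = cv2.imread("sign_query.jpg", cv2.IMREAD_GRAYSCALE)        # placeholder paths
reference = cv2.imread("sign_number_3.jpg", cv2.IMREAD_GRAYSCALE)

kp1, des1 = sift.detectAndCompute(query, None)
kp2, des2 = sift.detectAndCompute(reference, None)

matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
ratio = 0.45   # the threshold the paper reports as best
good = [pair[0] for pair in matches
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance]
score = len(good) / max(len(kp1), 1)   # simple match score per reference sign
```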
The implement of Talmud property allocation algorithm based on graphic point-segment way
NASA Astrophysics Data System (ADS)
Cen, Haifeng
2017-04-01
Under the guidance of the theory of the Talmud allocation scheme, the paper analyzes the algorithm's implementation process from the perspective of a graphic point-segment representation and designs a point-segment-based Talmud property allocation algorithm. The core of the allocation algorithm is then implemented in the Java language, and Android programming is used to build a visual interface.
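The abstract does not restate the allocation rule itself. For context, the standard Talmud (contested-garment consistent) rule of Aumann and Maschler can be sketched as follows: apply constrained equal awards to the half-claims when the estate is at most half the total claims, and constrained equal losses capped at half-claims otherwise. This is a generic Python sketch, not the paper's point-segment Java/Android implementation.

```python
def _cea(caps, total, iters=100):
    """Constrained equal awards: find lam with sum(min(cap, lam)) == total (bisection)."""
    lo, hi = 0.0, max(caps)
    for _ in range(iters):
        lam = (lo + hi) / 2
        if sum(min(c, lam) for c in caps) < total:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return [min(c, lam) for c in caps]

def talmud_division(estate, claims):
    """Aumann-Maschler Talmud rule (contested-garment consistent division)."""
    half = [c / 2 for c in claims]
    total_claims = sum(claims)
    assert 0 <= estate <= total_claims
    if estate <= total_claims / 2:
        return _cea(half, estate)                    # equal awards, capped at half-claims
    losses = _cea(half, total_claims - estate)       # equal losses, capped at half-claims
    return [c - l for c, l in zip(claims, losses)]

print(talmud_division(200, [100, 200, 300]))   # ~ [50, 75, 75]  (classic Talmud example)
print(talmud_division(300, [100, 200, 300]))   # ~ [50, 100, 150]
```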
Building a Reference Resolution System Using Human Language Processing for Inspiration
ERIC Educational Resources Information Center
Watters, Shana Kay
2010-01-01
For over 30 years, reference resolution, the process of determining what a noun phrase including a pronoun refers to in written and spoken language, has been an important and on-going area of research. Most existing pronominal reference resolution algorithms and systems are designed to use syntactic information and surface features (e.g. number…
Image Processing Algorithms in the Secondary School Programming Education
ERIC Educational Resources Information Center
Gerják, István
2017-01-01
Learning computer programming for students of the age of 14-18 is difficult and requires endurance and engagement. Being familiar with the syntax of a computer language and writing programs in it are challenges for youngsters, not to mention that understanding algorithms is also a big challenge. To help students in the learning process, teachers…
The Research and Implementation of MUSER CLEAN Algorithm Based on OpenCL
NASA Astrophysics Data System (ADS)
Feng, Y.; Chen, K.; Deng, H.; Wang, F.; Mei, Y.; Wei, S. L.; Dai, W.; Yang, Q. P.; Liu, Y. B.; Wu, J. P.
2017-03-01
High-performance data processing on a single machine is urgently needed in the development of astronomical software. However, because machine configurations differ, traditional programming techniques such as multi-threading and CUDA (Compute Unified Device Architecture)+GPU (Graphic Processing Unit) have obvious limitations in portability and seamlessness across operating systems. The OpenCL (Open Computing Language) used in the development of the MUSER (MingantU SpEctral Radioheliograph) data processing system is introduced, and the Högbom CLEAN algorithm is re-implemented as a parallel CLEAN algorithm in the Python language with the PyOpenCL extension package. The experimental results show that the CLEAN algorithm based on OpenCL has approximately equal operating efficiency compared with the former CLEAN algorithm based on CUDA. More importantly, the data processing of this system in a CPU-only (Central Processing Unit) environment can also achieve high performance, which solves the problem of the environmental dependence of CUDA+GPU. Overall, the research improves the adaptability of the system, with emphasis on the performance of MUSER image clean computing. In the meanwhile, the realization of OpenCL in MUSER demonstrates its applicability in scientific data processing. In view of the high-performance computing features of OpenCL in heterogeneous environments, it will probably become a preferred technology in future high-performance astronomical software development.
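For reference, the serial core of the Högbom CLEAN algorithm, the loop that the OpenCL kernels parallelize, can be sketched as follows; the gain, threshold, and iteration limit are illustrative defaults, and this NumPy version is not the MUSER implementation.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, threshold=0.01, max_iter=500):
    """Serial Hogbom CLEAN: repeatedly remove a scaled, shifted PSF at the
    current residual peak and record the component in the model image."""
    residual = dirty.copy()
    model = np.zeros_like(dirty)
    pcy, pcx = np.unravel_index(np.argmax(psf), psf.shape)   # PSF peak position
    for _ in range(max_iter):
        y, x = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        peak = residual[y, x]
        if abs(peak) < threshold:
            break
        model[y, x] += gain * peak
        # overlap of the shifted PSF with the residual image (clipped at edges)
        y0, y1 = max(0, y - pcy), min(residual.shape[0], y - pcy + psf.shape[0])
        x0, x1 = max(0, x - pcx), min(residual.shape[1], x - pcx + psf.shape[1])
        py0, px0 = y0 - (y - pcy), x0 - (x - pcx)
        residual[y0:y1, x0:x1] -= gain * peak * psf[py0:py0 + (y1 - y0), px0:px0 + (x1 - x0)]
    return model, residual
```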
Luo, Yuan; Szolovits, Peter
2016-01-01
In natural language processing, stand-off annotation uses the starting and ending positions of an annotation to anchor it to the text and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing on narrative clinical notes in electronic medical records (EMRs) and efficiently retrieving such annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to electronic medical record ontologies. We first formulate this problem as the interval query problem, for which optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis on the basic interval tree query algorithm and show its nonoptimality when applied to a collection of 13 query types from Allen's interval algebra. We then study two closely related state-of-the-art interval query algorithms, proposed query reformulations, and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic time stabbing-max query time complexity and solves the stabbing-interval query tasks on all of Allen's relations in logarithmic time, attaining the theoretic lower bound. Updating time is kept logarithmic and the space requirement is kept linear at the same time. We also discuss interval management in external memory models and higher dimensions. PMID:27478379
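As background on the data structure being analyzed, the sketch below builds a basic centered interval tree over (start, end, label) annotations and answers stabbing queries (all annotations covering a given character offset). It illustrates the classic structure discussed in the abstract, not the authors' augmented algorithm, and the annotation tuples are invented.

```python
class IntervalNode:
    """Centered interval tree over (start, end, label) stand-off annotations."""
    def __init__(self, intervals):
        endpoints = sorted([s for s, e, _ in intervals] + [e for s, e, _ in intervals])
        self.center = endpoints[len(endpoints) // 2]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        mid = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        self.by_start = sorted(mid, key=lambda iv: iv[0])               # ascending starts
        self.by_end = sorted(mid, key=lambda iv: iv[1], reverse=True)   # descending ends
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def stab(self, point, out=None):
        """Return every stored annotation whose [start, end] contains `point`."""
        out = [] if out is None else out
        if point < self.center:
            for iv in self.by_start:
                if iv[0] > point:
                    break
                out.append(iv)
            if self.left:
                self.left.stab(point, out)
        else:
            for iv in self.by_end:
                if iv[1] < point:
                    break
                out.append(iv)
            if self.right:
                self.right.stab(point, out)
        return out

annotations = [(0, 14, "chief complaint"), (16, 44, "medication"), (30, 60, "dosage")]
tree = IntervalNode(annotations)
print(tree.stab(35))   # -> [(30, 60, 'dosage'), (16, 44, 'medication')]
```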
Unsupervised learning of natural languages
Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon
2005-01-01
We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
ERIC Educational Resources Information Center
Warren, Steven F.; Gilkerson, Jill; Richards, Jeffrey A.; Oller, D. Kimbrough; Xu, Dongxin; Yapanel, Umit; Gray, Sharmistha
2010-01-01
The study compared the vocal production and language learning environments of 26 young children with autism spectrum disorder (ASD) to 78 typically developing children using measures derived from automated vocal analysis. A digital language processor and audio-processing algorithms measured the amount of adult words to children and the amount of…
FPGA Coprocessor for Accelerated Classification of Images
NASA Technical Reports Server (NTRS)
Pingree, Paula J.; Scharenbroich, Lucas J.; Werne, Thomas A.
2008-01-01
An effort related to that described in the preceding article focuses on developing a spaceborne processing platform for fast and accurate onboard classification of image data, a critical part of modern satellite image processing. The approach again has been to exploit the versatility of the recently developed hybrid Virtex-4FX field-programmable gate array (FPGA) to run diverse science applications on embedded processors while taking advantage of the reconfigurable hardware resources of the FPGA. In this case, the FPGA serves as a coprocessor that implements legacy C-language support-vector-machine (SVM) image-classification algorithms to detect and identify natural phenomena such as flooding, volcanic eruptions, and sea-ice break-up. The FPGA provides hardware acceleration for greater onboard processing capability than previously demonstrated in software. The original C-language program demonstrated on an imaging instrument aboard the Earth Observing-1 (EO-1) satellite implements a linear-kernel SVM algorithm for classifying parts of the images as snow, water, ice, land, cloud, or unclassified. Current onboard processors, such as those on EO-1, have limited computing power, extremely limited active storage capability, and are no longer considered state-of-the-art. Using commercially available software that translates C-language programs into hardware description language (HDL) files, the legacy C-language program, and two newly formulated programs for a more capable expanded-linear-kernel and a more accurate polynomial-kernel SVM algorithm, have been implemented in the Virtex-4FX FPGA. In tests, the FPGA implementations have exhibited significant speedups over conventional software implementations running on general-purpose hardware.
Sada, Yvonne; Hou, Jason; Richardson, Peter; El-Serag, Hashem; Davila, Jessica
2016-02-01
Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated HCC International Classification of Diseases, 9th Revision (ICD-9) codes, and evaluated whether natural language processing by the Automated Retrieval Console (ARC) for document classification improves HCC identification. We identified a cohort of patients with ICD-9 codes for HCC during 2005-2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared with manual classification. PPV, sensitivity, and specificity of ARC were calculated. A total of 1138 patients with HCC were identified by ICD-9 codes. On the basis of manual review, 773 had HCC. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had PPV of 0.75, sensitivity of 0.94, and specificity of 0.68. A combined approach of ICD-9 codes and natural language processing of pathology and radiology reports improves HCC case identification in automated data.
The PlusCal Algorithm Language
NASA Astrophysics Data System (ADS)
Lamport, Leslie
Algorithms are different from programs and should not be described with programming languages. The only simple alternative to programming languages has been pseudo-code. PlusCal is an algorithm language that can be used right now to replace pseudo-code, for both sequential and concurrent algorithms. It is based on the TLA+ specification language, and a PlusCal algorithm is automatically translated to a TLA+ specification that can be checked with the TLC model checker and reasoned about formally.
Computational Evaluation of the Traceback Method
ERIC Educational Resources Information Center
Kol, Sheli; Nir, Bracha; Wintner, Shuly
2014-01-01
Several models of language acquisition have emerged in recent years that rely on computational algorithms for simulation and evaluation. Computational models are formal and precise, and can thus provide mathematically well-motivated insights into the process of language acquisition. Such models are amenable to robust computational evaluation,…
On-Demand Associative Cross-Language Information Retrieval
NASA Astrophysics Data System (ADS)
Geraldo, André Pinto; Moreira, Viviane P.; Gonçalves, Marcos A.
This paper proposes the use of algorithms for mining association rules as an approach for Cross-Language Information Retrieval. These algorithms have been widely used to analyse market basket data. The idea is to map the problem of finding associations between sales items to the problem of finding term translations over a parallel corpus. The proposal was validated by means of experiments using queries in two distinct languages: Portuguese and Finnish to retrieve documents in English. The results show that the performance of our proposed approach is comparable to the performance of the monolingual baseline and to query translation via machine translation, even though these systems employ more complex Natural Language Processing techniques. The combination between machine translation and our approach yielded the best results, even outperforming the monolingual baseline.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
Implementing Linear Algebra Related Algorithms on the TI-92+ Calculator.
ERIC Educational Resources Information Center
Alexopoulos, John; Abraham, Paul
2001-01-01
Demonstrates a less utilized feature of the TI-92+: its natural and powerful programming language. Shows how to implement several linear algebra related algorithms including the Gram-Schmidt process, Least Squares Approximations, Wronskians, Cholesky Decompositions, and Generalized Linear Least Square Approximations with QR Decompositions.…
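As one concrete example of the algorithms mentioned, here is a short sketch of the modified Gram-Schmidt process, written in Python with NumPy rather than in the TI-92+ programming language used in the article; it is included only to make the algorithm itself explicit.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Modified Gram-Schmidt: return an orthonormal basis (as rows) for the
    span of the input row vectors, skipping nearly dependent vectors."""
    basis = []
    for v in np.asarray(vectors, dtype=float):
        w = v.copy()
        for q in basis:
            w -= np.dot(q, w) * q          # remove the component along q
        norm = np.linalg.norm(w)
        if norm > tol:
            basis.append(w / norm)
    return np.array(basis)

Q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]])
print(np.allclose(Q @ Q.T, np.eye(len(Q))))   # rows are orthonormal -> True
```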
Fang, Simin; Zhou, Sheng; Wang, Xiaochun; Ye, Qingsheng; Tian, Ling; Ji, Jianjun; Wang, Yanqun
2015-01-01
The aim was to design and improve FPGA-based signal processing algorithms for ophthalmic ultrasonography. Three signal processing modules were implemented in the Verilog HDL hardware description language in Quartus II: a fully parallel distributed dynamic filter, digital quadrature demodulation, and logarithmic compression. Compared with the original system, the hardware cost is reduced, the whole image is clearer and contains more information about the deep eyeball, and the depth of detection increases from 5 cm to 6 cm. The new algorithms meet the design requirements and optimize the system so that it can effectively improve the image quality of existing equipment.
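To show what the demodulation and compression stages compute, the NumPy/SciPy sketch below mixes an RF line to baseband, low-pass filters the I/Q components, takes the envelope, and log-compresses it for display; the filter order, cutoff, and dynamic range are illustrative assumptions, and this is a floating-point model, not the Verilog modules.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_log_compress(rf, fs, f0, dynamic_range_db=60):
    """Digital quadrature demodulation followed by logarithmic compression
    of one ultrasound RF line (illustrative floating-point model)."""
    t = np.arange(rf.size) / fs
    i = rf * np.cos(2 * np.pi * f0 * t)          # in-phase mixing
    q = -rf * np.sin(2 * np.pi * f0 * t)         # quadrature mixing
    b, a = butter(4, f0 / (fs / 2))              # low-pass cutoff chosen for illustration
    i, q = filtfilt(b, a, i), filtfilt(b, a, q)
    env = 2 * np.sqrt(i ** 2 + q ** 2)           # envelope (brightness)
    env /= env.max() + 1e-12
    img = 20 * np.log10(env + 10 ** (-dynamic_range_db / 20))
    return np.clip(img, -dynamic_range_db, 0)    # dB scale for display
```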
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).
Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.
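The predictive mean, confidence bounds, and distribution that the PMML 4.3 GPR element is meant to carry can be illustrated with a small scikit-learn example; the data, kernel choice, and hyperparameters below are arbitrary stand-ins, and exporting the fitted model to PMML is outside the scope of this sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# toy training data standing in for manufacturing measurements
X = np.linspace(0, 10, 25).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.default_rng(0).normal(size=X.shape[0])

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = np.linspace(0, 10, 100).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)      # predictive mean and std
lower, upper = mean - 1.96 * std, mean + 1.96 * std   # ~95% confidence bounds
```

The fitted kernel hyperparameters, together with the training data, are the quantities a PMML serialization of such a model would need to carry.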
2017-03-01
SUBJECT TERMS: data mining, natural language processing, machine learning, algorithm design, information warfare, propaganda. [The remainder of this record is residue from a report documentation form and a fragment of a part-of-speech tag table adapted from [12], listing Penn Treebank tags such as CC (coordinating conjunction), CD (cardinal number), DT (determiner), PRP$ (possessive pronoun), RB (adverb), and VB (verb, base form).]
Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.
Becker, Matthias; Böckmann, Britta
2016-01-01
Automatic information extraction of medical concepts and classification with semantic standards from medical reports is useful for standardization and for clinical research. This paper presents an approach for UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for the German language, that is, whether it can identify UMLS concepts and map them to SNOMED-CT. The natural language processing pipeline was extended with the German UMLS database and German OpenNLP models, so that it can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset translated into German was used. The implemented algorithms were tested on a set of 199 German reports, obtaining an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.
Hansen, J H; Nandkumar, S
1995-01-01
The formulation of reliable signal processing algorithms for speech coding and synthesis requires the selection of a prior criterion of performance. Though coding efficiency (bits/second) or computational requirements can be used, a final performance measure must always include speech quality. In this paper, three objective speech quality measures are considered with respect to quality assessment for American English, noisy American English, and noise-free versions of seven languages. The purpose is to determine whether objective quality measures can be used to quantify changes in quality for a given voice coding method, with a known subjective performance level, as background noise or language conditions are changed. The speech coding algorithm chosen is regular-pulse excitation with long-term prediction (RPE-LTP), which has been chosen as the standard voice compression algorithm for the European Digital Mobile Radio system. Three areas are considered for objective quality assessment: (i) vocoder performance for American English in a noise-free environment, (ii) speech quality variation for three additive background noise sources, and (iii) noise-free performance for seven languages, which include English, Japanese, Finnish, German, Hindi, Spanish, and French. It is suggested that although existing objective quality measures will never replace subjective testing, they can be a useful means of assessing changes in performance, identifying areas for improvement in algorithm design, and augmenting subjective quality tests for voice coding/compression algorithms in noise-free, noisy, and/or non-English applications.
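The abstract does not name the three objective measures, but a common member of this family is the segmental SNR, which compares clean and coded/degraded signals frame by frame. The sketch below is a generic illustration, not necessarily one of the measures used in the paper; the frame length and clamping limits are assumptions.

```python
# Generic segmental SNR sketch: frame-by-frame SNR in dB, averaged over frames.
# Illustrative only; frame size and clamping range are assumptions.
import numpy as np

def segmental_snr(clean, degraded, frame_len=256, eps=1e-10):
    n_frames = min(len(clean), len(degraded)) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        d = degraded[i * frame_len:(i + 1) * frame_len]
        noise = s - d
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(noise ** 2) + eps))
        snrs.append(np.clip(snr, -10.0, 35.0))   # commonly used clamping range
    return float(np.mean(snrs))

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)
degraded = clean + 0.05 * rng.standard_normal(8000)
print(round(segmental_snr(clean, degraded), 2), "dB")
```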
How Formal Methods Impels Discovery: A Short History of an Air Traffic Management Project
NASA Technical Reports Server (NTRS)
Butler, Ricky W.; Hagen, George; Maddalon, Jeffrey M.; Munoz, Cesar A.; Narkawicz, Anthony; Dowek, Gilles
2010-01-01
In this paper we describe a process of algorithmic discovery that was driven by our goal of achieving complete, mechanically verified algorithms that compute conflict prevention bands for use in en route air traffic management. The algorithms were originally defined in the PVS specification language and subsequently have been implemented in Java and C++. We do not present the proofs in this paper: instead, we describe the process of discovery and the key ideas that enabled the final formal proof of correctness.
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
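The layered structure described above can be loosely sketched in Python, with mpi4py standing in for the distributed-memory level and NumPy array operations standing in for the vector (SIMD) level; the shared-memory level is only indicated in a comment. This is a schematic illustration of a particle push, not one of the PICKSC skeleton codes.

```python
# Schematic two-level sketch of a PIC particle push:
# MPI rank = distributed-memory level, NumPy vectorization = SIMD level.
# (A full code would add a shared-memory layer, e.g. threads/OpenMP, per rank.)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100_000                      # particles owned by this rank
rng = np.random.default_rng(rank)
x = rng.uniform(rank / size, (rank + 1) / size, n_local)  # 1D slab decomposition
v = rng.standard_normal(n_local)
E = 0.1                                # uniform field, placeholder for a field solve
dt = 0.01

for _ in range(10):
    v += E * dt                        # vectorized acceleration (SIMD level)
    x = (x + v * dt) % 1.0             # vectorized push with periodic wrap
    # A real PIC step would also exchange particles that leave the local slab.

total = comm.allreduce(n_local, op=MPI.SUM)
if rank == 0:
    print("total particles:", total)
```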
The knowledge instinct, cognitive algorithms, modeling of language and cultural evolution
NASA Astrophysics Data System (ADS)
Perlovsky, Leonid I.
2008-04-01
The talk discusses mechanisms of the mind and their engineering applications. The past attempts at designing "intelligent systems" encountered mathematical difficulties related to algorithmic complexity. The culprit turned out to be logic, which in one way or another was used not only in logic rule systems, but also in statistical, neural, and fuzzy systems. Algorithmic complexity is related to Godel's theory, a most fundamental mathematical result. These difficulties were overcome by replacing logic with a dynamic process "from vague to crisp," dynamic logic. It leads to algorithms overcoming combinatorial complexity, and resulting in orders of magnitude improvement in classical problems of detection, tracking, fusion, and prediction in noise. I present engineering applications to pattern recognition, detection, tracking, fusion, financial predictions, and Internet search engines. Mathematical and engineering efficiency of dynamic logic can also be understood as cognitive algorithm, which describes fundamental property of the mind, the knowledge instinct responsible for all our higher cognitive functions: concepts, perception, cognition, instincts, imaginations, intuitions, emotions, including emotions of the beautiful. I present our latest results in modeling evolution of languages and cultures, their interactions in these processes, and role of music in cultural evolution. Experimental data is presented that support the theory. Future directions are outlined.
A novel robust Arabic light stemmer
NASA Astrophysics Data System (ADS)
Abainia, Kheireddine; Ouamour, Siham; Sayoud, Halim
2017-05-01
Stemming is the process of transforming a word into its root or stem; hence, it is considered a crucial pre-processing step before tackling any task of natural language processing or information retrieval. However, in the case of the Arabic language, finding an effective stemming algorithm seems to be quite difficult, since Arabic has a specific morphology that differs from many other languages. Although there exist several algorithms in the literature addressing the Arabic stemming issue, unfortunately most of them are restricted to a limited number of words, present some confusion between original letters and affixes, and usually employ dictionaries of words or patterns. For that purpose, we propose the design and implementation of a novel Arabic light stemmer, which is based on some new rules for stripping prefixes, suffixes and infixes in a smart way. To our knowledge, it is the first work dealing with Arabic infixes with regard to their irregular rules. The empirical evaluation was conducted on a new Arabic data-set (called ARASTEM), which was conceived and collected from several Arabic discussion forums containing dialectical Arabic and modern pseudo-Arabic languages. Hence, we present a comparative investigation between our new stemmer and other existing stemmers using Paice's parameters, namely: Under Stemming Index (UI), Over Stemming Index (OI) and Stemming Weight (SW). Results show that the proposed Arabic light stemmer maintains consistently high performance and outperforms several existing light stemmers.
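As a rough illustration of what light stemming does (and not of the paper's rule set), the Python sketch below strips a few common Arabic prefixes and suffixes. The affix lists are simplified assumptions, and infix handling, the paper's main contribution, is not shown.

```python
# Simplified Arabic light-stemming sketch: strip a few common prefixes/suffixes.
# Affix lists are illustrative only; the paper's rules (including infixes) differ.
PREFIXES = ["ال", "وال", "بال", "كال", "فال", "لل", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ية", "ه", "ة"]

def light_stem(word, min_len=3):
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word

print(light_stem("المعلمون"))   # strips the definite article and a plural suffix
```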
Genetic algorithms using SISAL parallel programming language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tejada, S.
1994-05-06
Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will discuss the implementation and performance of parallel genetic algorithms in SISAL.
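For readers unfamiliar with the steps named above (fitness evaluation, crossover, mutation), here is a minimal genetic algorithm sketch, written in Python rather than SISAL; the population size, rates, and toy fitness function are assumptions for illustration only.

```python
# Minimal GA sketch (not SISAL): maximize the number of 1-bits in a bitstring.
# Population size, rates, and fitness are illustrative assumptions.
import random

random.seed(0)
GENES, POP, GENERATIONS, MUT_RATE = 20, 30, 50, 0.02

def fitness(ind):                       # fitness evaluation
    return sum(ind)

def crossover(a, b):                    # single-point crossover
    cut = random.randrange(1, GENES)
    return a[:cut] + b[cut:]

def mutate(ind):                        # bit-flip mutation
    return [g ^ 1 if random.random() < MUT_RATE else g for g in ind]

pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]            # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

print("best fitness:", fitness(max(pop, key=fitness)))
```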
Language Classification using N-grams Accelerated by FPGA-based Bloom Filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, A; Gokhale, M
N-gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x the speed of comparable software and 1.45x the speed of the competing hardware design.
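A software analogue of this approach, character n-gram extraction with per-language Bloom-filter membership tests, can be sketched in Python. The filter size, number of hashes, and tiny training texts are assumptions, and none of the FPGA parallelism is represented.

```python
# Software sketch of n-gram language scoring with one Bloom filter per language.
# Filter size, hash count, and the toy training texts are illustrative assumptions.
import hashlib

class BloomFilter:
    def __init__(self, size=8192, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)
    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size
    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1
    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

training = {"en": "the quick brown fox jumps over the lazy dog",
            "de": "der schnelle braune fuchs springt ueber den faulen hund"}
profiles = {}
for lang, text in training.items():
    bf = BloomFilter()
    for g in ngrams(text):
        bf.add(g)
    profiles[lang] = bf

query = "the brown dog jumps"
scores = {lang: sum(g in bf for g in ngrams(query)) for lang, bf in profiles.items()}
print(max(scores, key=scores.get), scores)
```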
Sentiment analysis of Arabic tweets using text mining techniques
NASA Astrophysics Data System (ADS)
Al-Horaibi, Lamia; Khan, Muhammad Badruddin
2016-07-01
Sentiment analysis has become a flourishing field of text mining and natural language processing. Sentiment analysis aims to determine whether a text is written to express positive, negative, or neutral emotions about a certain domain. Most sentiment analysis researchers focus on English texts, with very limited resources available for other complex languages, such as Arabic. In this study, the target was to develop an initial model that performs satisfactorily and measures Arabic Twitter sentiment using a machine learning approach, with Naïve Bayes and Decision Tree as classification algorithms. The dataset used contains more than 2,000 Arabic tweets collected from Twitter. We performed several experiments to check the performance of the two classifiers using different combinations of text-processing functions. We found that the available facilities for Arabic text processing need to be built from scratch or improved to develop accurate classifiers. The small functionalities we developed in a Python language environment helped improve the results and showed that sentiment analysis in the Arabic domain needs a lot of work on the lexicon side.
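A minimal version of the classification setup described (bag-of-words features with Naïve Bayes and Decision Tree classifiers) can be sketched with scikit-learn. The tiny labeled examples below are invented placeholders, not the study's tweet dataset, and the Arabic-specific preprocessing the authors had to build is omitted.

```python
# Minimal sentiment-classification sketch with Naive Bayes and a Decision Tree.
# The labeled examples are invented placeholders, not the study's tweet dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

texts = ["خدمة ممتازة وسريعة", "تجربة سيئة جدا", "منتج رائع", "لن اشتري مرة اخرى"]
labels = ["pos", "neg", "pos", "neg"]

vec = CountVectorizer()
X = vec.fit_transform(texts)

for clf in (MultinomialNB(), DecisionTreeClassifier(random_state=0)):
    clf.fit(X, labels)
    pred = clf.predict(vec.transform(["خدمة سيئة"]))
    print(type(clf).__name__, pred[0])
```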
NASA Astrophysics Data System (ADS)
Cook, Perry
This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.). Although most people would think that analog synthesizers and electronic music substantially predate the use of computers in music, many experiments and complete computer music systems were being constructed and used as early as the 1950s.
Mamouras, Konstantinos; Raghothaman, Mukund; Alur, Rajeev; Ives, Zachary G; Khanna, Sanjeev
2017-06-01
Real-time decision making in emerging IoT applications typically relies on computing quantitative summaries of large data streams in an efficient and incremental manner. To simplify the task of programming the desired logic, we propose StreamQRE, which provides natural and high-level constructs for processing streaming data. Our language has a novel integration of linguistic constructs from two distinct programming paradigms: streaming extensions of relational query languages and quantitative extensions of regular expressions. The former allows the programmer to employ relational constructs to partition the input data by keys and to integrate data streams from different sources, while the latter can be used to exploit the logical hierarchy in the input stream for modular specifications. We first present the core language with a small set of combinators, formal semantics, and a decidable type system. We then show how to express a number of common patterns with illustrative examples. Our compilation algorithm translates the high-level query into a streaming algorithm with precise complexity bounds on per-item processing time and total memory footprint. We also show how to integrate approximation algorithms into our framework. We report on an implementation in Java, and evaluate it with respect to existing high-performance engines for processing streaming data. Our experimental evaluation shows that (1) StreamQRE allows more natural and succinct specification of queries compared to existing frameworks, (2) the throughput of our implementation is higher than comparable systems (for example, two-to-four times greater than RxJava), and (3) the approximation algorithms supported by our implementation can lead to substantial memory savings.
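The kind of incremental, key-partitioned quantitative summary that StreamQRE targets can be loosely illustrated in plain Python. This is not the StreamQRE language or its compiled runtime, and the event schema below is an assumption.

```python
# Loose illustration of an incremental, key-partitioned streaming summary.
# Plain Python, not StreamQRE; the (key, value) event schema is an assumption.
from collections import defaultdict

def keyed_running_average(events):
    """Yield (key, running_mean) after each event, updating in O(1) per item."""
    count = defaultdict(int)
    total = defaultdict(float)
    for key, value in events:
        count[key] += 1
        total[key] += value
        yield key, total[key] / count[key]

stream = [("sensor-a", 2.0), ("sensor-b", 10.0), ("sensor-a", 4.0), ("sensor-a", 6.0)]
for key, mean in keyed_running_average(stream):
    print(key, round(mean, 2))
```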
An Automated Method to Generate e-Learning Quizzes from Online Language Learner Writing
ERIC Educational Resources Information Center
Flanagan, Brendan; Yin, Chengjiu; Hirokawa, Sachio; Hashimoto, Kiyota; Tabata, Yoshiyuki
2013-01-01
In this paper, the entries of Lang-8, which is a Social Networking Site (SNS) site for learning and practicing foreign languages, were analyzed and found to contain similar rates of errors for most error categories reported in previous research. These similarly rated errors were then processed using an algorithm to determine corrections suggested…
The Effectiveness of Stemming for Natural-Language Access to Slovene Textual Data.
ERIC Educational Resources Information Center
Popovic, Mirko; Willett, Peter
1992-01-01
Reports on the use of stemming for Slovene language documents and queries in free-text retrieval systems and demonstrates that an appropriate stemming algorithm results in an increase in retrieval effectiveness when compared with nonstemming processing. A comparison is made with stemming of English versions of the same documents and queries. (24…
HDL Based FPGA Interface Library for Data Acquisition and Multipurpose Real Time Algorithms
NASA Astrophysics Data System (ADS)
Fernandes, Ana M.; Pereira, R. C.; Sousa, J.; Batista, A. J. N.; Combo, A.; Carvalho, B. B.; Correia, C. M. B. A.; Varandas, C. A. F.
2011-08-01
The inherent parallelism of the logic resources, the flexibility in its configuration and the performance at high processing frequencies makes the field programmable gate array (FPGA) the most suitable device to be used both for real time algorithm processing and data transfer in instrumentation modules. Moreover, the reconfigurability of these FPGA based modules enables exploiting different applications on the same module. When using a reconfigurable module for various applications, the availability of a common interface library for easier implementation of the algorithms on the FPGA leads to more efficient development. The FPGA configuration is usually specified in a hardware description language (HDL) or other higher level descriptive language. The critical paths, such as the management of internal hardware clocks that require deep knowledge of the module behavior shall be implemented in HDL to optimize the timing constraints. The common interface library should include these critical paths, freeing the application designer from hardware complexity and able to choose any of the available high-level abstraction languages for the algorithm implementation. With this purpose a modular Verilog code was developed for the Virtex 4 FPGA of the in-house Transient Recorder and Processor (TRP) hardware module, based on the Advanced Telecommunications Computing Architecture (ATCA), with eight channels sampling at up to 400 MSamples/s (MSPS). The TRP was designed to perform real time Pulse Height Analysis (PHA), Pulse Shape Discrimination (PSD) and Pile-Up Rejection (PUR) algorithms at a high count rate (few Mevent/s). A brief description of this modular code is presented and examples of its use as an interface with end user algorithms, including a PHA with PUR, are described.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azevedo, S.G.; Fitch, J.P.
1987-10-21
Conventional software interfaces that use imperative computer commands or menu interactions are often restrictive environments when used for researching new algorithms or analyzing processed experimental data. We found this to be true with current signal-processing software (SIG). As an alternative, "functional language" interfaces provide features such as command nesting for a more natural interaction with the data. The Image and Signal LISP Environment (ISLE) is an example of an interpreted functional language interface based on Common LISP. Advantages of ISLE include multidimensional and multiple data-type independence through dispatching functions, dynamic loading of new functions, and connections to artificial intelligence (AI) software. 10 refs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azevedo, S.G.; Fitch, J.P.
1987-05-01
Conventional software interfaces which utilize imperative computer commands or menu interactions are often restrictive environments when used for researching new algorithms or analyzing processed experimental data. We found this to be true with current signal processing software (SIG). Existing "functional language" interfaces provide features such as command nesting for a more natural interaction with the data. The Image and Signal Lisp Environment (ISLE) will be discussed as an example of an interpreted functional language interface based on Common LISP. Additional benefits include multidimensional and multiple data-type independence through dispatching functions, dynamic loading of new functions, and connections to artificial intelligence software.
Knowledge-Based Object Detection in Laser Scanning Point Clouds
NASA Astrophysics Data System (ADS)
Boochs, F.; Karmacharya, A.; Marbs, A.
2012-07-01
Object identification and object processing in 3D point clouds have always posed challenges in terms of effectiveness and efficiency. In practice, this process is highly dependent on human interpretation of the scene represented by the point cloud data, as well as the set of modeling tools available for use. Such modeling algorithms are data-driven and concentrate on specific features of the objects, being accessible to numerical models. We present an approach that brings the human expert knowledge about the scene, the objects inside, and their representation by the data and the behavior of algorithms to the machine. This "understanding" enables the machine to assist human interpretation of the scene inside the point cloud. Furthermore, it allows the machine to understand possibilities and limitations of algorithms and to take this into account within the processing chain. This not only assists the researchers in defining optimal processing steps, but also provides suggestions when certain changes or new details emerge from the point cloud. Our approach benefits from the advancement in knowledge technologies within the Semantic Web framework. This advancement has provided a strong base for applications based on knowledge management. In the article we will present and describe the knowledge technologies used for our approach such as Web Ontology Language (OWL), used for formulating the knowledge base and the Semantic Web Rule Language (SWRL) with 3D processing and topologic built-ins, aiming to combine geometrical analysis of 3D point clouds, and specialists' knowledge of the scene and algorithmic processing.
Enforcing Memory Policy Specifications in Reconfigurable Hardware
2008-10-01
we explain the algorithms behind our reference monitor design flow. In Section 4, we describe our access policy language including several example...NFA from this regular expression using Thompson’s Algorithm [1] as implemented by Gerzic [19]. Figure 4 shows the NFA for our policy. Notice that the... Algorithm [1] as implemented by Grail [49] to minimize the DFA. Figure 5 shows the minimized DFA for our policy. Processing the Ranges Before we can
An engineering approach to automatic programming
NASA Technical Reports Server (NTRS)
Rubin, Stuart H.
1990-01-01
An exploratory study of the automatic generation and optimization of symbolic programs using DECOM - a prototypical requirement specification model implemented in pure LISP was undertaken. It was concluded, on the basis of this study, that symbolic processing languages such as LISP can support a style of programming based upon formal transformation and dependent upon the expression of constraints in an object-oriented environment. Such languages can represent all aspects of the software generation process (including heuristic algorithms for effecting parallel search) as dynamic processes since data and program are represented in a uniform format.
Interpreting Quantifier Scope Ambiguity: Evidence of Heuristic First, Algorithmic Second Processing
Dwivedi, Veena D.
2013-01-01
The present work suggests that sentence processing requires both heuristic and algorithmic processing streams, where the heuristic processing strategy precedes the algorithmic phase. This conclusion is based on three self-paced reading experiments in which the processing of two-sentence discourses was investigated, where context sentences exhibited quantifier scope ambiguity. Experiment 1 demonstrates that such sentences are processed in a shallow manner. Experiment 2 uses the same stimuli as Experiment 1 but adds questions to ensure deeper processing. Results indicate that reading times are consistent with a lexical-pragmatic interpretation of number associated with context sentences, but responses to questions are consistent with the algorithmic computation of quantifier scope. Experiment 3 shows the same pattern of results as Experiment 2, despite using stimuli with different lexical-pragmatic biases. These effects suggest that language processing can be superficial, and that deeper processing, which is sensitive to structure, only occurs if required. Implications for recent studies of quantifier scope ambiguity are discussed. PMID:24278439
Performance Review of Harmony Search, Differential Evolution and Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Mohan Pandey, Hari
2017-08-01
Metaheuristic algorithms are effective in the design of an intelligent system. These algorithms are widely applied to solve complex optimization problems, including image processing, big data analytics, language processing, pattern recognition and others. This paper presents a performance comparison of three meta-heuristic algorithms, namely Harmony Search, Differential Evolution, and Particle Swarm Optimization. These algorithms are originated altogether from different fields of meta-heuristics yet share a common objective. The standard benchmark functions are used for the simulation. Statistical tests are conducted to derive a conclusion on the performance. The key motivation to conduct this research is to categorize the computational capabilities, which might be useful to the researchers.
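As a concrete reference point for one of the three metaheuristics compared, here is a minimal particle swarm optimization sketch on the sphere benchmark function; the swarm size, coefficients, and iteration count are standard textbook-style assumptions, not the paper's settings.

```python
# Minimal PSO sketch on the sphere function (minimize sum of squares).
# Swarm size, inertia, and acceleration coefficients are textbook-style assumptions.
import numpy as np

rng = np.random.default_rng(0)
dim, n_particles, iters = 5, 30, 200
w, c1, c2 = 0.72, 1.49, 1.49            # inertia and acceleration coefficients

def sphere(x):
    return np.sum(x ** 2, axis=-1)

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = sphere(pbest)
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = sphere(pos)
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best value found:", float(sphere(gbest)))
```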
VHP - An environment for the remote visualization of heuristic processes
NASA Technical Reports Server (NTRS)
Crawford, Stuart L.; Leiner, Barry M.
1991-01-01
A software system called VHP is introduced which permits the visualization of heuristic algorithms on both resident and remote hardware platforms. The VHP is based on the DCF tool for interprocess communication and is applicable to remote algorithms which can be on different types of hardware and in languages other than VHP. The VHP system is of particular interest to systems in which the visualization of remote processes is required such as robotics for telescience applications.
Human Memory Organization for Computer Programs.
ERIC Educational Resources Information Center
Norcio, A. F.; Kerst, Stephen M.
1983-01-01
Results of study investigating human memory organization in processing of computer programming languages indicate that algorithmic logic segments form a cognitive organizational structure in memory for programs. Statement indentation and internal program documentation did not enhance organizational process of recall of statements in five Fortran…
NASA Astrophysics Data System (ADS)
Lazcano, R.; Madroñal, D.; Fabelo, H.; Ortega, S.; Salvador, R.; Callicó, G. M.; Juárez, E.; Sanz, C.
2017-10-01
Hyperspectral Imaging (HI) assembles high resolution spectral information from hundreds of narrow bands across the electromagnetic spectrum, thus generating 3D data cubes in which each spatial pixel gathers the spectral reflectance information across all bands. As a result, each image is composed of large volumes of data, which turns its processing into a challenge, as performance requirements have been continuously tightened. For instance, new HI applications demand real-time responses. Hence, parallel processing becomes a necessity to achieve this requirement, so the intrinsic parallelism of the algorithms must be exploited. In this paper, a spatial-spectral classification approach has been implemented using a dataflow language known as RVC-CAL. This language represents a system as a set of functional units, and its main advantage is that it simplifies the parallelization process by mapping the different blocks over different processing units. The spatial-spectral classification approach aims at refining the classification results previously obtained by using a K-Nearest Neighbors (KNN) filtering process, in which both the pixel spectral value and the spatial coordinates are considered. To do so, KNN needs two inputs: a one-band representation of the hyperspectral image and the classification results provided by a pixel-wise classifier. Thus, the spatial-spectral classification algorithm is divided into three different stages: a Principal Component Analysis (PCA) algorithm for computing the one-band representation of the image, a Support Vector Machine (SVM) classifier, and the KNN-based filtering algorithm. The parallelization of these algorithms shows promising results in terms of computational time, as mapping them over different cores yields a speedup of 2.69x when using 3 cores. Consequently, experimental results demonstrate that real-time processing of hyperspectral images is achievable.
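The three stages of the chain (PCA one-band representation, pixel-wise SVM, and KNN-style spatial-spectral refinement) can be approximated in a few lines of scikit-learn. The synthetic cube below and the plain KNN standing in for the paper's spatial-spectral filter are simplifying assumptions, and none of the RVC-CAL dataflow parallelization is represented.

```python
# Sketch of the three-stage chain: PCA -> pixel-wise SVM -> KNN-style refinement.
# Synthetic data; a plain KNN stands in for the paper's spatial-spectral filter.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
h, w, bands = 20, 20, 50
cube = rng.standard_normal((h, w, bands))
cube[:, :10, :] += 1.0                       # two crude spectral classes
labels = np.zeros((h, w), dtype=int)
labels[:, :10] = 1

pixels = cube.reshape(-1, bands)
y = labels.ravel()

one_band = PCA(n_components=1).fit_transform(pixels)          # stage 1: PCA
svm_pred = SVC(kernel="rbf").fit(pixels, y).predict(pixels)   # stage 2: pixel-wise SVM

rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
features = np.column_stack([one_band.ravel(), rows.ravel(), cols.ravel()])
refined = KNeighborsClassifier(n_neighbors=5).fit(features, svm_pred).predict(features)

print("agreement with ground truth:", float((refined == y).mean()))
```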
Computer enhancement through interpretive techniques
NASA Technical Reports Server (NTRS)
Foster, G.; Spaanenburg, H. A. E.; Stumpf, W. E.
1972-01-01
The improvement in the usage of the digital computer through the use of the technique of interpretation rather than the compilation of higher ordered languages was investigated by studying the efficiency of coding and execution of programs written in FORTRAN, ALGOL, PL/I and COBOL. FORTRAN was selected as the high level language for examining programs which were compiled, and A Programming Language (APL) was chosen for the interpretive language. It is concluded that APL is competitive, not because it and the algorithms being executed are well written, but rather because the batch processing is less efficient than has been admitted. There is not a broad base of experience founded on trying different implementation strategies which have been targeted at open competition with traditional processing methods.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
A Portable Natural Language Interface.
1987-09-01
A grammar-based semantic similarity algorithm for natural language sentences.
Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng
2014-01-01
This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to "artificial language", such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.
Process Materialization Using Templates and Rules to Design Flexible Process Models
NASA Astrophysics Data System (ADS)
Kumar, Akhil; Yao, Wen
The main idea in this paper is to show how flexible processes can be designed by combining generic process templates and business rules. We instantiate a process by applying rules to specific case data, and running a materialization algorithm. The customized process instance is then executed in an existing workflow engine. We present an architecture and also give an algorithm for process materialization. The rules are written in a logic-based language like Prolog. Our focus is on capturing deeper process knowledge and achieving a holistic approach to robust process design that encompasses control flow, resources and data, as well as makes it easier to accommodate changes to business policy.
Chen, Po-Hao; Zafar, Hanna; Galperin-Aizenberg, Maya; Cook, Tessa
2018-04-01
A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to successfully extract insights from radiology reports. However, the codependent effects of NLP and ML in this context have not been well-studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with 90.6% average accuracy and an F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
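The best-performing combination reported (unigram plus bigram TF-IDF features with a support vector machine) maps onto a few lines of scikit-learn, shown below with invented placeholder report snippets. This is a sketch of the feature/classifier pairing only, not the study's actual pipeline or its one-vs-all configuration.

```python
# Sketch of the best-performing pairing: unigram+bigram TF-IDF + linear SVM.
# The report snippets and labels are invented placeholders, not study data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

reports = [
    "interval increase in hepatic metastases",
    "no evidence of malignancy",
    "stable appearance of pulmonary nodules",
    "decrease in size of the pancreatic mass",
]
labels = ["Progression", "No Cancer", "Stable Disease", "Improvement"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # tokenized unigrams and bigrams, TF-IDF
    LinearSVC(),                           # one-vs-rest linear SVM by default
)
model.fit(reports, labels)
print(model.predict(["slight increase in hepatic metastases"])[0])
```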
Peissig, Peggy L; Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B
2012-01-01
There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.
A C Language Implementation of the SRO (Murdock) Detector/Analyzer
Murdock, James N.; Halbert, Scott E.
1991-01-01
A signal detector and analyzer algorithm was described by Murdock and Hutt in 1983. The algorithm emulates the performance of a human interpreter of seismograms. It estimates the signal onset, the direction of onset (positive or negative), the quality of these determinations, the period and amplitude of the signal, and the background noise at the time of the signal. The algorithm has been coded in C language for implementation as a 'blackbox' for data similar to that of the China Digital Seismic Network. A driver for the algorithm is included, as are suggestions for other drivers. In all of these routines, plus several FIR filters that are included as well, floating point operations are not required. Multichannel operation is supported. Although the primary use of the code has been for in-house processing of broadband and short period data of the China Digital Seismic Network, provisions have been made to process the long period and very long period data of that system as well. The code for the in-house detector, which runs on a mini-computer, is very similar to that of the field system, which runs on a microprocessor. The code is documented.
Neural-Network-Development Program
NASA Technical Reports Server (NTRS)
Phillips, Todd A.
1993-01-01
NETS, software tool for development and evaluation of neural networks, provides simulation of neural-network algorithms plus computing environment for development of such algorithms. Uses back-propagation learning method for all of networks it creates. Enables user to customize patterns of connections between layers of network. Also provides features for saving, during learning process, values of weights, providing more-precise control over learning process. Written in ANSI standard C language. Machine-independent version (MSC-21588) includes only code for command-line-interface version of NETS 3.0.
FGRAAL: FORTRAN extended graph algorithmic language
NASA Technical Reports Server (NTRS)
Basili, V. R.; Mesztenyi, C. K.; Rheinboldt, W. C.
1972-01-01
The FORTRAN version FGRAAL of the graph algorithmic language GRAAL as it has been implemented for the Univac 1108 is described. FGRAAL is an extension of FORTRAN 5 and is intended for describing and implementing graph algorithms of the type primarily arising in applications. The formal description contained in this report represents a supplement to the FORTRAN 5 manual for the Univac 1108 (UP-4060); that is, only the new features of the language are described. Several typical graph algorithms, written in FGRAAL, are included to illustrate various features of the language and to show its applicability.
Abid, Abdulbasit
2013-03-01
This paper presents a thorough discussion of the proposed field-programmable gate array (FPGA) implementation for fringe pattern demodulation using the one-dimensional continuous wavelet transform (1D-CWT) algorithm. This algorithm is also known as wavelet transform profilometry. Initially, the 1D-CWT is programmed using the C programming language and compiled into VHDL using the ImpulseC tool. This VHDL code is implemented on the Altera Cyclone IV GX EP4CGX150DF31C7 FPGA. A fringe pattern image with a size of 512×512 pixels is presented to the FPGA, which processes the image using the 1D-CWT algorithm. The FPGA requires approximately 100 ms to process the image and produce a wrapped phase map. For performance comparison purposes, the 1D-CWT algorithm is programmed using the C language. The C code is then compiled using the Intel compiler version 13.0. The compiled code is run on a Dell Precision state-of-the-art workstation. The time required to process the fringe pattern image is approximately 1 s. In order to further reduce the execution time, the 1D-CWT is reprogrammed using the Intel Integrated Performance Primitives (IPP) Library Version 7.1. The execution time was reduced to approximately 650 ms. This confirms that at least a sixfold speedup was gained using the FPGA implementation over a state-of-the-art workstation that executes a heavily optimized implementation of the 1D-CWT algorithm.
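To make the processing step concrete, here is a CPU-only software sketch of a one-dimensional continuous wavelet transform applied to a single synthetic fringe-pattern row, followed by simple ridge detection and wrapped-phase extraction. It uses PyWavelets with a complex Morlet wavelet and arbitrary scales, and is only an illustration of the transform, not the paper's FPGA or IPP implementations.

```python
# CPU sketch of 1D-CWT fringe analysis on one row: complex Morlet CWT,
# ridge detection, and wrapped-phase extraction. Scales and the synthetic
# fringe row are assumptions; this is not the FPGA or IPP implementation.
import numpy as np
import pywt

N = 512
x = np.arange(N)
phase_true = 0.3 * x + 5.0 * np.sin(2 * np.pi * x / N)   # carrier + deformation
row = 128 + 100 * np.cos(phase_true)                     # one fringe-pattern row

scales = np.arange(4, 64)
coeffs, _ = pywt.cwt(row - row.mean(), scales, "cmor1.5-1.0")

ridge = np.argmax(np.abs(coeffs), axis=0)                # ridge scale per pixel
wrapped_phase = np.angle(coeffs[ridge, np.arange(N)])    # wrapped phase, one row
print(wrapped_phase[:5])
```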
Black, D F; Vachha, B; Mian, A; Faro, S H; Maheshwari, M; Sair, H I; Petrella, J R; Pillai, J J; Welker, K
2017-10-01
Functional MR imaging is increasingly being used for presurgical language assessment in the treatment of patients with brain tumors, epilepsy, vascular malformations, and other conditions. The inherent complexity of fMRI, which includes numerous processing steps and selective analyses, is compounded by institution-unique approaches to patient training, paradigm choice, and an eclectic array of postprocessing options from various vendors. Consequently, institutions perform fMRI in such markedly different manners that data sharing, comparison, and generalization of results are difficult. The American Society of Functional Neuroradiology proposes widespread adoption of common fMRI language paradigms as the first step in countering this lost opportunity to advance our knowledge and improve patient care. A taskforce of American Society of Functional Neuroradiology members from multiple institutions used a broad literature review, member polls, and expert opinion to converge on 2 sets of standard language paradigms that strike a balance between ease of application and clinical usefulness. The taskforce generated an adult language paradigm algorithm for presurgical language assessment including the following tasks: Sentence Completion, Silent Word Generation, Rhyming, Object Naming, and/or Passive Story Listening. The pediatric algorithm includes the following tasks: Sentence Completion, Rhyming, Antonym Generation, or Passive Story Listening. Convergence of fMRI language paradigms across institutions offers the first step in providing a "Rosetta Stone" that provides a common reference point with which to compare and contrast the usefulness and reliability of fMRI data. From this common language task battery, future refinements and improvements are anticipated, particularly as objective measures of reliability become available. Some commonality of practice is a necessary first step to develop a foundation on which to improve the clinical utility of this field. © 2017 by American Journal of Neuroradiology.
Afzal, Naveed; Sohn, Sunghwan; Abram, Sara; Scott, Christopher G.; Chaudhry, Rajeev; Liu, Hongfang; Kullo, Iftikhar J.; Arruda-Olson, Adelaide M.
2016-01-01
Objective Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm to billing code algorithms, using ankle-brachial index (ABI) test results as the gold standard. Methods We compared the performance of the NLP algorithm to 1) results of gold standard ABI; 2) previously validated algorithms based on relevant ICD-9 diagnostic codes (simple model) and 3) a combination of ICD-9 codes with procedural codes (full model). A dataset of 1,569 PAD patients and controls was randomly divided into training (n= 935) and testing (n= 634) subsets. Results We iteratively refined the NLP algorithm in the training set including narrative note sections, note types and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP: 91.8%, full model: 81.8%, simple model: 83%, P<.001), PPV (NLP: 92.9%, full model: 74.3%, simple model: 79.9%, P<.001), and specificity (NLP: 92.5%, full model: 64.2%, simple model: 75.9%, P<.001). Conclusions A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support. PMID:28189359
NASA Technical Reports Server (NTRS)
Wharton, S. W.
1980-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who evaluates and may elect to modify the cluster structure. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically, and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
Image Algebra Matlab language version 2.3 for image processing and compression research
NASA Astrophysics Data System (ADS)
Schmalz, Mark S.; Ritter, Gerhard X.; Hayden, Eric
2010-08-01
Image algebra is a rigorous, concise notation that unifies linear and nonlinear mathematics in the image domain. Image algebra was developed under DARPA and US Air Force sponsorship at University of Florida for over 15 years beginning in 1984. Image algebra has been implemented in a variety of programming languages designed specifically to support the development of image processing and computer vision algorithms and software. The University of Florida has been associated with development of the languages FORTRAN, Ada, Lisp, and C++. The latter implementation involved a class library, iac++, that supported image algebra programming in C++. Since image processing and computer vision are generally performed with operands that are array-based, the Matlab™ programming language is ideal for implementing the common subset of image algebra. Objects include sets and set operations, images and operations on images, as well as templates and image-template convolution operations. This implementation, called Image Algebra Matlab (IAM), has been found to be useful for research in data, image, and video compression, as described herein. Due to the widespread acceptance of the Matlab programming language in the computing community, IAM offers exciting possibilities for supporting a large group of users. The control over an object's computational resources provided to the algorithm designer by Matlab means that IAM programs can employ versatile representations for the operands and operations of the algebra, which are supported by the underlying libraries written in Matlab. In a previous publication, we showed how the functionality of IAC++ could be carried forth into a Matlab implementation, and provided practical details of a prototype implementation called IAM Version 1. In this paper, we further elaborate the purpose and structure of image algebra, then present a maturing implementation of Image Algebra Matlab called IAM Version 2.3, which extends the previous implementation of IAM to include polymorphic operations over different point sets, as well as recursive convolution operations and functional composition. We also show how image algebra and IAM can be employed in image processing and compression research, as well as algorithm development and analysis.
Iris unwrapping using the Bresenham circle algorithm for real-time iris recognition
NASA Astrophysics Data System (ADS)
Carothers, Matthew T.; Ngo, Hau T.; Rakvic, Ryan N.; Broussard, Randy P.
2015-02-01
An efficient parallel architecture design for the iris unwrapping process in a real-time iris recognition system using the Bresenham Circle Algorithm is presented in this paper. Based on the characteristics of the model parameters, this algorithm was chosen over the widely used polar conversion technique as the iris unwrapping model. The architecture design is parallelized to increase the throughput of the system and is suitable for processing an input image size of 320 × 240 pixels in real-time using Field Programmable Gate Array (FPGA) technology. Quartus software is used to implement, verify, and analyze the design's performance using the VHSIC Hardware Description Language. The system's predicted processing time is faster than that of the modern iris unwrapping technique used today.
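For reference, the Bresenham (midpoint) circle algorithm that drives the unwrapping generates the integer pixel coordinates of a circle using only integer arithmetic. The sketch below is a plain software version for a single circle; the parallel FPGA architecture and the iris-specific sampling are not represented.

```python
# Software sketch of the Bresenham/midpoint circle algorithm: integer-only
# generation of the pixel coordinates of a circle of radius r about (cx, cy).
# The parallel FPGA unwrapping architecture is not represented here.
def bresenham_circle(cx, cy, r):
    points = set()
    x, y, d = 0, r, 3 - 2 * r
    while x <= y:
        # the eight symmetric octant points
        for dx, dy in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            points.add((cx + dx, cy + dy))
        if d < 0:
            d += 4 * x + 6
        else:
            d += 4 * (x - y) + 10
            y -= 1
        x += 1
    return sorted(points)

boundary = bresenham_circle(160, 120, 40)   # e.g. candidate iris-boundary pixels
print(len(boundary), boundary[:5])
```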
Vectorized algorithms for spiking neural network simulation.
Brette, Romain; Goodman, Dan F M
2011-06-01
High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
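The vector-based style the abstract describes can be illustrated with a NumPy update of a population of leaky integrate-and-fire neurons, where one array operation advances every neuron at once. The parameters are generic textbook-style values, and this is a sketch of the idea, not Brian's actual implementation.

```python
# Vectorized leaky integrate-and-fire update: one array operation per time step
# advances the whole population. Generic parameters; not Brian's implementation.
import numpy as np

n, dt, steps = 10_000, 0.1e-3, 5000         # neurons, time step (s), steps
tau, v_rest, v_reset, v_th = 20e-3, -70e-3, -65e-3, -50e-3

rng = np.random.default_rng(0)
v = np.full(n, v_rest)
I = 21e-3 + 1e-3 * rng.standard_normal(n)   # constant drive per neuron (volts)
spike_count = np.zeros(n, dtype=int)

for _ in range(steps):
    v += dt / tau * (v_rest - v + I)        # vectorized membrane update
    spiking = v >= v_th                     # boolean mask of spiking neurons
    spike_count += spiking
    v[spiking] = v_reset                    # vectorized reset

print("mean rate (Hz):", float(spike_count.mean() / (steps * dt)))
```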
Computer-Assisted Instruction: Authoring Languages. ERIC Digest.
ERIC Educational Resources Information Center
Reeves, Thomas C.
One of the most perplexing tasks in producing computer-assisted instruction (CAI) is the authoring process. Authoring is generally defined as the process of turning the flowcharts, control algorithms, format sheets, and other documentation of a CAI program's design into computer code that will operationalize the simulation on the delivery system.…
A comparison of common programming languages used in bioinformatics.
Fourment, Mathieu; Gillings, Michael R
2008-02-05
The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
Modelling of internal architecture of kinesin nanomotor as a machine language.
Khataee, H R; Ibrahim, M Y
2012-09-01
Kinesin is a protein-based natural nanomotor that transports molecular cargoes within cells by walking along microtubules. Kinesin nanomotor is considered as a bio-nanoagent which is able to sense the cell through its sensors (i.e. its heads and tail), make the decision internally and perform actions on the cell through its actuator (i.e. its motor domain). The study maps the agent-based architectural model of internal decision-making process of kinesin nanomotor to a machine language using an automata algorithm. The applied automata algorithm receives the internal agent-based architectural model of kinesin nanomotor as a deterministic finite automaton (DFA) model and generates a regular machine language. The generated regular machine language was acceptable by the architectural DFA model of the nanomotor and also in good agreement with its natural behaviour. The internal agent-based architectural model of kinesin nanomotor indicates the degree of autonomy and intelligence of the nanomotor interactions with its cell. Thus, our developed regular machine language can model the degree of autonomy and intelligence of kinesin nanomotor interactions with its cell as a language. Modelling of internal architectures of autonomous and intelligent bio-nanosystems as machine languages can lay the foundation towards the concept of bio-nanoswarms and next phases of the bio-nanorobotic systems development.
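The link between a deterministic finite automaton and the regular language it accepts can be shown with a small generic DFA simulator. The states and input symbols below are hypothetical placeholders for illustration, not the kinesin model's actual states or transition table.

```python
# Generic DFA simulation sketch; the states/inputs are hypothetical placeholders,
# not the kinesin nanomotor's actual model.
def dfa_accepts(transitions, start, accepting, inputs):
    state = start
    for symbol in inputs:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting

# Hypothetical two-state walk cycle: bind ATP ("a"), then step ("s"), repeatedly.
transitions = {("waiting", "a"): "bound", ("bound", "s"): "waiting"}
print(dfa_accepts(transitions, "waiting", {"waiting"}, ["a", "s", "a", "s"]))  # True
print(dfa_accepts(transitions, "waiting", {"waiting"}, ["a", "a"]))            # False
```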
Boniolo, Giovanni; D'Agostino, Marcello; Di Fiore, Pier Paolo
2010-03-03
We propose a formal language that allows for transposing biological information precisely and rigorously into machine-readable information. This language, which we call Zsyntax (where Z stands for the Greek word ζωή, life), is grounded on a particular type of non-classical logic, and it can be used to write algorithms and computer programs. We present it as a first step towards a comprehensive formal language for molecular biology in which any biological process can be written and analyzed as a sort of logical "deduction". Moreover, we illustrate the potential value of this language, both in the field of text mining and in that of biological prediction.
XML in an Adaptive Framework for Instrument Control
NASA Technical Reports Server (NTRS)
Ames, Troy J.
2004-01-01
NASA Goddard Space Flight Center is developing an extensible framework for instrument command and control, known as Instrument Remote Control (IRC), that combines the platform independent processing capabilities of Java with the power of the Extensible Markup Language (XML). A key aspect of the architecture is software that is driven by an instrument description, written using the Instrument Markup Language (IML). IML is an XML dialect used to describe interfaces to control and monitor the instrument, command sets and command formats, data streams, communication mechanisms, and data processing algorithms.
A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences
Chang, Jia Wei; Hsieh, Tung Cheng
2014-01-01
This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure. PMID:24982952
Parallel VLSI architecture emulation and the organization of APSA/MPP
NASA Technical Reports Server (NTRS)
Odonnell, John T.
1987-01-01
The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
NASA Technical Reports Server (NTRS)
Orlov, I. G.
1979-01-01
The BASIC algorithmic language is described, and a guide is presented for the programmer using the language interpreter. BASIC is a high-level, problem-oriented programming language intended for the solution of computational and engineering problems.
Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B
2012-01-01
Objective There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries. PMID:22319176
Wavelets for sign language translation
NASA Astrophysics Data System (ADS)
Wilson, Beth J.; Anspach, Gretel
1993-10-01
Wavelet techniques are applied to help extract the relevant parameters of sign language from video images of a person communicating in American Sign Language or Signed English. The compression and edge detection features of two-dimensional wavelet analysis are exploited to enhance the algorithms under development to classify the hand motion, hand location with respect to the body, and handshape. These three parameters have different processing requirements and complexity issues. The results are described for applying various quadrature mirror filter designs to a filterbank implementation of the desired wavelet transform. The overall project is to develop a system that will translate sign language to English to facilitate communication between deaf and hearing people.
Sung, Sheng-Feng; Chen, Kuanchin; Wu, Darren Philbert; Hung, Ling-Chien; Su, Yu-Hsiang; Hu, Ya-Han
2018-04-01
To reduce errors in determining eligibility for intravenous thrombolytic therapy (IVT) in stroke patients through use of an enhanced task-specific electronic medical record (EMR) interface powered by natural language processing (NLP) techniques. The information processing algorithm utilized MetaMap to extract medical concepts from IVT eligibility criteria and expanded the concepts using the Unified Medical Language System Metathesaurus. Concepts identified from clinical notes by MetaMap were compared to those from IVT eligibility criteria. The task-specific EMR interface displays IVT-relevant information by highlighting phrases that contain matched concepts. Clinical usability was assessed with clinicians staffing the acute stroke team by comparing user performance while using the task-specific and the current EMR interfaces. The algorithm identified IVT-relevant concepts with micro-averaged precisions, recalls, and F1 measures of 0.998, 0.812, and 0.895 at the phrase level and of 1, 0.972, and 0.986 at the document level. Users using the task-specific interface achieved a higher accuracy score than those using the current interface (91% versus 80%, p = 0.016) in assessing the IVT eligibility criteria. The completion time between the interfaces was statistically similar (2.46 min versus 1.70 min, p = 0.754). Although the information processing algorithm had room for improvement, the task-specific EMR interface significantly reduced errors in assessing IVT eligibility criteria. The study findings provide evidence to support an NLP enhanced EMR system to facilitate IVT decision-making by presenting meaningful and timely information to clinicians, thereby offering a new avenue for improvements in acute stroke care. Copyright © 2018 Elsevier B.V. All rights reserved.
Afzal, Naveed; Sohn, Sunghwan; Abram, Sara; Scott, Christopher G; Chaudhry, Rajeev; Liu, Hongfang; Kullo, Iftikhar J; Arruda-Olson, Adelaide M
2017-06-01
Lower extremity peripheral arterial disease (PAD) is highly prevalent and affects millions of individuals worldwide. We developed a natural language processing (NLP) system for automated ascertainment of PAD cases from clinical narrative notes and compared the performance of the NLP algorithm with billing code algorithms, using ankle-brachial index test results as the gold standard. We compared the performance of the NLP algorithm to (1) results of gold standard ankle-brachial index; (2) previously validated algorithms based on relevant International Classification of Diseases, Ninth Revision diagnostic codes (simple model); and (3) a combination of International Classification of Diseases, Ninth Revision codes with procedural codes (full model). A dataset of 1569 patients with PAD and controls was randomly divided into training (n = 935) and testing (n = 634) subsets. We iteratively refined the NLP algorithm in the training set including narrative note sections, note types, and service types, to maximize its accuracy. In the testing dataset, when compared with both simple and full models, the NLP algorithm had better accuracy (NLP, 91.8%; full model, 81.8%; simple model, 83%; P < .001), positive predictive value (NLP, 92.9%; full model, 74.3%; simple model, 79.9%; P < .001), and specificity (NLP, 92.5%; full model, 64.2%; simple model, 75.9%; P < .001). A knowledge-driven NLP algorithm for automatic ascertainment of PAD cases from clinical notes had greater accuracy than billing code algorithms. Our findings highlight the potential of NLP tools for rapid and efficient ascertainment of PAD cases from electronic health records to facilitate clinical investigation and eventually improve care by clinical decision support. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Guimaraes, Carolina V; Grzeszczuk, Robert; Bisset, George S; Donnelly, Lane F
2018-03-01
When implementing or monitoring department-sanctioned standardized radiology reports, feedback about individual faculty performance has been shown to be a useful driver of faculty compliance. Most commonly, these data are derived from manual audit, which can be both time-consuming and subject to sampling error. The purpose of this study was to evaluate whether a software program using natural language processing and machine learning could accurately audit radiologist compliance with the use of standardized reports compared with performed manual audits. Radiology reports from a 1-month period were loaded into such a software program, and faculty compliance with use of standardized reports was calculated. For that same period, manual audits were performed (25 reports audited for each of 42 faculty members). The mean compliance rates calculated by automated auditing were then compared with the confidence interval of the mean rate by manual audit. The mean compliance rate for use of standardized reports as determined by manual audit was 91.2% with a confidence interval between 89.3% and 92.8%. The mean compliance rate calculated by automated auditing was 92.0%, within that confidence interval. This study shows that by use of natural language processing and machine learning algorithms, an automated analysis can accurately define whether reports are compliant with use of standardized report templates and language, compared with manual audits. This may avoid significant labor costs related to conducting the manual auditing process. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe and present a signal processing application (FFT).
Vectorized Rebinning Algorithm for Fast Data Down-Sampling
NASA Technical Reports Server (NTRS)
Dean, Bruce; Aronstein, David; Smith, Jeffrey
2013-01-01
A vectorized rebinning (down-sampling) algorithm, applicable to N-dimensional data sets, has been developed that offers a significant reduction in computer run time when compared to conventional rebinning algorithms. For clarity, a two-dimensional version of the algorithm is discussed to illustrate some specific details of the algorithm content, and using the language of image processing, 2D data will be referred to as "images," and each value in an image as a "pixel." The new approach is fully vectorized, i.e., the down-sampling procedure is done as a single step over all image rows, and then as a single step over all image columns. Data rebinning (or down-sampling) is a procedure that uses a discretely sampled N-dimensional data set to create a representation of the same data, but with fewer discrete samples. Such data down-sampling is fundamental to digital signal processing, e.g., for data compression applications.
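To make the row-then-column idea above concrete, here is a minimal NumPy sketch of an integer-factor 2D rebinning step; the function name `rebin2d` and its interface are assumptions for illustration, not the reported implementation.

```python
# Hypothetical sketch of a vectorized 2D rebinning (down-sampling) step:
# rows are collapsed in one reshape/sum, then columns, with no explicit
# Python loop over individual pixels.
import numpy as np

def rebin2d(image, factor):
    """Down-sample a 2D array by an integer factor along each axis."""
    rows, cols = image.shape
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by the factor")
    # Single step over all rows: group 'factor' consecutive rows and sum them.
    tmp = image.reshape(rows // factor, factor, cols).sum(axis=1)
    # Single step over all columns: group 'factor' consecutive columns and sum them.
    return tmp.reshape(rows // factor, cols // factor, factor).sum(axis=2)

if __name__ == "__main__":
    img = np.arange(64, dtype=float).reshape(8, 8)
    print(rebin2d(img, 2))  # 4x4 down-sampled image
```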
Baseline mathematics and geodetics for tracking operations
NASA Technical Reports Server (NTRS)
James, R.
1981-01-01
Various geodetic and mapping algorithms are analyzed as they apply to radar tracking systems and tested in extended BASIC computer language for real time computer applications. Closed-form approaches to the solution of converting Earth centered coordinates to latitude, longitude, and altitude are compared with classical approximations. A simplified approach to atmospheric refractivity called gradient refraction is compared with conventional ray tracing processes. An extremely detailed set of documentation which provides the theory, derivations, and application of algorithms used in the programs is included. Validation methods are also presented for testing the accuracy of the algorithms.
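As an illustration of the kind of closed-form Earth-centered-to-geodetic conversion discussed above, the sketch below uses Bowring's well-known approximation with assumed WGS-84 constants; it is a generic textbook method, not necessarily one of the specific algorithms documented in the report.

```python
# Illustrative closed-form conversion from Earth-centered (ECEF) coordinates to
# geodetic latitude, longitude, and altitude using Bowring's approximation.
# WGS-84 constants are assumed for the ellipsoid.
import math

A = 6378137.0                    # WGS-84 semi-major axis (m)
F = 1.0 / 298.257223563          # WGS-84 flattening
B = A * (1.0 - F)                # semi-minor axis
E2 = F * (2.0 - F)               # first eccentricity squared
EP2 = (A * A - B * B) / (B * B)  # second eccentricity squared

def ecef_to_geodetic(x, y, z):
    """Return (latitude_deg, longitude_deg, altitude_m) for ECEF (x, y, z) in metres."""
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    theta = math.atan2(z * A, p * B)
    lat = math.atan2(z + EP2 * B * math.sin(theta) ** 3,
                     p - E2 * A * math.cos(theta) ** 3)
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)  # prime vertical radius
    alt = p / math.cos(lat) - n
    return math.degrees(lat), math.degrees(lon), alt

if __name__ == "__main__":
    # A point on the equator at the prime meridian, roughly 100 m above the ellipsoid.
    print(ecef_to_geodetic(6378237.0, 0.0, 0.0))
```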
A parallelized binary search tree
USDA-ARS?s Scientific Manuscript database
PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
A Subsystem Test Bed for Chinese Spectral Radioheliograph
NASA Astrophysics Data System (ADS)
Zhao, An; Yan, Yihua; Wang, Wei
2014-11-01
The Chinese Spectral Radioheliograph (CSRH) is a solar-dedicated radio interferometric array that will produce high spatial resolution, high temporal resolution, and high spectral resolution images of the Sun simultaneously in the decimetre and centimetre wave range. Digital processing of the intermediate frequency (IF) signal is an important part of a radio telescope. This paper describes a flexible, high-speed digital down conversion (DDC) system for the CSRH that applies complex mixing, parallel filtering, and extraction algorithms to process the IF signal, and incorporates canonic-signed-digit coding and a bit-plane method to improve program efficiency. The DDC system is intended to be a subsystem test bed for simulation and testing for the CSRH. Simulation software and FPGA-based hardware-description algorithms were written that use fewer hardware resources while achieving high performance, such as processing a high-speed (1 GHz) data flow with 10 MHz spectral resolution. An experiment with the test bed is illustrated using geostationary satellite data observed on March 20, 2014. Because the algorithms on the FPGA are easy to alter, the data can be recomputed with different digital signal processing algorithms to select the optimum algorithm.
Using hybridization networks to retrace the evolution of Indo-European languages.
Willems, Matthieu; Lord, Etienne; Laforest, Louise; Labelle, Gilbert; Lapointe, François-Joseph; Di Sciullo, Anna Maria; Makarenkov, Vladimir
2016-09-06
Curious parallels between the processes of species and language evolution have been observed by many researchers. Retracing the evolution of Indo-European (IE) languages remains one of the most intriguing intellectual challenges in historical linguistics. Most of the IE language studies use the traditional phylogenetic tree model to represent the evolution of natural languages, thus not taking into account reticulate evolutionary events, such as language hybridization and word borrowing which can be associated with species hybridization and horizontal gene transfer, respectively. More recently, implicit evolutionary networks, such as split graphs and minimal lateral networks, have been used to account for reticulate evolution in linguistics. Striking parallels existing between the evolution of species and natural languages allowed us to apply three computational biology methods for reconstruction of phylogenetic networks to model the evolution of IE languages. We show how the transfer of methods between the two disciplines can be achieved, making necessary methodological adaptations. Considering basic vocabulary data from the well-known Dyen's lexical database, which contains word forms in 84 IE languages for the meanings of a 200-meaning Swadesh list, we adapt a recently developed computational biology algorithm for building explicit hybridization networks to study the evolution of IE languages and compare our findings to the results provided by the split graph and galled network methods. We conclude that explicit phylogenetic networks can be successfully used to identify donors and recipients of lexical material as well as the degree of influence of each donor language on the corresponding recipient languages. We show that our algorithm is well suited to detect reticulate relationships among languages, and present some historical and linguistic justification for the results obtained. Our findings could be further refined if relevant syntactic, phonological and morphological data could be analyzed along with the available lexical data.
Warnke, Tom; Reinhardt, Oliver; Klabunde, Anna; Willekens, Frans; Uhrmacher, Adelinde M
2017-10-01
Individuals' decision processes play a central role in understanding modern migration phenomena and other demographic processes. Their integration into agent-based computational demography depends largely on suitable support by a modelling language. We are developing the Modelling Language for Linked Lives (ML3) to describe the diverse decision processes of linked lives succinctly in continuous time. The context of individuals is modelled by networks the individual is part of, such as family ties and other social networks. Central concepts, such as behaviour conditional on agent attributes, age-dependent behaviour, and stochastic waiting times, are tightly integrated in the language. Thereby, alternative decisions are modelled by concurrent processes that compete by stochastic race. Using a migration model, we demonstrate how this allows for compact description of complex decisions, here based on the Theory of Planned Behaviour. We describe the challenges for the simulation algorithm posed by stochastic race between multiple concurrent complex decisions.
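A minimal sketch of the "stochastic race" idea described above follows: each competing decision process draws an exponential waiting time from its hazard rate and the earliest draw wins. This is an illustrative toy in Python, not ML3 itself, and the rate values are invented.

```python
# Toy stochastic race between competing decision processes: every alternative
# draws an exponential waiting time from its rate, and the earliest one fires.
import random

def stochastic_race(rates, rng=random.Random(42)):
    """rates: dict mapping decision name -> hazard rate. Returns (winner, waiting_time)."""
    draws = {name: rng.expovariate(rate) for name, rate in rates.items() if rate > 0}
    winner = min(draws, key=draws.get)
    return winner, draws[winner]

if __name__ == "__main__":
    # Hypothetical rates for an agent deciding between staying, migrating, or postponing.
    rates = {"stay": 0.50, "migrate": 0.05, "postpone": 0.20}
    for _ in range(5):
        print(stochastic_race(rates))
```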
Automatic Text Summarization for Indonesian Language Using TextTeaser
NASA Astrophysics Data System (ADS)
Gunawan, D.; Pasaribu, A.; Rahmat, R. F.; Budiarto, R.
2017-04-01
Text summarization is one of the solutions to information overload. Reducing text without losing its meaning not only saves reading time but also maintains the reader's understanding. One of many algorithms for summarizing text is TextTeaser. Originally, this algorithm was intended for text in English. However, because the TextTeaser algorithm does not consider the meaning of the text, we implement it for text in the Indonesian language. The algorithm calculates four features: title feature, sentence length, sentence position and keyword frequency. We utilize TextRank, an unsupervised and language-independent text summarization algorithm, to evaluate the summarized text yielded by TextTeaser. The result shows that the TextTeaser algorithm needs further improvement to obtain better accuracy.
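A hedged sketch of TextTeaser-style scoring follows, combining the four features named above; the weighting, normalisation, and sentence splitting here are simplifying assumptions for illustration, not the published TextTeaser formula.

```python
# Illustrative sentence scoring from four features: title overlap, sentence
# length, sentence position, and keyword frequency. Weights are assumed equal.
from collections import Counter

IDEAL_LENGTH = 20  # assumed "ideal" sentence length in words

def score_sentence(sentence, index, total, title_words, keywords):
    words = sentence.lower().split()
    title_feature = len(set(words) & title_words) / max(len(title_words), 1)
    length_feature = max(1.0 - abs(len(words) - IDEAL_LENGTH) / IDEAL_LENGTH, 0.0)
    position_feature = 1.0 - index / max(total, 1)   # earlier sentences score higher
    keyword_feature = sum(keywords.get(w, 0) for w in words) / max(len(words), 1)
    return (title_feature + length_feature + position_feature + keyword_feature) / 4

def summarize(text, title, n=2):
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    title_words = set(title.lower().split())
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    keywords = {w: c / total for w, c in counts.items()}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], i, len(sentences),
                                                 title_words, keywords),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]  # keep original order

if __name__ == "__main__":
    doc = ("Text summarization reduces reading time. Summaries keep the key ideas. "
           "This sketch ranks sentences by four simple features.")
    print(summarize(doc, "Text summarization", n=2))
```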
Cai, Tianxi; Karlson, Elizabeth W.
2013-01-01
Objectives To test whether data extracted from full text patient visit notes from an electronic medical record (EMR) would improve the classification of PsA compared to an algorithm based on codified data. Methods From the > 1,350,000 adults in a large academic EMR, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and three random forest algorithms trained using coded, narrative, and combined predictors. The receiver operator curve (ROC) was used to identify the optimal algorithm and a cut point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. Results The PPV of a single PsA code was 57% (95%CI 55%–58%). Using a combination of coded data and NLP the random forest algorithm reached a PPV of 90% (95%CI 86%–93%) at sensitivity of 87% (95% CI 83% – 91%) in the training data. The PPV was 93% (95%CI 89%–96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC (p < 0.001). Conclusions Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. PMID:20701955
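The sketch below illustrates the general modelling approach described above, training a random forest on codified predictors alone and on codified plus NLP-derived predictors and comparing areas under the ROC curve; the features and data are synthetic placeholders, not the study's variables.

```python
# Illustrative comparison of a random forest trained on coded features only
# versus coded plus NLP-derived features, using synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
coded = rng.normal(size=(n, 5))   # stand-in for billing-code counts (placeholder)
nlp = rng.normal(size=(n, 3))     # stand-in for NLP-extracted concept mentions (placeholder)
y = (coded[:, 0] + nlp[:, 0] + rng.normal(scale=1.5, size=n) > 0).astype(int)

for name, X in [("coded only", coded), ("coded + NLP", np.hstack([coded, nlp]))]:
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```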
Clustering Words to Match Conditions: An Algorithm for Stimuli Selection in Factorial Designs
ERIC Educational Resources Information Center
Guasch, Marc; Haro, Juan; Boada, Roger
2017-01-01
With the increasing refinement of language processing models and the new discoveries about which variables can modulate these processes, stimuli selection for experiments with a factorial design is becoming a tough task. Selecting sets of words that differ in one variable, while matching these same words into dozens of other confounding variables…
Natural language processing of clinical notes for identification of critical limb ischemia.
Afzal, Naveed; Mallipeddi, Vishnu Priya; Sohn, Sunghwan; Liu, Hongfang; Chaudhry, Rajeev; Scott, Christopher G; Kullo, Iftikhar J; Arruda-Olson, Adelaide M
2018-03-01
Critical limb ischemia (CLI) is a complication of advanced peripheral artery disease (PAD) with diagnosis based on the presence of clinical signs and symptoms. However, automated identification of cases from electronic health records (EHRs) is challenging due to absence of a single definitive International Classification of Diseases (ICD-9 or ICD-10) code for CLI. In this study, we extend a previously validated natural language processing (NLP) algorithm for PAD identification to develop and validate a subphenotyping NLP algorithm (CLI-NLP) for identification of CLI cases from clinical notes. We compared performance of the CLI-NLP algorithm with CLI-related ICD-9 billing codes. The gold standard for validation was human abstraction of clinical notes from EHRs. Compared to billing codes the CLI-NLP algorithm had higher positive predictive value (PPV) (CLI-NLP 96%, billing codes 67%, p < 0.001), specificity (CLI-NLP 98%, billing codes 74%, p < 0.001) and F1-score (CLI-NLP 90%, billing codes 76%, p < 0.001). The sensitivity of these two methods was similar (CLI-NLP 84%; billing codes 88%; p < 0.12). The CLI-NLP algorithm for identification of CLI from narrative clinical notes in an EHR had excellent PPV and has potential for translation to patient care as it will enable automated identification of CLI cases for quality projects, clinical decision support tools and support a learning healthcare system. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Epstein, Richard H; Dexter, Franklin
2017-07-01
Comorbidity adjustment is often performed during outcomes and health care resource utilization research. Our goal was to develop an efficient algorithm in structured query language (SQL) to determine the Elixhauser comorbidity index. We wrote an SQL algorithm to calculate the Elixhauser comorbidities from Diagnosis Related Group and International Classification of Diseases (ICD) codes. Validation was by comparison to expected comorbidities from combinations of these codes and to the 2013 Nationwide Readmissions Database (NRD). The SQL algorithm matched perfectly with expected comorbidities for all combinations of ICD-9 or ICD-10, and Diagnosis Related Groups. Of 13 585 859 evaluable NRD records, the algorithm matched 100% of the listed comorbidities. Processing time was ∼0.05 ms/record. The SQL Elixhauser code was efficient and computationally identical to the SAS algorithm used for the NRD. This algorithm may be useful where preprocessing of large datasets in a relational database environment and comorbidity determination is desired before statistical analysis. A validated SQL procedure to calculate Elixhauser comorbidities and the van Walraven index from ICD-9 or ICD-10 discharge diagnosis codes has been published. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
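As a rough illustration of comorbidity flagging from discharge diagnosis codes, the sketch below maps ICD-9 prefixes to comorbidity indicators in Python; the two prefix groups are illustrative placeholders (the full Elixhauser index spans roughly 30 categories with much more detailed code lists), and the published approach above is an SQL procedure rather than Python.

```python
# Minimal sketch of comorbidity flagging from discharge ICD-9 codes.
# The prefix sets below are illustrative placeholders only.
COMORBIDITY_PREFIXES = {
    "congestive_heart_failure": ("428",),        # placeholder prefix set
    "diabetes_uncomplicated": ("2500", "2501"),  # placeholder prefix set
}

def comorbidity_flags(icd9_codes):
    """Return a dict of comorbidity -> 0/1 for one hospital record."""
    codes = [c.replace(".", "") for c in icd9_codes]
    return {
        name: int(any(code.startswith(p) for code in codes for p in prefixes))
        for name, prefixes in COMORBIDITY_PREFIXES.items()
    }

if __name__ == "__main__":
    print(comorbidity_flags(["428.0", "250.00", "401.9"]))
    # {'congestive_heart_failure': 1, 'diabetes_uncomplicated': 1}
```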
Optical character recognition of handwritten Arabic using hidden Markov models
NASA Astrophysics Data System (ADS)
Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.; Olama, Mohammed M.
2011-04-01
The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.
Optical character recognition of handwritten Arabic using hidden Markov models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aulama, Mohannad M.; Natsheh, Asem M.; Abandah, Gheith A.
2011-01-01
The problem of optical character recognition (OCR) of handwritten Arabic has not received a satisfactory solution yet. In this paper, an Arabic OCR algorithm is developed based on Hidden Markov Models (HMMs) combined with the Viterbi algorithm, which results in an improved and more robust recognition of characters at the sub-word level. Integrating the HMMs represents another step of the overall OCR trends being currently researched in the literature. The proposed approach exploits the structure of characters in the Arabic language in addition to their extracted features to achieve improved recognition rates. Useful statistical information of the Arabic language is initially extracted and then used to estimate the probabilistic parameters of the mathematical HMM. A new custom implementation of the HMM is developed in this study, where the transition matrix is built based on the collected large corpus, and the emission matrix is built based on the results obtained via the extracted character features. The recognition process is triggered using the Viterbi algorithm which employs the most probable sequence of sub-words. The model was implemented to recognize the sub-word unit of Arabic text raising the recognition rate from being linked to the worst recognition rate for any character to the overall structure of the Arabic language. Numerical results show that there is a potentially large recognition improvement by using the proposed algorithms.
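For readers unfamiliar with the decoding step mentioned above, here is a generic Viterbi sketch for a small HMM; the transition and emission matrices are toy placeholders, not parameters estimated from an Arabic corpus.

```python
# Generic Viterbi decoding for an HMM: finds the most probable state sequence
# given a sequence of observation indices and toy model parameters.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """obs: sequence of observation indices; matrices hold probabilities."""
    n_states = trans_p.shape[0]
    T = len(obs)
    logv = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logv[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(scores)
            logv[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    path = [int(np.argmax(logv[-1]))]
    for t in range(T - 1, 0, -1):          # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return list(reversed(path))

if __name__ == "__main__":
    start = np.array([0.6, 0.4])
    trans = np.array([[0.7, 0.3], [0.4, 0.6]])
    emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    print(viterbi([0, 1, 2, 2], start, trans, emit))
```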
Speech endpoint detection with non-language speech sounds for generic speech processing applications
NASA Astrophysics Data System (ADS)
McClain, Matthew; Romanowski, Brian
2009-05-01
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
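A hedged sketch of the two-class idea described above: train one HMM on language speech segments and one on NLSS, then label a new segment by whichever model assigns it the higher log-likelihood. The random "features" stand in for real acoustic features, and the hmmlearn library is an assumed dependency, not the authors' toolchain.

```python
# Two-model HMM classification sketch: each class gets its own Gaussian HMM,
# and a segment is labelled by the model with the higher log-likelihood.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(1)
lss_train = rng.normal(loc=0.0, size=(500, 4))   # placeholder acoustic features
nlss_train = rng.normal(loc=2.0, size=(500, 4))  # placeholder acoustic features

lss_model = hmm.GaussianHMM(n_components=3, covariance_type="diag", random_state=1).fit(lss_train)
nlss_model = hmm.GaussianHMM(n_components=3, covariance_type="diag", random_state=1).fit(nlss_train)

segment = rng.normal(loc=2.0, size=(40, 4))      # one unlabeled audio segment
label = "NLSS" if nlss_model.score(segment) > lss_model.score(segment) else "LSS"
print(label)
```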
1987-06-01
evaluation and chip layout planning for VLSI digital systems. A high-level applicative (functional) language, implemented at UCLA, allows combining of ... operating system. The complexity of VLSI requires the application of CAD tools at all levels of the design process. In order to be ... effective, these tools must be adaptive to the specific design. In this project we studied a design method based on the use of applicative languages
Formal Analysis of BPMN Models Using Event-B
NASA Astrophysics Data System (ADS)
Bryans, Jeremy W.; Wei, Wei
The use of business process models has gone far beyond documentation purposes. In the development of business applications, they can play the role of an artifact on which high level properties can be verified and design errors can be revealed in an effort to reduce overhead at later software development and diagnosis stages. This paper demonstrates how formal verification may add value to the specification, design and development of business process models in an industrial setting. The analysis of these models is achieved via an algorithmic translation from the de-facto standard business process modeling language BPMN to Event-B, a widely used formal language supported by the Rodin platform which offers a range of simulation and verification technologies.
NASA Technical Reports Server (NTRS)
Delaat, John C.; Merrill, Walter C.
1990-01-01
The objective of the Advanced Detection, Isolation, and Accommodation Program is to improve the overall demonstrated reliability of digital electronic control systems for turbine engines. For this purpose, an algorithm was developed which detects, isolates, and accommodates sensor failures by using analytical redundancy. The performance of this algorithm was evaluated on a real time engine simulation and was demonstrated on a full scale F100 turbofan engine. The real time implementation of the algorithm is described. The implementation used state-of-the-art microprocessor hardware and software, including parallel processing and high order language programming.
NASA Astrophysics Data System (ADS)
Benkrid, K.; Belkacemi, S.; Sukhsawas, S.
2005-06-01
This paper proposes an integrated framework for the high-level design of high-performance signal processing algorithm implementations on FPGAs. The framework emerged from a constant need to rapidly implement increasingly complicated algorithms on FPGAs while maintaining the high performance needed in many real-time digital signal processing applications. This is particularly important for application developers who often rely on iterative and interactive development methodologies. The central idea behind the proposed framework is to dynamically integrate high-performance structural hardware description languages with higher-level hardware languages in order to help satisfy the dual requirement of high-level design and high-performance implementation. The paper illustrates this by integrating two environments: Celoxica's Handel-C language, and HIDE, a structural hardware environment developed at the Queen's University of Belfast. On the one hand, Handel-C has been proven to be very useful in the rapid design and prototyping of FPGA circuits, especially control-intensive ones. On the other hand, HIDE has been used extensively, and successfully, in the generation of highly optimised parameterisable FPGA cores. In this paper, this is illustrated in the construction of a scalable and fully parameterisable core for image algebra's five core neighbourhood operations, where fully floorplanned efficient FPGA configurations, in the form of EDIF netlists, are generated automatically for instances of the core. In the proposed combined framework, highly optimised data paths are invoked dynamically from within Handel-C and are synthesized using HIDE. Although the idea might seem simple prima facie, it could have serious implications for the design of future generations of hardware description languages.
NASA Astrophysics Data System (ADS)
Chan, Garnet Kin-Lic; Keselman, Anna; Nakatani, Naoki; Li, Zhendong; White, Steven R.
2016-07-01
Current descriptions of the ab initio density matrix renormalization group (DMRG) algorithm use two superficially different languages: an older language of the renormalization group and renormalized operators, and a more recent language of matrix product states and matrix product operators. The same algorithm can appear dramatically different when written in the two different vocabularies. In this work, we carefully describe the translation between the two languages in several contexts. First, we describe how to efficiently implement the ab initio DMRG sweep using a matrix product operator based code, and the equivalence to the original renormalized operator implementation. Next we describe how to implement the general matrix product operator/matrix product state algebra within a pure renormalized operator-based DMRG code. Finally, we discuss two improvements of the ab initio DMRG sweep algorithm motivated by matrix product operator language: Hamiltonian compression, and a sum over operators representation that allows for perfect computational parallelism. The connections and correspondences described here serve to link the future developments with the past and are important in the efficient implementation of continuing advances in ab initio DMRG and related algorithms.
Chan, Garnet Kin-Lic; Keselman, Anna; Nakatani, Naoki; Li, Zhendong; White, Steven R
2016-07-07
Current descriptions of the ab initio density matrix renormalization group (DMRG) algorithm use two superficially different languages: an older language of the renormalization group and renormalized operators, and a more recent language of matrix product states and matrix product operators. The same algorithm can appear dramatically different when written in the two different vocabularies. In this work, we carefully describe the translation between the two languages in several contexts. First, we describe how to efficiently implement the ab initio DMRG sweep using a matrix product operator based code, and the equivalence to the original renormalized operator implementation. Next we describe how to implement the general matrix product operator/matrix product state algebra within a pure renormalized operator-based DMRG code. Finally, we discuss two improvements of the ab initio DMRG sweep algorithm motivated by matrix product operator language: Hamiltonian compression, and a sum over operators representation that allows for perfect computational parallelism. The connections and correspondences described here serve to link the future developments with the past and are important in the efficient implementation of continuing advances in ab initio DMRG and related algorithms.
Prime Numbers Comparison using Sieve of Eratosthenes and Sieve of Sundaram Algorithm
NASA Astrophysics Data System (ADS)
Abdullah, D.; Rahim, R.; Apdilah, D.; Efendi, S.; Tulus, T.; Suwilo, S.
2018-03-01
Prime numbers appeal to researchers because of their complexity. Many algorithms can be used to generate prime numbers, ranging from simple to computationally complex ones; the Sieve of Eratosthenes and the Sieve of Sundaram are two algorithms that can be used to generate prime numbers from randomly generated or sequential numbers. The testing in this study aims to find out which algorithm is better for large primes in terms of time complexity. The tests were assisted by an application designed in the Java language with code optimization and maximum memory usage, so that the testing process could run simultaneously and the results obtained could be objective.
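For reference, the two sieves compared above can be implemented straightforwardly; the sketch below is an illustrative Python re-implementation with a simple timing harness, not the Java test application used in the study.

```python
# Reference implementations of the Sieve of Eratosthenes and the Sieve of
# Sundaram, with a basic timing comparison.
import time

def sieve_of_eratosthenes(n):
    """Return all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n + 1, i):
                is_prime[j] = False
    return [i for i, p in enumerate(is_prime) if p]

def sieve_of_sundaram(n):
    """Return all primes <= n using Sundaram's sieve."""
    k = (n - 1) // 2
    marked = [False] * (k + 1)
    for i in range(1, k + 1):
        j = i
        while i + j + 2 * i * j <= k:
            marked[i + j + 2 * i * j] = True
            j += 1
    primes = [2] if n >= 2 else []
    primes += [2 * i + 1 for i in range(1, k + 1) if not marked[i]]
    return primes

if __name__ == "__main__":
    for sieve in (sieve_of_eratosthenes, sieve_of_sundaram):
        start = time.perf_counter()
        primes = sieve(1_000_000)
        print(f"{sieve.__name__}: {len(primes)} primes in {time.perf_counter() - start:.3f}s")
```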
2006-06-01
SPARQL: SPARQL Protocol and RDF Query Language; SQL: Structured Query Language; SUMO: Suggested Upper Merged Ontology; SW... Query optimization algorithms are implemented in the Pellet reasoner in order to ensure that querying a knowledge base is efficient. These algorithms ... memory as a tree-like structure in order for the data to be queried. XML Query (XQuery) is the standard language used when querying XML
NASA Technical Reports Server (NTRS)
Gomez, Fernando
1989-01-01
It is shown how certain kinds of domain independent expert systems based on classification problem-solving methods can be constructed directly from natural language descriptions by a human expert. The expert knowledge is not translated into production rules. Rather, it is mapped into conceptual structures which are integrated into long-term memory (LTM). The resulting system is one in which problem-solving, retrieval and memory organization are integrated processes. In other words, the same algorithm and knowledge representation structures are shared by these processes. As a result of this, the system can answer questions, solve problems or reorganize LTM.
Morpheme matching based text tokenization for a scarce resourced language.
Rehman, Zobia; Anwar, Waqas; Bajwa, Usama Ijaz; Xuan, Wang; Chaoying, Zhou
2013-01-01
Text tokenization is a fundamental pre-processing step for almost all information processing applications. This task is nontrivial for scarce-resourced languages such as Urdu, as there is inconsistent use of space between words. In this paper a morpheme matching based approach has been proposed for Urdu text tokenization, along with some other algorithms to solve the additional issues of boundary detection of compound words, affixation, reduplication, names and abbreviations. This study resulted in 97.28% precision, 93.71% recall, and 95.46% F1-measure while tokenizing a corpus of 57,000 words using a morpheme list with 6,400 entries.
Morpheme Matching Based Text Tokenization for a Scarce Resourced Language
Rehman, Zobia; Anwar, Waqas; Bajwa, Usama Ijaz; Xuan, Wang; Chaoying, Zhou
2013-01-01
Text tokenization is a fundamental pre-processing step for almost all information processing applications. This task is nontrivial for scarce-resourced languages such as Urdu, as there is inconsistent use of space between words. In this paper a morpheme matching based approach has been proposed for Urdu text tokenization, along with some other algorithms to solve the additional issues of boundary detection of compound words, affixation, reduplication, names and abbreviations. This study resulted in 97.28% precision, 93.71% recall, and 95.46% F1-measure while tokenizing a corpus of 57,000 words using a morpheme list with 6,400 entries. PMID:23990871
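A simplified sketch of greedy longest-match tokenization over a morpheme list follows; it works on space-free Latin text for readability and omits the paper's additional handling of compounds, affixation, reduplication, names, and abbreviations.

```python
# Greedy longest-match tokenizer over a morpheme list: at each position, try
# the longest candidate substring first and fall back to single characters.
def tokenize(text, morphemes, max_len=12):
    """Split a string without reliable spaces by longest-first morpheme matching."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in morphemes or length == 1:
                tokens.append(candidate)
                i += length
                break
    return tokens

if __name__ == "__main__":
    morpheme_list = {"text", "token", "ization", "is", "fun"}
    print(tokenize("texttokenizationisfun", morpheme_list))
    # ['text', 'token', 'ization', 'is', 'fun']
```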
Answer Markup Algorithms for Southeast Asian Languages.
ERIC Educational Resources Information Center
Henry, George M.
1991-01-01
Typical markup methods for providing feedback to foreign language learners are not applicable to languages not written in a strictly linear fashion. A modification of Hart's edit markup software is described, along with a second variation based on a simple edit distance algorithm adapted to a general Southeast Asian font system. (10 references)…
Sada, Yvonne; Hou, Jason; Richardson, Peter; El-Serag, Hashem; Davila, Jessica
2013-01-01
Background Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated HCC ICD-9 codes, and evaluated whether natural language processing (NLP) by the Automated Retrieval Console (ARC) for document classification improves HCC identification. Methods We identified a cohort of patients with ICD-9 codes for HCC during 2005–2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared to manual classification. PPV, sensitivity, and specificity of ARC were calculated. Results 1138 patients with HCC were identified by ICD-9 codes. Based on manual review, 773 had HCC. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had PPV of 0.75, sensitivity of 0.94, and specificity of 0.68. Conclusion A combined approach of ICD-9 codes and NLP of pathology and radiology reports improves HCC case identification in automated data. PMID:23929403
Teaching Non-Recursive Binary Searching: Establishing a Conceptual Framework.
ERIC Educational Resources Information Center
Magel, E. Terry
1989-01-01
Discusses problems associated with teaching non-recursive binary searching in computer language classes, and describes a teacher-directed dialog based on dictionary use that helps students use their previous searching experiences to conceptualize the binary search process. Algorithmic development is discussed and appropriate classroom discussion…
On a programming language for graph algorithms
NASA Technical Reports Server (NTRS)
Rheinboldt, W. C.; Basili, V. R.; Mesztenyi, C. K.
1971-01-01
An algorithmic language, GRAAL, is presented for describing and implementing graph algorithms of the type primarily arising in applications. The language is based on a set algebraic model of graph theory which defines the graph structure in terms of morphisms between certain set algebraic structures over the node set and arc set. GRAAL is modular in the sense that the user specifies which of these mappings are available with any graph. This allows flexibility in the selection of the storage representation for different graph structures. In line with its set theoretic foundation, the language introduces sets as a basic data type and provides for the efficient execution of all set and graph operators. At present, GRAAL is defined as an extension of ALGOL 60 (revised) and its formal description is given as a supplement to the syntactic and semantic definition of ALGOL. Several typical graph algorithms are written in GRAAL to illustrate various features of the language and to show its applicability.
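As a rough illustration of the set-algebraic model described above, the sketch below represents a graph as a node set, an arc set, and incidence mappings from arcs to end nodes, with set operators as the basic vocabulary; GRAAL itself was defined as an ALGOL 60 extension, so this Python rendering is only an analogy.

```python
# Set-algebraic graph sketch: nodes and arcs are sets, and two mappings
# (tail and head) play the role of the incidence morphisms.
class Graph:
    def __init__(self):
        self.nodes = set()
        self.arcs = set()
        self.tail = {}   # mapping: arc -> source node
        self.head = {}   # mapping: arc -> target node

    def add_arc(self, name, u, v):
        self.nodes |= {u, v}
        self.arcs.add(name)
        self.tail[name], self.head[name] = u, v

    def out_arcs(self, u):
        return {a for a in self.arcs if self.tail[a] == u}

    def successors(self, u):
        return {self.head[a] for a in self.out_arcs(u)}

if __name__ == "__main__":
    g = Graph()
    g.add_arc("e1", "a", "b")
    g.add_arc("e2", "a", "c")
    print(g.successors("a"))   # {'b', 'c'}
```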
Fiji: an open-source platform for biological-image analysis.
Schindelin, Johannes; Arganda-Carreras, Ignacio; Frise, Erwin; Kaynig, Verena; Longair, Mark; Pietzsch, Tobias; Preibisch, Stephan; Rueden, Curtis; Saalfeld, Stephan; Schmid, Benjamin; Tinevez, Jean-Yves; White, Daniel James; Hartenstein, Volker; Eliceiri, Kevin; Tomancak, Pavel; Cardona, Albert
2012-06-28
Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.
Desiderata for computable representations of electronic health records-driven phenotype algorithms
Mo, Huan; Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Jiang, Guoqian; Kiefer, Richard; Zhu, Qian; Xu, Jie; Montague, Enid; Carrell, David S; Lingren, Todd; Mentch, Frank D; Ni, Yizhao; Wehbe, Firas H; Peissig, Peggy L; Tromp, Gerard; Larson, Eric B; Chute, Christopher G; Pathak, Jyotishman; Speltz, Peter; Kho, Abel N; Jarvik, Gail P; Bejan, Cosmin A; Williams, Marc S; Borthwick, Kenneth; Kitchner, Terrie E; Roden, Dan M; Harris, Paul A
2015-01-01
Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages. PMID:26342218
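As a toy illustration of desideratum (4), phenotype logic can be expressed as set operations over queryable patient sets; the patient identifiers and criteria below are invented placeholders, not a real phenotype definition.

```python
# Phenotype logic as set algebra over patient sets returned by EHR queries.
diagnosed = {"p1", "p2", "p3", "p5"}     # patients matching a diagnosis-code query
on_medication = {"p2", "p3", "p4"}       # patients matching a medication query
excluded = {"p3"}                        # patients meeting an exclusion criterion

cases = (diagnosed & on_medication) - excluded   # intersection, then set difference
possible_controls = {"p1", "p2", "p3", "p4", "p5", "p6"} - (diagnosed | on_medication)
print(cases, possible_controls)          # {'p2'} {'p6'}
```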
Tcl as a Software Environment for a TCS
NASA Astrophysics Data System (ADS)
Terrett, David L.
2002-12-01
This paper describes how the Tcl scripting language and C API have been used as the software environment for a telescope pointing kernel, so that new pointing algorithms and software architectures can be developed and tested without needing a real-time operating system or real-time software environment. It has enabled development to continue outside the framework of a specific telescope project while continuing to build a system that is sufficiently complete to be capable of controlling real hardware, but expending minimum effort on replacing the services that would normally be provided by a real-time software environment. Tcl is used as a scripting language for configuring the system at startup and then as the command interface for controlling the running system; the Tcl C language API is used to provide a system-independent interface to file and socket I/O and other operating system services. The pointing algorithms themselves are implemented as a set of C++ objects calling C library functions that implement the algorithms described in [2]. Although originally designed as a test and development environment, the system, running as a soft real-time process on Linux, has been used to test the SOAR mount control system and will be used as the pointing kernel of the SOAR telescope control system.
Topaz, Maxim; Radhakrishnan, Kavita; Blackley, Suzanne; Lei, Victor; Lai, Kenneth; Zhou, Li
2016-09-14
This study developed an innovative natural language processing algorithm to automatically identify heart failure (HF) patients with ineffective self-management status (in the domains of diet, physical activity, medication adherence, and adherence to clinician appointments) from narrative discharge summary notes. We also analyzed the association between self-management status and preventable 30-day hospital readmissions. Our natural language system achieved relatively high accuracy (F-measure = 86.3%; precision = 95%; recall = 79.2%) on a testing sample of 300 notes annotated by two human reviewers. In a sample of 8,901 HF patients admitted to our healthcare system, 14.4% (n = 1,282) had documentation of ineffective HF self-management. Adjusted regression analyses indicated that presence of any skill-related self-management deficit (odds ratio [OR] = 1.3, 95% confidence interval [CI] = [1.1, 1.6]) and non-specific ineffective self-management (OR = 1.5, 95% CI = [1.2, 2]) was significantly associated with readmissions. We have demonstrated the feasibility of identifying ineffective HF self-management from electronic discharge summaries with natural language processing. © The Author(s) 2016.
STAR (Simple Tool for Automated Reasoning): Tutorial guide and reference manual
NASA Technical Reports Server (NTRS)
Borchardt, G. C.
1985-01-01
STAR is an interactive, interpreted programming language for the development and operation of Artificial Intelligence application systems. The language is intended for use primarily in the development of software application systems which rely on a combination of symbolic processing, central to the vast majority of AI algorithms, with routines and data structures defined in compiled languages such as C, FORTRAN and PASCAL. References to routines and data structures defined in compiled languages are intermixed with symbolic structures in STAR, resulting in a hybrid operating environment in which symbolic and non-symbolic processing and organization of data may interact to a high degree within the execution of particular application systems. The STAR language was developed in the course of a project involving AI techniques in the interpretation of imaging spectrometer data and is derived in part from a previous language called CLIP. The interpreter for STAR is implemented as a program defined in the language C and has been made available for distribution in source code form through NASA's Computer Software Management and Information Center (COSMIC). Contained within this report are the STAR Tutorial Guide, which introduces the language in a step-by-step manner, and the STAR Reference Manual, which provides a detailed summary of the features of STAR.
Automatic Lung-RADS™ classification with a natural language processing system.
Beyer, Sebastian E; McKee, Brady J; Regis, Shawn M; McKee, Andrea B; Flacke, Sebastian; El Saadawi, Gilan; Wald, Christoph
2017-09-01
Our aim was to train a natural language processing (NLP) algorithm to capture imaging characteristics of lung nodules reported in a structured CT report and suggest the applicable Lung-RADS™ (LR) category. Our study included structured, clinical reports of consecutive CT lung screening (CTLS) exams performed from 08/2014 to 08/2015 at an ACR accredited Lung Cancer Screening Center. All patients screened were at high risk for lung cancer according to the NCCN Guidelines®. All exams were interpreted by one of three radiologists credentialed to read CTLS exams using LR with a standard reporting template. Training and test sets consisted of consecutive exams. Lung screening exams were divided into two groups: three training sets (500, 120, and 383 reports each) and one final evaluation set (498 reports). NLP algorithm results were compared with the gold standard of the LR category assigned by the radiologist. The sensitivity/specificity of the NLP algorithm in correctly assigning LR categories for suspicious nodules (LR 4) and positive nodules (LR 3/4) were 74.1%/98.6% and 75.0%/98.8%, respectively. The majority of mismatches occurred in cases where pulmonary findings were present that are not currently addressed by LR. Misclassifications also resulted from the failure to identify exams as follow-up and the failure to completely characterize part-solid nodules. In a sub-group analysis among structured reports with standardized language, the sensitivity and specificity to detect LR 4 nodules were 87.0% and 99.5%, respectively. An NLP system can accurately suggest the appropriate LR category from CTLS exam findings when standardized reporting is used.
Automatic Lung-RADS™ classification with a natural language processing system
Beyer, Sebastian E.; Regis, Shawn M.; McKee, Andrea B.; Flacke, Sebastian; El Saadawi, Gilan; Wald, Christoph
2017-01-01
Background Our aim was to train a natural language processing (NLP) algorithm to capture imaging characteristics of lung nodules reported in a structured CT report and suggest the applicable Lung-RADS™ (LR) category. Methods Our study included structured, clinical reports of consecutive CT lung screening (CTLS) exams performed from 08/2014 to 08/2015 at an ACR accredited Lung Cancer Screening Center. All patients screened were at high risk for lung cancer according to the NCCN Guidelines®. All exams were interpreted by one of three radiologists credentialed to read CTLS exams using LR with a standard reporting template. Training and test sets consisted of consecutive exams. Lung screening exams were divided into two groups: three training sets (500, 120, and 383 reports each) and one final evaluation set (498 reports). NLP algorithm results were compared with the gold standard of the LR category assigned by the radiologist. Results The sensitivity/specificity of the NLP algorithm in correctly assigning LR categories for suspicious nodules (LR 4) and positive nodules (LR 3/4) were 74.1%/98.6% and 75.0%/98.8%, respectively. The majority of mismatches occurred in cases where pulmonary findings were present that are not currently addressed by LR. Misclassifications also resulted from the failure to identify exams as follow-up and the failure to completely characterize part-solid nodules. In a sub-group analysis among structured reports with standardized language, the sensitivity and specificity to detect LR 4 nodules were 87.0% and 99.5%, respectively. Conclusions An NLP system can accurately suggest the appropriate LR category from CTLS exam findings when standardized reporting is used. PMID:29221286
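To illustrate the kind of rule-based mapping such an NLP pipeline feeds once nodule attributes are extracted, the sketch below assigns a screening category from a nodule type and diameter; the cut-offs are illustrative placeholders and must not be read as the actual Lung-RADS criteria.

```python
# Schematic rule-based mapper from structured nodule findings to a screening
# category. All thresholds below are illustrative placeholders only.
def suggest_category(nodule_type, diameter_mm):
    if nodule_type == "solid":
        if diameter_mm < 6:          # placeholder threshold
            return "2"
        if diameter_mm < 8:          # placeholder threshold
            return "3"
        return "4"
    if nodule_type == "ground-glass":
        return "2" if diameter_mm < 20 else "3"   # placeholder thresholds
    return "needs review"

if __name__ == "__main__":
    print(suggest_category("solid", 9.5))   # "4"
```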
Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language
Narayanan, Shrikanth; Georgiou, Panayiotis G.
2013-01-01
The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277
MIA - A free and open source software for gray scale medical image analysis
2013-01-01
Background Gray scale images make up the bulk of data in bio-medical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases and data sets become very large. Various image processing frameworks exist that make the development of new algorithms easy through high-level programming languages or visual programming. These frameworks are also accessible to researchers who have little or no background in software development, because they take care of otherwise complex tasks; specifically, the management of working memory is handled automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers. One alternative to these high-level tools is developing new algorithms in a language like C++, which gives the developer full control over how memory is handled, but the resulting prototyping workflow is time intensive and not appropriate for a researcher with little or no software-development knowledge. Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, few tools provide this kind of processing interface; they are usually quite task specific and do not offer a clear path from a prototype shell script to a new command line tool. Results The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that makes it possible to run image processing tasks interactively in a command shell and to prototype using the corresponding shell scripting language. Since the hard disk serves as temporary storage, memory management is usually a non-issue in the prototyping phase. String-based descriptions of filters, optimizers, and the like also ease the transition from shell scripts to full-fledged programs implemented in C++. In addition, its design based on atomic plug-ins and single-task command line tools makes it easy to extend MIA, usually without the need to touch or recompile existing code. Conclusion In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrate the applicability of the software with example applications from three different research scenarios: motion compensation in myocardial perfusion imaging, processing of the high-resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcome in orthognathic surgery. With MIA, prototyping algorithms with shell scripts that combine small, single-task command line tools is a viable alternative to using high-level languages, an approach that is especially useful when large data sets need to be processed. PMID:24119305
MIA - A free and open source software for gray scale medical image analysis.
Wollny, Gert; Kellman, Peter; Ledesma-Carbayo, María-Jesus; Skinner, Matthew M; Hublin, Jean-Jaques; Hierl, Thomas
2013-10-11
Gray scale images make up the bulk of data in bio-medical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases and data sets become very large. Various image processing frameworks exist that make the development of new algorithms easy through high-level programming languages or visual programming. These frameworks are also accessible to researchers who have little or no background in software development, because they take care of otherwise complex tasks; specifically, the management of working memory is handled automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers. One alternative to these high-level tools is developing new algorithms in a language like C++, which gives the developer full control over how memory is handled, but the resulting prototyping workflow is time intensive and not appropriate for a researcher with little or no software-development knowledge. Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, few tools provide this kind of processing interface; they are usually quite task specific and do not offer a clear path from a prototype shell script to a new command line tool. The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that makes it possible to run image processing tasks interactively in a command shell and to prototype using the corresponding shell scripting language. Since the hard disk serves as temporary storage, memory management is usually a non-issue in the prototyping phase. String-based descriptions of filters, optimizers, and the like also ease the transition from shell scripts to full-fledged programs implemented in C++. In addition, its design based on atomic plug-ins and single-task command line tools makes it easy to extend MIA, usually without the need to touch or recompile existing code. In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrate the applicability of the software with example applications from three different research scenarios: motion compensation in myocardial perfusion imaging, processing of the high-resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcome in orthognathic surgery. With MIA, prototyping algorithms with shell scripts that combine small, single-task command line tools is a viable alternative to using high-level languages, an approach that is especially useful when large data sets need to be processed.
Algorithm Building and Learning Programming Languages Using a New Educational Paradigm
NASA Astrophysics Data System (ADS)
Jain, Anshul K.; Singhal, Manik; Gupta, Manu Sheel
2011-08-01
This research paper presents a new concept: using a single tool to associate the syntax of various programming languages, algorithms, and basic coding techniques. A simple framework has been programmed in Python that helps students learn to develop algorithms and implement them in various programming languages. The tool provides an innovative and unified graphical user interface for the development of multimedia objects, educational games and applications. It also aids collaborative learning among students and teachers through an integrated mechanism based on Remote Procedure Calls. The paper also elucidates an innovative method for code generation that enables students to learn the basics of programming languages using drag-and-drop methods for image objects.
Natural language processing, pragmatics, and verbal behavior
Cherpas, Chris
1992-01-01
Natural Language Processing (NLP) is that part of Artificial Intelligence (AI) concerned with endowing computers with verbal and listener repertoires, so that people can interact with them more easily. Most attention has been given to accurately parsing and generating syntactic structures, although NLP researchers are finding ways of handling the semantic content of language as well. It is increasingly apparent that understanding the pragmatic (contextual and consequential) dimension of natural language is critical for producing effective NLP systems. While there are some techniques for applying pragmatics in computer systems, they are piecemeal, crude, and lack an integrated theoretical foundation. Unfortunately, there is little awareness that Skinner's (1957) Verbal Behavior provides an extensive, principled pragmatic analysis of language. The implications of Skinner's functional analysis for NLP and for verbal aspects of epistemology lead to a proposal for a “user expert”—a computer system whose area of expertise is the long-term computer user. The evolutionary nature of behavior suggests an AI technology known as genetic algorithms/programming for implementing such a system. PMID:22477052
Common Ground: An Interactive Visual Exploration and Discovery for Complex Health Data
2015-04-01
working with Intermountain Healthcare on a new rich dataset extracted directly from medical notes using natural language processing (NLP) algorithms ... probabilities based on state-of-the-art NLP classifiers. At that stage the data did not include geographic information or temporal information but we
A High-Level Language for Modeling Algorithms and Their Properties
NASA Astrophysics Data System (ADS)
Akhtar, Sabina; Merz, Stephan; Quinson, Martin
Designers of concurrent and distributed algorithms usually express them using pseudo-code. In contrast, most verification techniques are based on more mathematically-oriented formalisms such as state transition systems. This conceptual gap contributes to hinder the use of formal verification techniques. Leslie Lamport introduced PlusCal, a high-level algorithmic language that has the "look and feel" of pseudo-code, but is equipped with a precise semantics and includes a high-level expression language based on set theory. PlusCal models can be compiled to TLA + and verified using the model checker tlc.
A Cognitive Computing Approach for Classification of Complaints in the Insurance Industry
NASA Astrophysics Data System (ADS)
Forster, J.; Entrup, B.
2017-10-01
In this paper we present and evaluate a cognitive computing approach for the classification of dissatisfaction and four complaint-specific classes in correspondence documents between insurance clients and an insurance company. The cognitive computing approach combines classical natural language processing methods, machine learning algorithms, and the evaluation of hypotheses. Specifically, it combines a MaxEnt machine learning algorithm with language modelling, tf-idf, and sentiment analytics to create a multi-label text classification model. The model is trained and tested on a set of 2,500 original insurance communication documents written in German, manually annotated by the partnering insurance company. With an F1-score of 0.9, a reliable text classification component has been implemented and evaluated. A final outlook towards a cognitive computing insurance assistant is given at the end.
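The sketch below is a minimal, hypothetical stand-in for the pipeline described above: scikit-learn's LogisticRegression plays the role of the MaxEnt classifier over tf-idf features, wrapped in a one-vs-rest scheme for multi-label output. The German example sentences and label names are invented, and the sentiment and language-modelling features of the actual system are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy documents and multi-label annotations (invented for illustration).
docs = [
    "Ich bin unzufrieden mit der langen Bearbeitungszeit meines Schadens.",
    "Die Beitragserhoehung wurde mir nie mitgeteilt.",
    "Vielen Dank fuer die schnelle Auszahlung.",
]
labels = [{"dissatisfaction", "processing_time"}, {"dissatisfaction", "premium"}, set()]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# tf-idf features; logistic regression is the usual scikit-learn stand-in for MaxEnt.
vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(docs)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

new_doc = ["Der Schaden ist immer noch nicht bearbeitet."]
print(mlb.inverse_transform(clf.predict(vec.transform(new_doc))))
```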
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Relaxation of selection, niche construction, and the Baldwin effect in language evolution.
Yamauchi, Hajime; Hashimoto, Takashi
2010-01-01
Deacon has suggested that one of the key factors of language evolution is not characterized by an increase in genetic contribution, often known as the Baldwin effect, but rather by a decrease. This process effectively increases linguistic learning capability by organizing a novel synergy of multiple lower-order functions previously irrelevant to the process of language acquisition. Deacon posits that this transition is not caused by natural selection. Rather, it is due to the relaxation of natural selection. While there are some cases in which relaxation caused by some external factors indeed induces the transition, we do not know what kind of relaxation has worked in language evolution. In this article, a genetic-algorithm-based computer simulation is used to investigate how the niche-constructing aspect of linguistic behavior may trigger the degradation of genetic predisposition related to language learning. The results show that agents initially increase their genetic predisposition for language learning—the Baldwin effect. They create a highly uniform sociolinguistic environment—a linguistic niche construction. This means that later generations constantly receive very similar inputs from adult agents, and subsequently the selective pressure to retain the genetic predisposition is relaxed.
NASA Astrophysics Data System (ADS)
García, Aday; Santos, Lucana; López, Sebastián.; Callicó, Gustavo M.; Lopez, Jose F.; Sarmiento, Roberto
2014-05-01
Efficient onboard satellite hyperspectral image compression represents a necessity and a challenge for current and future space missions. Therefore, it is mandatory to provide hardware implementations of this type of algorithm in order to meet the constraints required for onboard compression. In this work, we implement the Lossy Compression for Exomars (LCE) algorithm on an FPGA by means of high-level synthesis (HLS) in order to shorten the design cycle. Specifically, we use the CatapultC HLS tool to obtain a VHDL description of the LCE algorithm from C-language specifications. Two different approaches are followed for HLS: on one hand, introducing the whole C-language description in CatapultC; on the other hand, splitting the C-language description into functional modules to be implemented independently with CatapultC, connecting and controlling them through an RTL description code written without HLS. In both cases the goal is to obtain an FPGA implementation. We explain the several changes applied to the original C-language source code in order to optimize the results obtained by CatapultC for both approaches. Experimental results show low area occupancy of less than 15% for a SRAM-based Virtex-5 FPGA and a maximum frequency above 80 MHz. Additionally, the LCE compressor was implemented on an RTAX2000S antifuse-based FPGA, showing an area occupancy of 75% and a frequency of around 53 MHz. All of this demonstrates that the LCE algorithm can be efficiently executed on an FPGA onboard a satellite. A comparison between the two implementation approaches is also provided. The performance of the algorithm is finally compared with implementations on other technologies, specifically a graphics processing unit (GPU) and a single-threaded CPU.
Warren, Steven F; Gilkerson, Jill; Richards, Jeffrey A; Oller, D Kimbrough; Xu, Dongxin; Yapanel, Umit; Gray, Sharmistha
2010-05-01
The study compared the vocal production and language learning environments of 26 young children with autism spectrum disorder (ASD) to those of 78 typically developing children, using measures derived from automated vocal analysis. A digital language processor and audio-processing algorithms measured the number of adult words addressed to the children and the number of vocalizations the children produced during 12-h recording periods in their natural environments. The results indicated significant differences between typically developing children and children with ASD in the characteristics of conversations, in the number of conversational turns, and in child vocalizations, which correlated with parent measures of various child characteristics. Automated measurement of the language learning environment of young children with ASD reveals important differences from the environments experienced by typically developing children.
A software framework for pipelined arithmetic algorithms in field programmable gate arrays
NASA Astrophysics Data System (ADS)
Kim, J. B.; Won, E.
2018-03-01
Pipelined algorithms implemented in field programmable gate arrays are extensively used for hardware triggers in the modern experimental high energy physics field and the complexity of such algorithms increases rapidly. For development of such hardware triggers, algorithms are developed in C++, ported to hardware description language for synthesizing firmware, and then ported back to C++ for simulating the firmware response down to the single bit level. We present a C++ software framework which automatically simulates and generates hardware description language code for pipelined arithmetic algorithms.
Rapid prototyping of update algorithm of discrete Fourier transform for real-time signal processing
NASA Astrophysics Data System (ADS)
Kakad, Yogendra P.; Sherlock, Barry G.; Chatapuram, Krishnan V.; Bishop, Stephen
2001-10-01
An algorithm is developed in the companion paper to update the existing DFT to represent the new data series that results when a new signal point is received. Updating the DFT in this way uses less computation than directly evaluating the DFT with the FFT algorithm, reducing the computational order by a factor of log2 N. The algorithm works in the presence of a data window function and supports the rectangular, split-triangular, Hanning, Hamming, and Blackman windows. In this paper, a hardware implementation of this algorithm using FPGA technology is outlined. Unlike traditional fully customized VLSI circuits, FPGAs represent a technical breakthrough in the corresponding industry. An FPGA implements thousands of gates of logic in a single IC chip and can be programmed by users at their own site in a few seconds or less, depending on the type of device used. The risk is low and the development time is short. These advantages have made FPGAs very popular for rapid prototyping of algorithms in digital communication, digital signal processing, and image processing. Our paper addresses the related issues of implementing the design in a hardware description language and subsequently downloading it onto the programmable hardware chip.
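For the rectangular-window case, the update has a particularly compact form, often called the sliding DFT; the sketch below shows that form and checks it against a direct FFT. This is a generic illustration of the idea, not the windowed update derived in the companion paper.

```python
import numpy as np

def sliding_dft_update(X_old, x_oldest, x_new):
    """Rectangular-window sliding DFT: when the N-sample window drops its
    oldest sample and appends a new one, each bin is corrected and then
    phase-rotated by exp(2j*pi*k/N)."""
    N = len(X_old)
    k = np.arange(N)
    return (X_old - x_oldest + x_new) * np.exp(2j * np.pi * k / N)

# Check against a direct FFT of the shifted window.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
new_sample = rng.standard_normal()
X_old = np.fft.fft(x)
X_new = sliding_dft_update(X_old, x[0], new_sample)
assert np.allclose(X_new, np.fft.fft(np.append(x[1:], new_sample)))
```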
Three-pass protocol scheme for bitmap image security by using vernam cipher algorithm
NASA Astrophysics Data System (ADS)
Rachmawati, D.; Budiman, M. A.; Aulya, L.
2018-02-01
Confidentiality, integrity, and efficiency are crucial aspects of data security. Among digital data, image data is especially prone to abuse such as duplication and modification. Cryptography is one of the available data security techniques. The security of the Vernam Cipher algorithm depends heavily on the key exchange process: if the key is leaked, the security of the algorithm collapses. Therefore, a method that minimizes key leakage during the exchange of messages is required. The method used here is known as the Three-Pass Protocol, which enables message delivery without a key exchange, so messages can reach the receiver safely without fear of key leakage. The system is built using the Java programming language. The materials used for system testing are images of size 200×200, 300×300, 500×500, 800×800, and 1000×1000 pixels. The experimental results showed that the Vernam Cipher algorithm in the Three-Pass Protocol scheme could restore the original image.
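A minimal sketch of the message flow is given below, with random byte strings standing in for image pixel data. It only illustrates the mechanics of the scheme (XOR encryption is commutative, so each party can remove its own key without the keys ever being exchanged); it is not the paper's Java implementation.

```python
import os

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Vernam-style encryption: bitwise XOR with a key of equal length.
    return bytes(d ^ k for d, k in zip(data, key))

message = b"bitmap pixel payload"

# Pass 1: sender encrypts with her own key and transmits.
key_a = os.urandom(len(message))
pass1 = xor_bytes(message, key_a)

# Pass 2: receiver super-encrypts with his own key and transmits back.
key_b = os.urandom(len(message))
pass2 = xor_bytes(pass1, key_b)

# Pass 3: sender removes her key (XOR is commutative) and transmits again.
pass3 = xor_bytes(pass2, key_a)

# Receiver removes his key and recovers the original message; no key was exchanged.
recovered = xor_bytes(pass3, key_b)
assert recovered == message
```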
NASA Astrophysics Data System (ADS)
Sakakibara, Kai; Hagiwara, Masafumi
In this paper, we propose a 3-dimensional self-organizing memory and describe its application to knowledge extraction from natural language. First, the proposed system extracts relations between words using JUMAN (a morpheme analysis system) and KNP (a syntax analysis system) and stores them in a short-term memory. In the short-term memory, the relations are attenuated as processing proceeds; however, relations that appear frequently are stored in the long-term memory without attenuation. The relations in the long-term memory are then placed in the proposed 3-dimensional self-organizing memory. A new learning algorithm called "Potential Firing" is used in the learning phase. In the recall phase, the proposed system recalls relational knowledge from the learned knowledge based on the input sentence, using a new recall algorithm called "Waterfall Recall." We also added a function that responds to natural-language questions with "yes/no" in order to confirm the validity of the proposed system by evaluating the number of correct answers.
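The toy sketch below illustrates only the short-term/long-term storage idea described above: relations decay with each processing step, and frequently seen relations are promoted to a long-term store. The decay rate and promotion threshold are invented, and the Potential Firing and Waterfall Recall algorithms are not modeled.

```python
from collections import defaultdict

DECAY, PROMOTE_AT = 0.2, 3  # invented parameters

short_term = defaultdict(lambda: {"strength": 0.0, "count": 0})
long_term = set()

def observe(relation):
    # attenuate everything currently held in short-term memory
    for entry in short_term.values():
        entry["strength"] = max(0.0, entry["strength"] - DECAY)
    entry = short_term[relation]
    entry["strength"] += 1.0
    entry["count"] += 1
    if entry["count"] >= PROMOTE_AT:   # high frequency of appearance -> long-term
        long_term.add(relation)

for rel in [("dog", "is-a", "animal")] * 3 + [("cat", "chases", "mouse")]:
    observe(rel)
print(long_term)  # {('dog', 'is-a', 'animal')}
```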
Scheduling language and algorithm development study. Appendix: Study approach and activity summary
NASA Technical Reports Server (NTRS)
1974-01-01
The approach and organization of the study to develop a high level computer programming language and a program library are presented. The algorithm and problem modeling analyses are summarized. The approach used to identify and specify the capabilities required in the basic language is described. Results of the analyses used to define specifications for the scheduling module library are presented.
Narayanan, Shrikanth; Georgiou, Panayiotis G
2013-02-07
The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.
Mincarone, Pierpaolo; Leo, Carlo Giacomo; Trujillo-Martín, Maria Del Mar; Manson, Jan; Guarino, Roberto; Ponzini, Giuseppe; Sabina, Saverio
2018-04-01
The importance of working toward quality improvement in healthcare implies an increasing interest in analysing, understanding and optimizing the process logic and sequences of activities embedded in healthcare processes. Their graphical representation promotes faster learning, higher retention and better compliance. This study identifies standardized graphical languages and notations applied to patient care processes and investigates their usefulness in the healthcare setting. Peer-reviewed literature up to 19 May 2016 was searched, and the information was complemented by a questionnaire sent to the authors of selected studies. The systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. Five authors extracted the results of the selected studies. Ten articles met the inclusion criteria. One notation and one language for healthcare process modelling were identified with applications to patient care processes: Business Process Model and Notation and the Unified Modeling Language™. One of the authors of every selected study completed the questionnaire. Comprehensibility for users and facilitation of inter-professional analysis of processes were recognized in the completed questionnaires as major strengths of process modelling in healthcare. Both the notation and the language could increase the clarity of presentation thanks to their visual properties, their capacity to easily manage macro and micro scenarios, and their ability to represent the process logic clearly and precisely. Both could increase the applicability of guidelines/pathways by representing complex scenarios through charts and algorithms, thus helping to reduce unjustified practice variations that negatively impact quality of care and patient safety.
The semantic web and computer vision: old AI meets new AI
NASA Astrophysics Data System (ADS)
Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.
2018-04-01
There has been vast progress in linking semantic information across billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL), which is based on the Resource Description Framework (RDF). A prime example is Wikipedia, where the knowledge contained in its more than four million pages is encoded in an ontological database called DBPedia (http://wiki.dbpedia.org/). Web-based query tools can retrieve semantic information from DBPedia, encoded in interlinked ontologies, that can be accessed using natural language. This paper shows how this vast context can be used to automate the querying of images and other geospatial data in support of reporting changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.
Desiderata for computable representations of electronic health records-driven phenotype algorithms.
Mo, Huan; Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Jiang, Guoqian; Kiefer, Richard; Zhu, Qian; Xu, Jie; Montague, Enid; Carrell, David S; Lingren, Todd; Mentch, Frank D; Ni, Yizhao; Wehbe, Firas H; Peissig, Peggy L; Tromp, Gerard; Larson, Eric B; Chute, Christopher G; Pathak, Jyotishman; Denny, Joshua C; Speltz, Peter; Kho, Abel N; Jarvik, Gail P; Bejan, Cosmin A; Williams, Marc S; Borthwick, Kenneth; Kitchner, Terrie E; Roden, Dan M; Harris, Paul A
2015-11-01
Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
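As a hedged illustration of desiderata (4)-(6), the sketch below expresses a toy phenotype as structured rules over set operations with a simple temporal constraint. The table contents, codes, and drug names are invented and do not correspond to any published phenotype.

```python
from datetime import date

# Toy, invented patient events standing in for queryable EHR tables.
diagnoses = {("p1", "E11.9", date(2020, 3, 1)), ("p2", "E11.9", date(2021, 5, 2))}
medications = {("p1", "metformin", date(2020, 4, 1)), ("p3", "metformin", date(2020, 1, 5))}

# Structured rules expressed with set operations / relational algebra.
dm_dx = {pid for pid, code, _ in diagnoses if code.startswith("E11")}
metformin_rx = {pid for pid, drug, _ in medications if drug == "metformin"}

# Temporal relation: medication ordered after the qualifying diagnosis.
def rx_after_dx(pid):
    dx_dates = [d for p, c, d in diagnoses if p == pid and c.startswith("E11")]
    rx_dates = [d for p, m, d in medications if p == pid and m == "metformin"]
    return any(rx > dx for dx in dx_dates for rx in rx_dates)

phenotype = {pid for pid in (dm_dx & metformin_rx) if rx_after_dx(pid)}
print(phenotype)  # {'p1'}
```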
NASA Astrophysics Data System (ADS)
Jackson, Christopher Robert
"Lucky-region" fusion (LRF) is a synthetic imaging technique that has proven successful in enhancing the quality of images distorted by atmospheric turbulence. The LRF algorithm selects sharp regions of an image obtained from a series of short exposure frames, and fuses the sharp regions into a final, improved image. In previous research, the LRF algorithm had been implemented on a PC using the C programming language. However, the PC did not have sufficient sequential processing power to handle real-time extraction, processing and reduction required when the LRF algorithm was applied to real-time video from fast, high-resolution image sensors. This thesis describes two hardware implementations of the LRF algorithm to achieve real-time image processing. The first was created with a VIRTEX-7 field programmable gate array (FPGA). The other developed using the graphics processing unit (GPU) of a NVIDIA GeForce GTX 690 video card. The novelty in the FPGA approach is the creation of a "black box" LRF video processing system with a general camera link input, a user controller interface, and a camera link video output. We also describe a custom hardware simulation environment we have built to test the FPGA LRF implementation. The advantage of the GPU approach is significantly improved development time, integration of image stabilization into the system, and comparable atmospheric turbulence mitigation.
Super-Resolution in Plenoptic Cameras Using FPGAs
Pérez, Joel; Magdaleno, Eduardo; Pérez, Fernando; Rodríguez, Manuel; Hernández, David; Corrales, Jaime
2014-01-01
Plenoptic cameras are a new type of sensor that extend the possibilities of current commercial cameras allowing 3D refocusing or the capture of 3D depths. One of the limitations of plenoptic cameras is their limited spatial resolution. In this paper we describe a fast, specialized hardware implementation of a super-resolution algorithm for plenoptic cameras. The algorithm has been designed for field programmable gate array (FPGA) devices using VHDL (very high speed integrated circuit (VHSIC) hardware description language). With this technology, we obtain an acceleration of several orders of magnitude using its extremely high-performance signal processing capability through parallelism and pipeline architecture. The system has been developed using generics of the VHDL language. This allows a very versatile and parameterizable system. The system user can easily modify parameters such as data width, number of microlenses of the plenoptic camera, their size and shape, and the super-resolution factor. The speed of the algorithm in FPGA has been successfully compared with the execution using a conventional computer for several image sizes and different 3D refocusing planes. PMID:24841246
Super-resolution in plenoptic cameras using FPGAs.
Pérez, Joel; Magdaleno, Eduardo; Pérez, Fernando; Rodríguez, Manuel; Hernández, David; Corrales, Jaime
2014-05-16
Plenoptic cameras are a new type of sensor that extend the possibilities of current commercial cameras allowing 3D refocusing or the capture of 3D depths. One of the limitations of plenoptic cameras is their limited spatial resolution. In this paper we describe a fast, specialized hardware implementation of a super-resolution algorithm for plenoptic cameras. The algorithm has been designed for field programmable gate array (FPGA) devices using VHDL (very high speed integrated circuit (VHSIC) hardware description language). With this technology, we obtain an acceleration of several orders of magnitude using its extremely high-performance signal processing capability through parallelism and pipeline architecture. The system has been developed using generics of the VHDL language. This allows a very versatile and parameterizable system. The system user can easily modify parameters such as data width, number of microlenses of the plenoptic camera, their size and shape, and the super-resolution factor. The speed of the algorithm in FPGA has been successfully compared with the execution using a conventional computer for several image sizes and different 3D refocusing planes.
Description of the AILS Alerting Algorithm
NASA Technical Reports Server (NTRS)
Samanant, Paul; Jackson, Mike
2000-01-01
This document provides a complete description of the Airborne Information for Lateral Spacing (AILS) alerting algorithms. The purpose of AILS is to provide separation assurance between aircraft during simultaneous approaches to closely spaced parallel runways. AILS will allow independent approaches to be flown in situations where dependent approaches were previously required (typically under Instrument Meteorological Conditions (IMC)). This is achieved by providing multiple levels of alerting for pairs of aircraft that are in parallel approach situations. This document's scope is comprehensive and covers everything from general overviews, definitions, and concepts down to algorithmic elements and equations. The entire algorithm is presented in complete and detailed pseudo-code format. This can be used by software programmers to program AILS into a software language. Additional supporting information is provided in the form of coordinate frame definitions, data requirements, calling requirements, and all necessary pre-processing and post-processing requirements. This is important and required information for the implementation of AILS into an analysis, a simulation, or a real-time system.
Adaptive control for eye-gaze input system
NASA Astrophysics Data System (ADS)
Zhao, Qijie; Tu, Dawei; Yin, Hairong
2004-01-01
The characteristics of the vision-based human-computer interaction system are analyzed, and its practical applications and current limiting factors are discussed. Information processing methods are then put forward. In order to make the communication flexible and spontaneous, algorithms for adaptive control of the user's head movement have been designed, and event-based methods and an object-oriented computer language are used to develop the system software. Experimental testing showed that, under the given conditions, these methods and algorithms can meet the needs of the HCI.
The paradigm compiler: Mapping a functional language for the connection machine
NASA Technical Reports Server (NTRS)
Dennis, Jack B.
1989-01-01
The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
Vietnamese Document Representation and Classification
NASA Astrophysics Data System (ADS)
Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter
Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.
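A rough approximation of the reported best configuration is sketched below with scikit-learn: word-level bigrams over whitespace-delimited Vietnamese syllables stand in for the syllable-pair representation, and an SVM with a polynomial kernel is the learner. The four labeled snippets are invented, and the study's feature selection with information gain and an external dictionary is omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Toy labeled Vietnamese snippets (invented); the real corpus has ~18,000 documents.
docs = ["bóng đá việt nam thắng trận", "thị trường chứng khoán tăng điểm",
        "đội tuyển ghi bàn phút cuối", "giá cổ phiếu ngân hàng giảm"]
labels = ["sports", "finance", "sports", "finance"]

# Vietnamese syllables are whitespace-delimited, so token bigrams over whitespace
# tokens approximate the syllable-pair representation described in the abstract.
model = make_pipeline(
    CountVectorizer(ngram_range=(2, 2), token_pattern=r"\S+"),
    SVC(kernel="poly", degree=2),
)
model.fit(docs, labels)
print(model.predict(["chứng khoán giảm điểm"]))
```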
NASA Technical Reports Server (NTRS)
Ryer, M. J.
1978-01-01
HAL/S is a computer programming language; it is a representation for algorithms which can be interpreted by either a person or a computer. HAL/S compilers transform blocks of HAL/S code into machine language which can then be directly executed by a computer. When the machine language is executed, the algorithm specified by the HAL/S code (source) is performed. This document describes how to read and write HAL/S source.
Stemming Malay Text and Its Application in Automatic Text Categorization
NASA Astrophysics Data System (ADS)
Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi
In the Malay language there are no conjugations or declensions, and affixes have important grammatical functions. The same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use the precise words in formal speech or written texts. In Malay, derivative words are used to make sentences clear, and derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of the educated Malay, so the composition of Malay words may be complicated. Although several types of stemming algorithms are available for text processing in English and some other languages, they cannot be used to overcome the difficulties of Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems, and it is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The set of rules is aimed at reducing the occurrence of under-stemming errors, while the dictionaries are intended to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software, using actual web pages collected from the World Wide Web as the text data, to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
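The hedged sketch below shows the general shape of such an affix-rule-plus-dictionary stemmer. The prefix and suffix lists and the tiny root-word dictionary are illustrative fragments only, and the morphophonemic rewriting rules and derivative-word dictionary of the actual stemmer are not reproduced.

```python
# Illustrative affix rules and a tiny dictionary (invented subset); a real Malay
# stemmer uses a much larger rule set plus root-word and derivative-word dictionaries.
PREFIXES = ["member", "memper", "meng", "peng", "men", "mem", "pen", "pem",
            "ber", "ter", "me", "pe", "di", "ke", "se"]
SUFFIXES = ["kan", "lah", "nya", "an", "i"]
ROOT_WORDS = {"ajar", "main", "baca", "guna"}

def stem_malay(word: str) -> str:
    if word in ROOT_WORDS:                      # avoid over-stemming known roots
        return word
    for pre in sorted(PREFIXES, key=len, reverse=True):
        for suf in sorted(SUFFIXES, key=len, reverse=True) + [""]:
            candidate = word
            if candidate.startswith(pre):
                candidate = candidate[len(pre):]
            if suf and candidate.endswith(suf):
                candidate = candidate[:-len(suf)]
            if candidate in ROOT_WORDS:         # dictionary check limits under-stemming
                return candidate
    return word                                 # unknown word: leave unchanged

print(stem_malay("pengajaran"))  # ajar
print(stem_malay("berguna"))     # guna
```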
NASA Technical Reports Server (NTRS)
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
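The sketch below illustrates the vectorization idea in plain NumPy: a bank of complex one-pole resonators is updated for all frequency channels at once, so the interpreted per-sample loop performs a single vectorized operation per step. The filter type and parameters are simplifications and are not the cochlear models shipped with Brian Hears.

```python
import numpy as np

def resonator_bank(x, fs, center_freqs, r=0.99):
    """Vectorized bank of complex one-pole resonators: the recurrence is computed
    for every frequency channel at once instead of looping over channels."""
    poles = r * np.exp(2j * np.pi * np.asarray(center_freqs) / fs)
    state = np.zeros(len(center_freqs), dtype=complex)
    out = np.empty((len(x), len(center_freqs)))
    for n, sample in enumerate(x):
        state = sample + poles * state      # one update for all channels
        out[n] = np.abs(state)              # channel envelope
    return out

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone
freqs = np.geomspace(100, 4000, 32)                 # 32 log-spaced channels
env = resonator_bank(tone, fs, freqs)
print(freqs[env[-1].argmax()])                      # strongest channel is near 1 kHz
```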
Internet (WWW) based system of ultrasonic image processing tools for remote image analysis.
Zeng, Hong; Fei, Ding-Yu; Fu, Cai-Ting; Kraft, Kenneth A
2003-07-01
Ultrasonic Doppler color imaging can provide anatomic information and simultaneously render flow information within blood vessels for diagnostic purpose. Many researchers are currently developing ultrasound image processing algorithms in order to provide physicians with accurate clinical parameters from the images. Because researchers use a variety of computer languages and work on different computer platforms to implement their algorithms, it is difficult for other researchers and physicians to access those programs. A system has been developed using World Wide Web (WWW) technologies and HTTP communication protocols to publish our ultrasonic Angle Independent Doppler Color Image (AIDCI) processing algorithm and several general measurement tools on the Internet, where authorized researchers and physicians can easily access the program using web browsers to carry out remote analysis of their local ultrasonic images or images provided from the database. In order to overcome potential incompatibility between programs and users' computer platforms, ActiveX technology was used in this project. The technique developed may also be used for other research fields.
Gross, Alexander; Murthy, Dhiraj
2014-10-01
This paper explores a variety of methods for applying the Latent Dirichlet Allocation (LDA) automated topic modeling algorithm to the modeling of the structure and behavior of virtual organizations found within modern social media and social networking environments. As the field of Big Data reveals, an increase in the scale of social data available presents new challenges which are not tackled by merely scaling up hardware and software. Rather, they necessitate new methods and, indeed, new areas of expertise. Natural language processing provides one such method. This paper applies LDA to the study of scientific virtual organizations whose members employ social technologies. Because of the vast data footprint in these virtual platforms, we found that natural language processing was needed to 'unlock' and render visible latent, previously unseen conversational connections across large textual corpora (spanning profiles, discussion threads, forums, and other social media incarnations). We introduce variants of LDA and ultimately make the argument that natural language processing is a critical interdisciplinary methodology to make better sense of social 'Big Data' and we were able to successfully model nested discussion topics from forums and blog posts using LDA. Importantly, we found that LDA can move us beyond the state-of-the-art in conventional Social Network Analysis techniques. Copyright © 2014 Elsevier Ltd. All rights reserved.
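A minimal example of the underlying step, fitting LDA to a handful of forum-style posts, is sketched below with scikit-learn. The posts are invented, and the nested and variant LDA models discussed in the paper are not reproduced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy forum-style posts (invented); the study applied LDA to much larger corpora
# spanning profiles, discussion threads, forums, and blog posts.
posts = [
    "sharing our telescope survey data pipeline with collaborators",
    "new grant funding for the virtual observatory project",
    "pipeline bug in the image calibration code, patch attached",
    "workshop announcement and travel funding for early career members",
]
counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}:", ", ".join(top))
```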
Inheritance on processes, exemplified on distributed termination detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomsen, K.S.
1987-02-01
A multiple inheritance mechanism on processes is designed and presented within the framework of a small object oriented language. Processes are described in classes, and the different action parts of a process inherited from different classes are executed in a coroutine-like style called alternation. The inheritance mechanism is a useful tool for factorizing the description of common aspects of processes. This is demonstrated within the domain of distributed programming by using the inheritance mechanism to factorize the description of distributed termination detection algorithms from the description of the distributed main computations for which termination is to be detected. A clear separation of concerns is obtained, and arbitrary combinations of terminations detection algorithms and main computations can be formed. The same termination detection classes can also be used for more general purposes within distributed programming, such as detecting termination of each phase in a multi-phase main computation.
SSL: A software specification language
NASA Technical Reports Server (NTRS)
Austin, S. L.; Buckles, B. P.; Ryan, J. P.
1976-01-01
SSL (Software Specification Language) is a new formalism for the definition of specifications for software systems. The language provides a linear format for the representation of the information normally displayed in a two-dimensional module inter-dependency diagram. In comparing SSL to FORTRAN or ALGOL, it is found to be largely complementary to the algorithmic (procedural) languages. SSL is capable of representing explicitly module interconnections and global data flow, information which is deeply imbedded in the algorithmic languages. On the other hand, SSL is not designed to depict the control flow within modules. The SSL level of software design explicitly depicts intermodule data flow as a functional specification.
DOE Office of Scientific and Technical Information (OSTI.GOV)
P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date, no online statistical processing suite for proteomics exists. The P-Mart software allows statistical programmers to use these algorithms through packages in the R programming language and also offers a web-based interface using the Azure cloud technology. The Azure cloud technology also allows release of the software via Docker containers.
Image based book cover recognition and retrieval
NASA Astrophysics Data System (ADS)
Sukhadan, Kalyani; Vijayarajan, V.; Krishnamoorthi, A.; Bessie Amali, D. Geraldine
2017-11-01
In this work we develop a graphical user interface (GUI) in MATLAB that lets users retrieve information about books in real time. A photo of the book cover is taken through the GUI; the MSER algorithm then automatically detects features in the input image, after which non-text features are filtered out based on the morphological differences between text and non-text regions. We implemented a text-character alignment algorithm that improves the accuracy of the original text detection. We also examine the built-in MATLAB OCR algorithm and a commonly used open-source OCR engine to obtain better detection results; a post-detection algorithm and natural language processing are applied to perform word correction and suppress false detections. Finally, the detection result is linked to the internet to perform online matching. The algorithm achieves an accuracy of more than 86%.
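A hedged sketch of the detection-plus-OCR step is shown below using OpenCV's MSER detector and the pytesseract wrapper. It assumes both libraries and a local Tesseract install are available and that an image file named book_cover.jpg exists; the filename, the aspect-ratio filter, and the single-crop OCR are illustrative simplifications of the pipeline described above.

```python
import cv2
import pytesseract

image = cv2.imread("book_cover.jpg")            # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(gray)

# Crude non-text filtering by bounding-box aspect ratio (placeholder for the
# morphological filtering described in the abstract).
text_boxes = [(x, y, w, h) for (x, y, w, h) in boxes if 0.1 < w / float(h) < 10]

# OCR on the region that encloses the candidate text boxes.
if text_boxes:
    x0 = min(x for x, _, _, _ in text_boxes)
    y0 = min(y for _, y, _, _ in text_boxes)
    x1 = max(x + w for x, _, w, _ in text_boxes)
    y1 = max(y + h for _, y, _, h in text_boxes)
    print(pytesseract.image_to_string(gray[y0:y1, x0:x1]))
```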
Image-algebraic design of multispectral target recognition algorithms
NASA Astrophysics Data System (ADS)
Schmalz, Mark S.; Ritter, Gerhard X.
1994-06-01
In this paper, we discuss methods for multispectral ATR (Automated Target Recognition) of small targets that are sensed under suboptimal conditions, such as haze, smoke, and low light levels. In particular, we discuss our ongoing development of algorithms and software that effect intelligent object recognition by selecting ATR filter parameters according to ambient conditions. Our algorithms are expressed in terms of IA (image algebra), a concise, rigorous notation that unifies linear and nonlinear mathematics in the image processing domain. IA has been implemented on a variety of parallel computers, with preprocessors available for the Ada and FORTRAN languages. An image algebra C++ class library has recently been made available. Thus, our algorithms are both feasible implementationally and portable to numerous machines. Analyses emphasize the aspects of image algebra that aid the design of multispectral vision algorithms, such as parameterized templates that facilitate the flexible specification of ATR filters.
Mueller, David S.
2016-06-21
The software program, QRev applies common and consistent computational algorithms combined with automated filtering and quality assessment of the data to improve the quality and efficiency of streamflow measurements and helps ensure that U.S. Geological Survey streamflow measurements are consistent, accurate, and independent of the manufacturer of the instrument used to make the measurement. Software from different manufacturers uses different algorithms for various aspects of the data processing and discharge computation. The algorithms used by QRev to filter data, interpolate data, and compute discharge are documented and compared to the algorithms used in the manufacturers’ software. QRev applies consistent algorithms and creates a data structure that is independent of the data source. QRev saves an extensible markup language (XML) file that can be imported into databases or electronic field notes software. This report is the technical manual for version 2.8 of QRev.
Robot Path Planning in Uncertain Environments: A Language Measure-theoretic Approach
2014-01-01
Paper DS-14-1028, to appear in the Special Issue on Stochastic Models, Control and Algorithms in Robotics, ASME Journal of Dynamic Systems, Measurement and Control. Robot Path Planning in Uncertain Environments: A Language Measure-theoretic Approach. Devesh K. Jha, Yue Li, Thomas A. Wettergren, Asok ... algorithm, called ν⋆, that was formulated in the framework of probabilistic finite state automata (PFSA) and language measure from a control-theoretic
Mr.CAS-A minimalistic (pure) Ruby CAS for fast prototyping and code generation
NASA Astrophysics Data System (ADS)
Ragni, Matteo
There are Computer Algebra Systems (CAS) on the market with complete solutions for the manipulation of analytical models. But exporting a model that implements specific algorithms on specific platforms, for target languages or for a particular numerical library, is often a rigid procedure that requires manual post-processing. This work presents a Ruby library that exposes core CAS capabilities, i.e. simplification, substitution, evaluation, etc. The library is aimed at programmers who need to rapidly prototype and generate numerical code for different target languages, while keeping the mathematical expressions separate from the code generation rules, where best practices for numerical conditioning are implemented. The library is written in pure Ruby and is compatible with most Ruby interpreters.
Zheng, Chengyi; Luo, Yi; Mercado, Cheryl; Sy, Lina; Jacobsen, Steven J; Ackerson, Brad; Lewin, Bruno; Tseng, Hung Fu
2018-06-19
Diagnosis codes are inadequate for accurately identifying herpes zoster ophthalmicus (HZO). There is a significant lack of population-based studies on HZO due to the high expense of manual review of medical records. The aims were to assess whether HZO can be identified from clinical notes using natural language processing (NLP) and to investigate the epidemiology of HZO among the HZ population based on the developed approach. A retrospective cohort analysis was performed on a total of 49,914 southern California residents aged over 18 years who had a new diagnosis of HZ. An NLP-based algorithm was developed and validated against a manually curated validation dataset (n=461). The algorithm was applied to over 1 million clinical notes associated with the study population. HZO versus non-HZO cases were compared by age, sex, race, and comorbidities. We measured the accuracy of the NLP algorithm, which achieved 95.6% sensitivity and 99.3% specificity. Compared to the diagnosis codes, NLP identified significantly more HZO cases among the HZ population (13.9% versus 1.7%). Compared to the non-HZO group, the HZO group was older, had more males, had more Whites, and had more outpatient visits. We developed and validated an automatic method to identify HZO cases with high accuracy. As one of the largest studies on HZO, our findings emphasize the importance of preventing HZ in the elderly population. This method can be a valuable tool to support population-based studies and clinical care of HZO in the era of big data. This article is protected by copyright. All rights reserved.
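The toy rule-based sketch below conveys the flavor of identifying HZO from free text with keyword patterns and simple negation handling. The keyword lists and negation rule are invented and far simpler than the validated algorithm reported above.

```python
import re

# Hypothetical keyword patterns; not the authors' validated algorithm.
OPHTHALMIC = r"(ophthalmicus|eye|ocular|cornea|keratitis|v1 distribution)"
NEGATION = r"\b(no|denies|without|negative for)\b[^.]*"

def is_hzo(note: str) -> bool:
    for sentence in re.split(r"(?<=[.!?])\s+", note.lower()):
        if re.search(NEGATION + r"(zoster|shingles)", sentence):
            continue                      # negated mention, ignore this sentence
        if re.search(r"(zoster|shingles)", sentence) and re.search(OPHTHALMIC, sentence):
            return True
    return False

print(is_hzo("Herpes zoster in V1 distribution with corneal involvement."))  # True
print(is_hzo("Denies eye pain; no zoster lesions near the eye."))            # False
```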
A programmable computational image sensor for high-speed vision
NASA Astrophysics Data System (ADS)
Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian
2013-08-01
In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array, and a RISC core. The pixel-parallel PE array is responsible for transferring, storing, and processing raw image data in a SIMD fashion with its own programming language. The RPs form a one-dimensional array of simplified RISC cores that can carry out complex arithmetic and logic operations. The PE array and RP array can complete a great amount of computation in few instruction cycles and therefore satisfy low- and middle-level high-speed image processing requirements. The RISC core controls overall system operation and executes some high-level image processing algorithms. We utilize a simplified AHB bus as the system bus to connect the major components. A programming language and corresponding tool chain for this computational image sensor have also been developed.
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic component technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
NASA Technical Reports Server (NTRS)
Lawson, Charles L.; Krogh, Fred; Van Snyder, W.; Oken, Carol A.; Mccreary, Faith A.; Lieske, Jay H.; Perrine, Jack; Coffin, Ralph S.; Wayne, Warren J.
1994-01-01
MATH77 is a high-quality library of ANSI FORTRAN 77 subprograms implementing contemporary algorithms for basic computational processes of science and engineering. Release 4.0 of MATH77 contains 454 user-callable and 136 lower-level subprograms. The MATH77 release 4.0 subroutine library is designed to be usable on any computer system supporting the full ANSI standard FORTRAN 77 language.
Halim, Zahid; Abbas, Ghulam
2015-01-01
Sign language provides hearing- and speech-impaired individuals with an interface to communicate with other members of society. Unfortunately, sign language is not understood by most people. A gadget based on image processing and pattern recognition can therefore provide a vital aid for detecting and translating sign language into a vocal language. This work presents a system for detecting and understanding sign language gestures with a custom-built software tool and then translating the gesture into a vocal language. For the purpose of recognizing a particular gesture, the system employs a Dynamic Time Warping (DTW) algorithm, and an off-the-shelf software tool is employed for vocal language generation. Microsoft(®) Kinect is the primary tool used to capture the video stream of a user. The proposed method is capable of successfully detecting gestures stored in the dictionary with an accuracy of 91%. The proposed system also has the ability to define and add custom-made gestures. Based on an experiment in which 10 individuals with impairments used the system to communicate with 5 people with no disability, 87% agreed that the system was useful.
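For readers unfamiliar with the matching step, the sketch below shows the classic DTW dynamic program on two feature sequences. It is a generic textbook formulation, not the authors' Kinect-specific implementation; the synthetic data stand in for skeleton-joint trajectories.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping cost between two feature sequences.

    a, b: arrays of shape (n, d) and (m, d); returns the accumulated
    alignment cost (lower = more similar)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A stored dictionary gesture would be compared against a live sequence and the
# lowest-cost template selected.
template = np.random.rand(40, 3)                       # e.g., 40 frames of joint coordinates
query = template[::2] + 0.01 * np.random.rand(20, 3)   # shorter, noisy version of the gesture
print(dtw_distance(query, template))
```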
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1989-01-01
A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
Accelerated speckle imaging with the ATST visible broadband imager
NASA Astrophysics Data System (ADS)
Wöger, Friedrich; Ferayorni, Andrew
2012-09-01
The Advanced Technology Solar Telescope (ATST), a 4 meter class telescope for observations of the solar atmosphere currently in its construction phase, will generate data at rates of the order of 10 TB/day with its state-of-the-art instrumentation. The high-priority ATST Visible Broadband Imager (VBI) instrument alone will create two data streams with a bandwidth of 960 MB/s each. Because of the related data handling issues, these data will be post-processed with speckle interferometry algorithms in near-real time at the telescope using the cost-effective Graphics Processing Unit (GPU) technology that is supported by the ATST Data Handling System. In this contribution, we lay out the VBI-specific approach to its image processing pipeline, put this into the context of the underlying ATST Data Handling System infrastructure, and finally describe the details of how the algorithms were redesigned to exploit data parallelism in the speckle image reconstruction. An algorithm redesign is often required to efficiently speed up an application using GPU technology; we have chosen NVIDIA's CUDA language as the basis for our implementation. We present preliminary results of the algorithm performance using our test facilities, and on these results we base a conservative estimate of the requirements for a full system that could achieve near-real-time performance at ATST.
Generalising Ward's Method for Use with Manhattan Distances.
Strauss, Trudie; von Maltitz, Michael Johan
2017-01-01
The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised for use with the l1 norm, or Manhattan, distance. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using both the Euclidean distance and the Manhattan distance. Results obtained from the different distance metrics are compared to show that Ward's algorithm's characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
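A minimal sketch of this pipeline, assuming plain text as input: relative bi-gram frequencies form a signature per language, a Manhattan distance matrix is computed, and Ward's linkage recurrence is then applied to that matrix. SciPy applies Ward's Lance-Williams update to whatever condensed distance matrix it is given (the documentation notes that interpreting the result for non-Euclidean input is the user's responsibility), so this only illustrates the idea of the paper's generalisation, not the authors' exact formulation; the text samples are toy stand-ins.

```python
import string
from collections import Counter
from itertools import product

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def bigram_signature(text, alphabet=string.ascii_lowercase):
    """Relative bi-gram frequencies over a fixed alphabet (toy preprocessing)."""
    text = "".join(c for c in text.lower() if c in alphabet)
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    total = sum(counts.values()) or 1
    return np.array([counts[a + b] / total for a, b in product(alphabet, repeat=2)])

samples = {                      # toy stand-ins for language corpora
    "en": "the quick brown fox jumps over the lazy dog " * 20,
    "de": "der schnelle braune fuchs springt ueber den faulen hund " * 20,
    "nl": "de snelle bruine vos springt over de luie hond " * 20,
}
X = np.vstack([bigram_signature(t) for t in samples.values()])

d_manhattan = pdist(X, metric="cityblock")   # l1 distance matrix (condensed form)
Z = linkage(d_manhattan, method="ward")      # Ward recurrence applied to l1 distances
print(fcluster(Z, t=2, criterion="maxclust"))
```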
NASA Astrophysics Data System (ADS)
Lukyanov, A. A.; Grigoriev, S. N.; Bobrovskij, I. N.; Melnikov, P. A.; Bobrovskij, N. M.
2017-05-01
As new technology becomes more complex and its reliability requirements increase, the laboriousness of control operations in industrial quality control systems increases significantly. Quality management control is important because it promotes the correct use of production conditions and ensures that the relevant requirements are met. Digital image processing allows production to reach a new technological level. Automated interpretation of information, the most complicated step, is the basis for decision-making in the management of production processes. In the case of surface analysis of tools used for processing with metalworking fluids (MWF), the task is more complicated still. The authors suggest a new algorithm for optical inspection of the wear of a cylindrical burnishing tool used in surface plastic deformation without MWF. The main advantage of the proposed algorithm is the possibility of automatic recognition of images of the burnishing tool, with subsequent detection of its boundaries, location of the working surface, and automatic identification of defects and the wear area. Software implementing the algorithm was developed by the authors in the Matlab programming environment, but it can be implemented in other programming languages.
A UMLS-based spell checker for natural language processing in vaccine safety.
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-02-12
The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
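The four correction steps map naturally onto a small amount of code. The toy sketch below uses Python's standard difflib against a tiny hand-made dictionary and is only meant to illustrate the error-detection / candidate-generation / disambiguation / correction flow; the actual tool draws its terms from the UMLS Specialist Lexicon and WordNet rather than this placeholder word list.

```python
import difflib

# Toy domain dictionary; the real tool uses the UMLS Specialist Lexicon and WordNet.
LEXICON = {"fever", "rash", "injection", "site", "swelling", "vaccine", "anaphylaxis"}

def correct_report(text):
    corrected = []
    for word in text.lower().split():
        if word in LEXICON:                            # (1) error detection
            corrected.append(word)
            continue
        candidates = difflib.get_close_matches(        # (2) word list generation
            word, LEXICON, n=3, cutoff=0.75)
        best = candidates[0] if candidates else word   # (3) crude disambiguation
        corrected.append(best)                         # (4) error correction
    return " ".join(corrected)

print(correct_report("feverr and sweling at injektion site"))
# -> "fever and swelling at injection site"
```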
A UMLS-based spell checker for natural language processing in vaccine safety
Tolentino, Herman D; Matters, Michael D; Walop, Wikke; Law, Barbara; Tong, Wesley; Liu, Fang; Fontelo, Paul; Kohl, Katrin; Payne, Daniel C
2007-01-01
Background The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. Methods We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. Results We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46%–48%), respectively. Conclusion We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest. PMID:17295907
Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Peissig, Peggy L; Denny, Joshua C; Kho, Abel N; Miller, Aaron; Pathak, Jyotishman
2012-01-01
The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM). We selected 9 phenotyping algorithms that had been previously developed as part of the eMERGE consortium and translated them into QDM format. Our study concluded that the QDM contains several core elements that make it a promising format for EHR-driven phenotyping algorithms for clinical research. However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria.
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1989-01-01
Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared-memory architectures provide a relatively low-level programming environment and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, is considered first, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.
Generating Language Activities in Real-Time for English Learners Using Language Muse
ERIC Educational Resources Information Center
Burstein, Jill; Madnani, Nitin; Sabatini, John; McCaffrey, Dan; Biggers, Kietha; Dreier, Kelsey
2017-01-01
K-12 education standards in the U.S. require all students to read complex texts across many subject areas. The "Language Muse™ Activity Palette" is a web-based language-instruction application that uses NLP algorithms and lexical resources to automatically generate language activities and support English language learners' content…
PORTABLE LISP; a list-processing interpreter. [CDC7600; PASCAL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taylor, W.P.
The program constitutes a complete, basic LISP (LISt-Processing language) interpreter. LISP expressions are evaluated one by one with both the input expression and the resulting evaluated expression printed. Expressions are evaluated until a FIN card is encountered. Between expression evaluations, a garbage-collection algorithm is invoked to recover list space used in the previous evaluation. CDC7600; PASCAL; SCOPE. The sample problem was executed in 7000 (octal) words of memory on a CDC7600.
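A very small Python sketch of the read-eval-print loop described here (evaluate each expression, print both it and the result, stop at a FIN card). The supported forms and names are made up for illustration, and the garbage-collection step is omitted since Python manages memory itself.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "CAR": lambda x: x[0]}

def parse(tokens):
    """Build a nested list from a flat token stream."""
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(parse(tokens))
        tokens.pop(0)
        return lst
    return float(tok) if tok.replace(".", "", 1).isdigit() else tok

def evaluate(expr):
    if not isinstance(expr, list):
        return expr
    head, *args = expr
    if head == "QUOTE":
        return args[0]
    return OPS[head](*[evaluate(a) for a in args])

for line in ["(+ 1 (* 2 3))", "(CAR (QUOTE (10 20 30)))", "FIN"]:
    if line == "FIN":            # evaluation stops at a FIN card
        break
    expr = parse(line.replace("(", " ( ").replace(")", " ) ").split())
    print(line, "=>", evaluate(expr))
```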
MIQSTURE: An Experimental Online Language for Army Tactical Intelligence Information Processing
1978-07-01
...algorithms. The most critical component of an active information processing model for Army tactical intelligence is the user interface, which must be based on... (1976) defined some preliminary notions of an active information model centered around a data base that can introspect about its contents and... "An Introspective Data Base for an Active Information Model," OSI Technical Note N76-017, 17 November 1976.
NASA Technical Reports Server (NTRS)
Trosset, Michael W.
1999-01-01
Comprehensive computational experiments to assess the performance of algorithms for numerical optimization require (among other things) a practical procedure for generating pseudorandom nonlinear objective functions. We propose a procedure that is based on the convenient fiction that objective functions are realizations of stochastic processes. This report details the calculations necessary to implement our procedure for the case of certain stationary Gaussian processes and presents a specific implementation in the statistical programming language S-PLUS.
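A minimal sketch of the "convenient fiction" described here: draw one realization of a stationary Gaussian process with a squared-exponential covariance on a grid and treat the tabulated sample as a pseudorandom test objective. The kernel choice and all parameters are illustrative assumptions, and the code is generic NumPy rather than the original S-PLUS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid on which the pseudorandom objective is realized.
x = np.linspace(0.0, 1.0, 200)

# Stationary squared-exponential covariance k(s, t) = exp(-(s - t)^2 / (2 l^2)).
length_scale = 0.1
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / length_scale) ** 2)
K += 1e-10 * np.eye(len(x))                # jitter for numerical stability

# One realization of the process = one pseudorandom objective function (tabulated).
f_values = rng.multivariate_normal(np.zeros(len(x)), K)

def objective(t):
    """Evaluate the realized objective by interpolating the tabulated sample."""
    return np.interp(t, x, f_values)

print(objective(0.37), x[np.argmin(f_values)])   # value at a point, and the grid minimizer
```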
Multiscale Simulations of Magnetic Island Coalescence
NASA Technical Reports Server (NTRS)
Dorelli, John C.
2010-01-01
We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partitioning, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.
Program Helps Simulate Neural Networks
NASA Technical Reports Server (NTRS)
Villarreal, James; Mcintire, Gary
1993-01-01
Neural Network Environment on Transputer System (NNETS) computer program provides users high degree of flexibility in creating and manipulating wide variety of neural-network topologies at processing speeds not found in conventional computing environments. Supports back-propagation and back-propagation-related algorithms. Back-propagation algorithm used is implementation of Rumelhart's generalized delta rule. NNETS developed on INMOS Transputer(R). Predefines back-propagation network, Jordan network, and reinforcement network to assist users in learning and defining own networks. Also enables users to configure other neural-network paradigms from NNETS basic architecture. Small portion of software written in OCCAM(R) language.
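For context, the generalized delta rule referred to here is the standard backpropagation weight update. A compact NumPy sketch on the XOR problem is shown below; this is a generic textbook version, not NNETS or its OCCAM/Transputer code, and the learning rate, layer sizes, and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)              # XOR targets

Xb = np.hstack([X, np.ones((4, 1))])                   # append a bias input
W1 = rng.normal(scale=1.0, size=(3, 4))                # (inputs+bias) -> hidden
W2 = rng.normal(scale=1.0, size=(5, 1))                # (hidden+bias) -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.5

for _ in range(20000):
    H = sigmoid(Xb @ W1)                               # forward pass
    Hb = np.hstack([H, np.ones((4, 1))])
    Y = sigmoid(Hb @ W2)
    delta_out = (T - Y) * Y * (1 - Y)                  # generalized delta rule, output layer
    delta_hid = (delta_out @ W2[:-1].T) * H * (1 - H)  # backpropagated hidden-layer deltas
    W2 += eta * Hb.T @ delta_out                       # weight updates
    W1 += eta * Xb.T @ delta_hid

print(np.round(Y, 2))                                  # approaches [[0], [1], [1], [0]]
```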
NASA Technical Reports Server (NTRS)
Trevino, Luis; Patterson, Jonathan; Teare, David; Johnson, Stephen
2015-01-01
The engineering development of the new Space Launch System (SLS) launch vehicle requires cross discipline teams with extensive knowledge of launch vehicle subsystems, information theory, and autonomous algorithms dealing with all operations from pre-launch through on orbit operations. The characteristics of these spacecraft systems must be matched with the autonomous algorithm monitoring and mitigation capabilities for accurate control and response to abnormal conditions throughout all vehicle mission flight phases, including precipitating safing actions and crew aborts. This presents a large and complex system engineering challenge, which is being addressed in part by focusing on the specific subsystems involved in the handling of off-nominal mission and fault tolerance with response management. Using traditional model based system and software engineering design principles from the Unified Modeling Language (UML) and Systems Modeling Language (SysML), the Mission and Fault Management (M&FM) algorithms for the vehicle are crafted and vetted in specialized Integrated Development Teams (IDTs) composed of multiple development disciplines such as Systems Engineering (SE), Flight Software (FSW), Safety and Mission Assurance (S&MA) and the major subsystems and vehicle elements such as Main Propulsion Systems (MPS), boosters, avionics, Guidance, Navigation, and Control (GNC), Thrust Vector Control (TVC), and liquid engines. These model based algorithms and their development lifecycle from inception through Flight Software certification are an important focus of this development effort to further insure reliable detection and response to off-nominal vehicle states during all phases of vehicle operation from pre-launch through end of flight. NASA formed a dedicated M&FM team for addressing fault management early in the development lifecycle for the SLS initiative. As part of the development of the M&FM capabilities, this team has developed a dedicated testbed that integrates specific M&FM algorithms, specialized nominal and off-nominal test cases, and vendor-supplied physics-based launch vehicle subsystem models. Additionally, the team has developed processes for implementing and validating these algorithms for concept validation and risk reduction for the SLS program. The flexibility of the Vehicle Management End-to-end Testbed (VMET) enables thorough testing of the M&FM algorithms by providing configurable suites of both nominal and off-nominal test cases to validate the developed algorithms utilizing actual subsystem models such as MPS. The intent of VMET is to validate the M&FM algorithms and substantiate them with performance baselines for each of the target vehicle subsystems in an independent platform exterior to the flight software development infrastructure and its related testing entities. In any software development process there is inherent risk in the interpretation and implementation of concepts into software through requirements and test cases into flight software compounded with potential human errors throughout the development lifecycle. Risk reduction is addressed by the M&FM analysis group working with other organizations such as S&MA, Structures and Environments, GNC, Orion, the Crew Office, Flight Operations, and Ground Operations by assessing performance of the M&FM algorithms in terms of their ability to reduce Loss of Mission and Loss of Crew probabilities. 
In addition, through state machine and diagnostic modeling, analysis efforts investigate a broader suite of failure effects and associated detection and responses that can be tested in VMET to ensure that failures can be detected, and confirm that responses do not create additional risks or cause undesired states through interactive dynamic effects with other algorithms and systems. VMET further contributes to risk reduction by prototyping and exercising the M&FM algorithms early in their implementation and without any inherent hindrances such as meeting FSW processor scheduling constraints due to their target platform - ARINC 653 partitioned OS, resource limitations, and other factors related to integration with other subsystems not directly involved with M&FM such as telemetry packing and processing. The baseline plan for use of VMET encompasses testing the original M&FM algorithms coded in the same C++ language and state machine architectural concepts as that used by Flight Software. This enables the development of performance standards and test cases to characterize the M&FM algorithms and sets a benchmark from which to measure the effectiveness of M&FM algorithms performance in the FSW development and test processes.
An Empirical Generative Framework for Computational Modeling of Language Acquisition
ERIC Educational Resources Information Center
Waterfall, Heidi R.; Sandbank, Ben; Onnis, Luca; Edelman, Shimon
2010-01-01
This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of…
Algorithms used in the Airborne Lidar Processing System (ALPS)
Nagle, David B.; Wright, C. Wayne
2016-05-23
The Airborne Lidar Processing System (ALPS) analyzes Experimental Advanced Airborne Research Lidar (EAARL) data—digitized laser-return waveforms, position, and attitude data—to derive point clouds of target surfaces. A full-waveform airborne lidar system, the EAARL seamlessly and simultaneously collects mixed environment data, including submerged, sub-aerial bare earth, and vegetation-covered topographies. ALPS uses three waveform target-detection algorithms to determine target positions within a given waveform: centroid analysis, leading edge detection, and bottom detection using water-column backscatter modeling. The centroid analysis algorithm detects opaque hard surfaces. The leading edge algorithm detects topography beneath vegetation and shallow, submerged topography. The bottom detection algorithm uses water-column backscatter modeling for deeper submerged topography in turbid water. The report describes slant range calculations and explains how ALPS uses laser range and orientation measurements to project measurement points into the Universal Transverse Mercator coordinate system. Parameters used for coordinate transformations in ALPS are described, as are Interactive Data Language-based methods for gridding EAARL point cloud data to derive digital elevation models. Noise reduction in point clouds through use of a random consensus filter is explained, and detailed pseudocode, mathematical equations, and Yorick source code accompany the report.
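As an illustration of the simplest of the three detectors, the sketch below applies centroid analysis to a synthetic return waveform to estimate the return time and hence range of an opaque surface. It is a generic illustration with an assumed noise threshold and sample interval, and it omits ALPS's calibration, attitude, and coordinate-projection steps.

```python
import numpy as np

def centroid_range(waveform, sample_interval_ns, threshold_frac=0.2):
    """Estimate target range from a digitized return waveform via centroid analysis.

    Samples below a fraction of the peak are treated as noise; the centroid of
    the remaining energy gives the return time, converted to one-way range."""
    w = np.asarray(waveform, float)
    w = np.clip(w - threshold_frac * w.max(), 0.0, None)
    t_samples = np.arange(len(w))
    t_return = (t_samples * w).sum() / w.sum()            # centroid, in samples
    c = 0.299792458                                       # speed of light, m/ns
    return 0.5 * c * t_return * sample_interval_ns        # one-way range in meters

# Synthetic waveform: Gaussian return pulse centered at sample 120 plus noise.
rng = np.random.default_rng(3)
t = np.arange(256)
waveform = np.exp(-0.5 * ((t - 120) / 4.0) ** 2) + 0.02 * rng.random(256)
print(centroid_range(waveform, sample_interval_ns=1.0))   # ~18 m for a 1 ns sample interval
```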
Strategic Control Algorithm Development : Volume 3. Strategic Algorithm Report.
DOT National Transportation Integrated Search
1974-08-01
The strategic algorithm report presents a detailed description of the functional basic strategic control arrival algorithm. This description is independent of a particular computer or language. Contained in this discussion are the geometrical and env...
Langs, Georg; Sweet, Andrew; Lashkari, Danial; Tie, Yanmei; Rigolo, Laura; Golby, Alexandra J; Golland, Polina
2014-12-01
In this paper we construct an atlas that summarizes functional connectivity characteristics of a cognitive process from a population of individuals. The atlas encodes functional connectivity structure in a low-dimensional embedding space that is derived from a diffusion process on a graph that represents correlations of fMRI time courses. The functional atlas is decoupled from the anatomical space, and thus can represent functional networks with variable spatial distribution in a population. In practice the atlas is represented by a common prior distribution for the embedded fMRI signals of all subjects. We derive an algorithm for fitting this generative model to the observed data in a population. Our results in a language fMRI study demonstrate that the method identifies coherent and functionally equivalent regions across subjects. The method also successfully maps functional networks from a healthy population used as a training set to individuals whose language networks are affected by tumors. Copyright © 2014. Published by Elsevier Inc.
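For orientation, below is a generic sketch of the kind of diffusion embedding the abstract refers to, starting from an fMRI time-course correlation matrix. This is a standard diffusion-map construction for illustration only, with toy data and assumed parameters; it is not the authors' atlas model or generative prior.

```python
import numpy as np

def diffusion_embedding(corr, n_components=3, t=1):
    """Embed nodes of a correlation graph via a diffusion process (generic sketch)."""
    W = np.clip(corr, 0.0, None)                 # nonnegative affinities from correlations
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M = D_inv_sqrt @ W @ D_inv_sqrt              # symmetric normalized diffusion operator
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Drop the trivial leading component; scale by eigenvalues^t (diffusion time t).
    psi = D_inv_sqrt @ vecs[:, 1:n_components + 1]
    return psi * (vals[1:n_components + 1] ** t)

# Toy data: correlation matrix of random fMRI-like time courses for 50 regions.
rng = np.random.default_rng(0)
ts = rng.standard_normal((50, 200))
corr = np.corrcoef(ts)
embedding = diffusion_embedding(corr)
print(embedding.shape)                           # (50, 3) low-dimensional coordinates
```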
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
scikit-image: image processing in Python.
van der Walt, Stéfan; Schönberger, Johannes L; Nunez-Iglesias, Juan; Boulogne, François; Warner, Joshua D; Yager, Neil; Gouillart, Emmanuelle; Yu, Tony
2014-01-01
scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal Modified BSD open source license, provides a well-documented API in the Python programming language, and is developed by an active, international team of collaborators. In this paper we highlight the advantages of open source to achieve the goals of the scikit-image library, and we showcase several real-world image processing applications that use scikit-image. More information can be found on the project homepage, http://scikit-image.org.
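A brief usage example with functions that are part of the scikit-image public API (edge filtering and Otsu thresholding on a bundled test image); it simply illustrates the library described above.

```python
from skimage import data, filters

image = data.camera()                        # bundled grayscale test image
edges = filters.sobel(image)                 # Sobel edge magnitude
threshold = filters.threshold_otsu(image)    # global Otsu threshold
binary = image > threshold

print(edges.shape, float(threshold), binary.mean())
```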
A novel speech-processing strategy incorporating tonal information for cochlear implants.
Lan, N; Nie, K B; Gao, S K; Zeng, F G
2004-05-01
Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only the envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance of cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to the stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variation of the four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvements in the perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
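To make the F0-extraction idea concrete, here is a generic autocorrelation-based pitch estimator in Python; the authors' actual extraction algorithm and neural-network tone verification are not reproduced here, and the frame length and search band are illustrative assumptions.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate for one voiced speech frame (illustrative only)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # nonnegative lags
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])                  # strongest periodicity
    return fs / lag

# Synthetic voiced frame: a 220 Hz harmonic tone sampled at 16 kHz.
fs = 16000
t = np.arange(0, 0.04, 1 / fs)
frame = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
print(estimate_f0(frame, fs))      # close to 220 Hz
```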
Improved pulse laser ranging algorithm based on high speed sampling
NASA Astrophysics Data System (ADS)
Gao, Xuan-yi; Qian, Rui-hai; Zhang, Yan-mei; Li, Huan; Guo, Hai-chao; He, Shi-jie; Guo, Xiao-kang
2016-10-01
Narrow pulse laser ranging achieves long-range target detection using laser pulses with low-divergence beams. Pulse laser ranging is widely used in the military, industrial, civil, engineering, and transportation fields. In this paper, an improved narrow pulse laser ranging algorithm based on high-speed sampling is studied. First, theoretical simulation models, including the laser emission and pulse laser ranging models, were built and analyzed, and an improved pulse ranging algorithm was developed. This new algorithm combines the matched filter algorithm and the constant fraction discrimination (CFD) algorithm. After the algorithm simulation, a laser ranging hardware system was set up to implement the improved algorithm. The laser ranging hardware system includes a laser diode, a laser detector, and a high-sample-rate data logging circuit. Subsequently, using the Verilog HDL language, the improved algorithm, a fusion of the matched filter algorithm and the CFD algorithm, was implemented in an FPGA chip. Finally, a laser ranging experiment was carried out with the hardware system to compare the ranging performance of the improved algorithm against the matched filter algorithm and the CFD algorithm alone. The test results demonstrate that the laser ranging hardware system achieves high-speed processing and high-speed sampling data transmission, and that the improved algorithm achieves 0.3 m ranging precision. The improved algorithm meets the expected performance, consistent with the theoretical simulation.
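The combination described (matched filtering followed by constant fraction discrimination) can be sketched in a few lines of NumPy. The parameter values and pulse shape below are illustrative assumptions, and the FPGA/Verilog implementation details are of course not captured.

```python
import numpy as np

def matched_filter_cfd(received, template, fraction=0.4, delay=3):
    """Locate the return pulse: matched filter, then constant-fraction zero crossing."""
    mf = np.correlate(received, template, mode="same")   # matched filter output
    cfd = fraction * mf - np.roll(mf, delay)             # attenuated minus delayed copy
    peak = np.argmax(mf)
    # The zero crossing of the CFD signal near the matched-filter peak gives a
    # timing estimate that is less sensitive to pulse amplitude.
    region = cfd[peak - delay:peak + delay + 1]
    crossing = np.where(np.diff(np.sign(region)) != 0)[0]
    return peak - delay + (crossing[0] if len(crossing) else delay)

rng = np.random.default_rng(2)
template = np.exp(-0.5 * ((np.arange(21) - 10) / 2.0) ** 2)   # emitted pulse shape
received = 0.05 * rng.standard_normal(500)
received[240:261] += 0.8 * template                           # echo near sample 250
print(matched_filter_cfd(received, template))                 # timing estimate near 250
```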
Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C
2017-12-01
The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. Our study shows that a supervised learning-based NLP approach is useful to develop medical subdomain classifiers. The deep learning algorithm with distributed word representation yields better performance yet shallow learning algorithms with the word and concept representation achieves comparable performance with better clinical interpretability. Portable classifiers may also be used across datasets from different institutions.
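A simplified counterpart to the best-interpretable configuration reported above (tf-idf bag-of-words features with a linear support vector machine), using scikit-learn; the UMLS concept features, the cTAKES pipeline, and the actual iDASH/MGH datasets are omitted, and the note texts and labels below are placeholders.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Placeholder notes and subdomain labels; real inputs would be de-identified
# clinical note text and their medical subdomains.
notes = [
    "patient reports chest pain, ecg shows st depression, started on beta blocker",
    "follow up for atrial fibrillation, rate controlled on metoprolol",
    "new onset seizure, eeg ordered, started levetiracetam",
    "migraine with aura, neurological exam nonfocal",
]
labels = ["cardiology", "cardiology", "neurology", "neurology"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(notes, labels)
print(clf.predict(["palpitations and irregular heart rhythm noted on exam"]))
```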
Embedded algorithms within an FPGA-based system to process nonlinear time series data
NASA Astrophysics Data System (ADS)
Jones, Jonathan D.; Pei, Jin-Song; Tull, Monte P.
2008-03-01
This paper presents some preliminary results of an ongoing project. A pattern classification algorithm is being developed and embedded into a Field-Programmable Gate Array (FPGA) and microprocessor-based data processing core in this project. The goal is to enable and optimize the functionality of onboard data processing of nonlinear, nonstationary data for smart wireless sensing in structural health monitoring. Compared with traditional microprocessor-based systems, fast growing FPGA technology offers a more powerful, efficient, and flexible hardware platform including on-site (field-programmable) reconfiguration capability of hardware. An existing nonlinear identification algorithm is used as the baseline in this study. The implementation within a hardware-based system is presented in this paper, detailing the design requirements, validation, tradeoffs, optimization, and challenges in embedding this algorithm. An off-the-shelf high-level abstraction tool along with the Matlab/Simulink environment is utilized to program the FPGA, rather than coding the hardware description language (HDL) manually. The implementation is validated by comparing the simulation results with those from Matlab. In particular, the Hilbert Transform is embedded into the FPGA hardware and applied to the baseline algorithm as the centerpiece in processing nonlinear time histories and extracting instantaneous features of nonstationary dynamic data. The selection of proper numerical methods for the hardware execution of the selected identification algorithm and consideration of the fixed-point representation are elaborated. Other challenges include the issues of the timing in the hardware execution cycle of the design, resource consumption, approximation accuracy, and user flexibility of input data types limited by the simplicity of this preliminary design. Future work includes making an FPGA and microprocessor operate together to embed a further developed algorithm that yields better computational and power efficiency.
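For reference, the instantaneous features mentioned (amplitude and frequency derived from the Hilbert transform) can be computed in software as follows with SciPy; the fixed-point FPGA implementation described in the paper is a separate concern, and the test signal is synthetic.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                                   # sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)
# Nonstationary test signal: a chirp-like tone with a slowly varying amplitude.
x = (1.0 + 0.3 * np.sin(2 * np.pi * 1.5 * t)) * np.sin(2 * np.pi * (20 * t + 10 * t**2))

analytic = hilbert(x)                         # analytic signal via the Hilbert transform
inst_amplitude = np.abs(analytic)             # instantaneous envelope
inst_phase = np.unwrap(np.angle(analytic))
inst_frequency = np.diff(inst_phase) / (2 * np.pi) * fs   # instantaneous frequency, Hz

print(inst_amplitude[:3], inst_frequency[500])            # frequency near 20 + 20*t Hz
```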
Nassi-Schneiderman Diagram in HTML Based on AML
ERIC Educational Resources Information Center
Menyhárt, László
2013-01-01
In an earlier work I defined an extension of XML called Algorithm Markup Language (AML) for easy and understandable coding in an IDE which supports XML editing (e.g. NetBeans). The AML extension contains annotations and native language (English or Hungarian) tag names used when coding our algorithm. This paper presents a drawing tool with which…
Natural Language Processing for Asthma Ascertainment in Different Practice Settings.
Wi, Chung-Il; Sohn, Sunghwan; Ali, Mir; Krusemark, Elizabeth; Ryu, Euijung; Liu, Hongfang; Juhn, Young J
We developed and validated NLP-PAC, a natural language processing (NLP) algorithm based on predetermined asthma criteria (PAC) for asthma ascertainment using electronic health records at Mayo Clinic. The goal was to adapt NLP-PAC to a different health care setting, Sanford Children's Hospital, by assessing its external validity. The study was designed as a retrospective cohort study that used a random sample of the 2011-2012 Sanford Birth cohort (n = 595). Manual chart review was performed on the cohort for asthma ascertainment on the basis of the PAC. We then used half of the cohort as a training cohort (n = 298) and the other half as a blind test cohort to evaluate the adapted NLP-PAC algorithm. Association of known asthma-related risk factors with the Sanford-NLP algorithm-driven asthma ascertainment was tested. Among the eligible test cohort (n = 297), 160 (53%) were males, 268 (90%) were white, and the median age was 2.3 years (range, 1.5-3.1 years). NLP-PAC, after adaptation, and the human abstractor identified 74 (25%) and 72 (24%) subjects, respectively, with 66 subjects identified by both approaches. Sensitivity, specificity, positive predictive value, and negative predictive value for the NLP algorithm in predicting asthma status were 92%, 96%, 89%, and 97%, respectively. The known risk factors for asthma identified by NLP (eg, smoking history) were similar to the ones identified by manual chart review. Successful implementation of NLP-PAC for asthma ascertainment in 2 different practice settings demonstrates the feasibility of automated asthma ascertainment leveraging electronic health record data, with the potential to enable large-scale, multisite asthma studies to improve asthma care and research. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
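The channel-vectorization idea can be illustrated outside of Brian: below, a bank of second-order resonators with different center frequencies is advanced sample by sample, but the state update at each time step is a NumPy array operation across all channels at once. The filter design uses SciPy's generic peak filter, not one of the cochlear models shipped with Brian Hears, and the channel count, Q, and input are illustrative.

```python
import numpy as np
from scipy.signal import iirpeak

fs = 16000.0
cf = np.geomspace(100.0, 4000.0, 32)                 # 32 center frequencies
coeffs = [iirpeak(f / (fs / 2), Q=8.0) for f in cf]  # per-channel biquad (b, a)
b = np.array([c[0] for c in coeffs])                 # shape (channels, 3)
a = np.array([c[1] for c in coeffs])

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)                        # mono input signal

# Direct-form II transposed state, one pair of state variables per channel.
z1 = np.zeros(len(cf))
z2 = np.zeros(len(cf))
out = np.empty((len(x), len(cf)))
for n, xn in enumerate(x):                           # loop over time...
    y = b[:, 0] * xn + z1                            # ...but vectorized over channels
    z1 = b[:, 1] * xn - a[:, 1] * y + z2
    z2 = b[:, 2] * xn - a[:, 2] * y
    out[n] = y

print(out.shape)                                     # (2000, 32) filtered channels
```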
NASA Astrophysics Data System (ADS)
HUSEJKO, Michal; EVANS, John; RASTEIRO DA SILVA, Jose Carlos
2015-12-01
High-Level Synthesis (HLS) for Field-Programmable Gate Array (FPGA) programming is becoming a practical alternative to the well-established VHDL and Verilog languages. This paper describes a case study in the use of HLS tools to design FPGA-based data acquisition systems (DAQ). We will present the implementation of the CERN CMS detector ECAL Data Concentrator Card (DCC) functionality in HLS and lessons learned from using the HLS design flow. The DCC functionality and a definition of the initial system-level performance requirements (latency, bandwidth, and throughput) will be presented. We will describe how its packet-processing, control-centric algorithm was implemented with the VHDL and Verilog languages. We will then show how the HLS flow can speed up design-space exploration by providing loose coupling between function interface design and function algorithm implementation. We conclude with results of real-life hardware tests performed with the HLS flow-generated design with a DCC Tester system.
Proceedings of the second SISAL users' conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFTs in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.
Fast Inference with Min-Sum Matrix Product.
Felzenszwalb, Pedro F; McAuley, Julian J
2011-12-01
The MAP inference problem in many graphical models can be solved efficiently using a fast algorithm for computing min-sum products of n × n matrices. The class of models in question includes cyclic and skip-chain models that arise in many applications. Although the worst-case complexity of the min-sum product operation is not known to be much better than O(n^3), an O(n^2.5) expected time algorithm was recently given, subject to some constraints on the input matrices. In this paper, we give an algorithm that runs in O(n^2 log n) expected time, assuming that the entries in the input matrices are independent samples from a uniform distribution. We also show that two variants of our algorithm are quite fast for inputs that arise in several applications. This leads to significant performance gains over previous methods in applications within computer vision and natural language processing.
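For readers unfamiliar with the operation itself, a direct O(n^3) min-sum (tropical) matrix product is shown below. The paper's contribution is the expected O(n^2 log n) algorithm under the stated distributional assumptions, which this naive sketch does not implement.

```python
import numpy as np

def min_sum_product(A, B):
    """Naive O(n^3) min-sum product: C[i, j] = min_k A[i, k] + B[k, j]."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        # Broadcasting computes the minimum over k for a whole row at once.
        C[i] = np.min(A[i][:, None] + B, axis=0)
    return C

rng = np.random.default_rng(0)
A, B = rng.random((64, 64)), rng.random((64, 64))
C = min_sum_product(A, B)
print(C.shape, C[0, 0])
```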
D'Onofrio, David J; Abel, David L; Johnson, Donald E
2012-03-14
The fields of molecular biology and computer science have cooperated over recent years to create a synergy between the cybernetic and biosemiotic relationship found in cellular genomics to that of information and language found in computational systems. Biological information frequently manifests its "meaning" through instruction or actual production of formal bio-function. Such information is called prescriptive information (PI). PI programs organize and execute a prescribed set of choices. Closer examination of this term in cellular systems has led to a dichotomy in its definition suggesting both prescribed data and prescribed algorithms are constituents of PI. This paper looks at this dichotomy as expressed in both the genetic code and in the central dogma of protein synthesis. An example of a genetic algorithm is modeled after the ribosome, and an examination of the protein synthesis process is used to differentiate PI data from PI algorithms.
Inverting the parameters of an earthquake-ruptured fault with a genetic algorithm
NASA Astrophysics Data System (ADS)
Yu, Ting-To; Fernàndez, Josè; Rundle, John B.
1998-03-01
Natural selection is the spirit of the genetic algorithm (GA): by keeping the good genes in the current generation, it produces better offspring during evolution. The crossover function ensures the inheritance of good genes from parent to offspring, while the process of mutation creates special genes whose character does not exist in the parent generation. A program based on genetic algorithms and written in the C language is constructed to invert the parameters of an earthquake-ruptured fault. The verification and application of this code are shown to demonstrate its capabilities. It is determined that this code is able to find the global extremum and can be used to solve more practical problems with constraints gathered from other sources. It is shown that the GA is superior to other inversion schemes in many aspects. This easy-to-handle and yet powerful algorithm should have many suitable applications in the field of geosciences.
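A bare-bones real-coded genetic algorithm with the selection, crossover, and mutation steps described above is sketched in Python below. The objective is a stand-in misfit function, not the actual fault-model misfit used for the inversion, and all hyperparameters (population size, mutation rate, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_params = np.array([2.0, -1.0, 0.5])             # hypothetical fault parameters

def misfit(p):
    """Stand-in objective: distance to a known optimum (lower is better)."""
    return np.sum((p - true_params) ** 2)

pop_size, n_gen, n_dim = 60, 200, 3
pop = rng.uniform(-5, 5, size=(pop_size, n_dim))

for _ in range(n_gen):
    fitness = np.array([misfit(p) for p in pop])
    order = np.argsort(fitness)
    parents = pop[order[: pop_size // 2]]             # selection: keep the best half
    # Crossover: blend random pairs of parents so offspring inherit good genes.
    idx = rng.integers(0, len(parents), size=(pop_size - len(parents), 2))
    w = rng.random((len(idx), 1))
    children = w * parents[idx[:, 0]] + (1 - w) * parents[idx[:, 1]]
    # Mutation: occasionally introduce genes absent from the parent generation.
    mask = rng.random(children.shape) < 0.1
    children = children + mask * rng.normal(scale=0.3, size=children.shape)
    pop = np.vstack([parents, children])

best = pop[np.argmin([misfit(p) for p in pop])]
print(np.round(best, 3))                              # approaches [2.0, -1.0, 0.5]
```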
An approach in building a chemical compound search engine in oracle database.
Wang, H; Volarath, P; Harrison, R
2005-01-01
Searching for or identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and the database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for a chemical compound search engine in an Oracle database is described. The framework is devoted to eliminating data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
NASA Technical Reports Server (NTRS)
Trevino, Luis; Johnson, Stephen B.; Patterson, Jonathan; Teare, David
2015-01-01
The development of the Space Launch System (SLS) launch vehicle requires cross discipline teams with extensive knowledge of launch vehicle subsystems, information theory, and autonomous algorithms dealing with all operations from pre-launch through on orbit operations. The characteristics of these systems must be matched with the autonomous algorithm monitoring and mitigation capabilities for accurate control and response to abnormal conditions throughout all vehicle mission flight phases, including precipitating safing actions and crew aborts. This presents a large complex systems engineering challenge being addressed in part by focusing on the specific subsystems handling of off-nominal mission and fault tolerance. Using traditional model based system and software engineering design principles from the Unified Modeling Language (UML), the Mission and Fault Management (M&FM) algorithms are crafted and vetted in specialized Integrated Development Teams composed of multiple development disciplines. NASA also has formed an M&FM team for addressing fault management early in the development lifecycle. This team has developed a dedicated Vehicle Management End-to-End Testbed (VMET) that integrates specific M&FM algorithms, specialized nominal and off-nominal test cases, and vendor-supplied physics-based launch vehicle subsystem models. The flexibility of VMET enables thorough testing of the M&FM algorithms by providing configurable suites of both nominal and off-nominal test cases to validate the algorithms utilizing actual subsystem models. The intent is to validate the algorithms and substantiate them with performance baselines for each of the vehicle subsystems in an independent platform exterior to flight software test processes. In any software development process there is inherent risk in the interpretation and implementation of concepts into software through requirements and test processes. Risk reduction is addressed by working with other organizations such as S&MA, Structures and Environments, GNC, Orion, the Crew Office, Flight Operations, and Ground Operations by assessing performance of the M&FM algorithms in terms of their ability to reduce Loss of Mission and Loss of Crew probabilities. In addition, through state machine and diagnostic modeling, analysis efforts investigate a broader suite of failure effects and detection and responses that can be tested in VMET and confirm that responses do not create additional risks or cause undesired states through interactive dynamic effects with other algorithms and systems. VMET further contributes to risk reduction by prototyping and exercising the M&FM algorithms early in their implementation and without any inherent hindrances such as meeting FSW processor scheduling constraints due to their target platform - ARINC 653 partitioned OS, resource limitations, and other factors related to integration with other subsystems not directly involved with M&FM. The plan for VMET encompasses testing the original M&FM algorithms coded in the same C++ language and state machine architectural concepts as that used by Flight Software. This enables the development of performance standards and test cases to characterize the M&FM algorithms and sets a benchmark from which to measure the effectiveness of M&FM algorithms performance in the FSW development and test processes. This paper is outlined in a systematic fashion analogous to a lifecycle process flow for engineering development of algorithms into software and testing. 
Section I describes the NASA SLS M&FM context, presenting the current infrastructure, leading principles, methods, and participants. Section II defines the testing philosophy of the M&FM algorithms as related to VMET followed by section III, which presents the modeling methods of the algorithms to be tested and validated in VMET. Its details are then further presented in section IV followed by Section V presenting integration, test status, and state analysis. Finally, section VI addresses the summary and forward directions followed by the appendices presenting relevant information on terminology and documentation.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-03-28
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System, together with Programmable Logic, for high-performance digital signal processing through parallelism and pipelining techniques. The algorithm has been coded in the C language with pragma directives to optimize the architecture of the system. We have used the Software-Defined System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-01-01
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System, together with Programmable Logic, for high-performance digital signal processing through parallelism and pipelining techniques. The algorithm has been coded in the C language with pragma directives to optimize the architecture of the system. We have used the Software-Defined System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation. PMID:28350358
NASA Technical Reports Server (NTRS)
Samareh, Jamshid A.; Sensmeier, Mark D.; Stewart, Bret A.
2006-01-01
Algorithms for rapid generation of moderate-fidelity structural finite element models of air vehicle structures to allow more accurate weight estimation earlier in the vehicle design process have been developed. Application of these algorithms should help to rapidly assess many structural layouts before the start of the preliminary design phase and eliminate weight penalties imposed when actual structure weights exceed those estimated during conceptual design. By defining the structural topology in a fully parametric manner, the structure can be mapped to arbitrary vehicle configurations being considered during conceptual design optimization. Recent enhancements to this approach include the porting of the algorithms to the platform-independent programming language Python and modifications to specifically consider morphing aircraft-type configurations. Two sample cases which illustrate these recent developments are presented.
NASA Astrophysics Data System (ADS)
Wright, Adam A.; Momin, Orko; Shin, Young Ho; Shakya, Rahul; Nepal, Kumud; Ahlgren, David J.
2010-01-01
This paper presents the application of a distributed systems architecture to an autonomous ground vehicle, Q, that participates in both the autonomous and navigation challenges of the Intelligent Ground Vehicle Competition. In the autonomous challenge the vehicle is required to follow a course, while avoiding obstacles and staying within the course boundaries, which are marked by white lines. For the navigation challenge, the vehicle is required to reach a set of target destinations, known as waypoints, with given GPS coordinates and to avoid obstacles that it encounters in the process. Previously the vehicle utilized a single laptop to execute all processing activities, including image processing, sensor interfacing and data processing, path planning and navigation algorithms, and motor control. National Instruments' (NI) LabVIEW served as the programming language for software implementation. As an upgrade to last year's design, an NI compact Reconfigurable Input/Output system (cRIO) was incorporated into the system architecture. The cRIO is NI's solution for rapid prototyping that is equipped with a real-time processor, an FPGA, and modular input/output. Under the current system, the real-time processor handles the path planning and navigation algorithms, while the FPGA gathers and processes sensor data. This setup leaves the laptop free to focus on running the image processing algorithm. Image processing, as previously presented by Nepal et al., is a multi-step line extraction algorithm and constitutes the largest processor load. This distributed approach speeds up the image processing pipeline, which was previously Q's bottleneck. Additionally, the path planning and navigation algorithms are executed more reliably on the real-time processor due to its deterministic nature of operation. The implementation of this architecture required exploration of various inter-system communication techniques. Data transfer between the laptop and the real-time processor using UDP packets was established as the most reliable protocol after testing various options. Improvement can be made to the system by migrating more algorithms to the hardware-based FPGA to further speed up the operations of the vehicle.
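The inter-system link described above can be illustrated with a minimal UDP exchange in Python; the port, addresses, and message format here are hypothetical, since the team's actual LabVIEW/cRIO code is not shown in the abstract.

```python
import socket

PORT = 5005  # hypothetical port; the real system's values are not given in the abstract

# Receiver: plays the role of the real-time target consuming vision results.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", PORT))
rx.settimeout(0.5)

# Sender: plays the role of the laptop publishing lane-line data.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"lane,1.25,-0.40", ("127.0.0.1", PORT))

try:
    data, addr = rx.recvfrom(1024)      # UDP is connectionless and may drop packets
    print("received", data, "from", addr)
except socket.timeout:
    pass                                # a dropped packet is simply superseded by the next one
```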
X-Graphs: Language and Algorithms for Heterogeneous Graph Streams
2017-09-01
[Report front-matter fragments; recoverable section headings: Introduction; Methods, Assumptions, and Procedures; Software Abstractions for Graph Analytic Applications; High Performance Platforms for Graph Processing. The recoverable content notes that data is stored in a distributed file system and that the work includes implementations of novel methods for network analysis: several methods for detection of overlapping communities, personalized PageRank, and node embeddings.]
ERIC Educational Resources Information Center
Yamada, Masanori; Kitamura, Satoshi; Matsukawa, Hideya; Misono, Tadashi; Kitani, Noriko; Yamauchi, Yuhei
2014-01-01
In recent years, collaborative filtering, a recommendation algorithm that incorporates user data such as interests, has received worldwide attention as an advanced learning support system. However, recommendations that merely match a user's interests are not by themselves ideal for an effective learning environment. This study aims to develop and…
The MITLL NIST LRE 2015 Language Recognition System
2016-05-06
[Excerpt] The MITLL NIST LRE 2015 Language Recognition System, by Pedro Torres-Carrasquillo, Najim Dehak, Elizabeth Godoy, Douglas Reynolds, Fred Richardson. This paper describes the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The National Institute of Standards and Technology (NIST) has conducted formal evaluations of language detection algorithms since 1994.
The MITLL NIST LRE 2015 Language Recognition system
2016-02-05
[Excerpt] The MITLL NIST LRE 2015 Language Recognition System, by Pedro Torres-Carrasquillo, Najim Dehak, Elizabeth Godoy, Douglas Reynolds, Fred Richardson. This paper describes the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The National Institute of Standards and Technology (NIST) has conducted formal evaluations of language detection algorithms since 1994.
Multiprocessor and memory architecture of the neurocomputer SYNAPSE-1.
Ramacher, U; Raab, W; Anlauf, J; Hachmann, U; Beichter, J; Brüls, N; Wesseling, M; Sicheneder, E; Männer, R; Glass, J
1993-12-01
A general-purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture, is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude, including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors (NSPs). Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C++, has been defined for the user to program the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.
Devarajan, Karthik; Cheung, Vincent C.K.
2017-01-01
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two nonnegative matrices, W and H where V ~ WH. It has been successfully applied in the analysis and interpretation of large-scale data arising in neuroscience, computational biology and natural language processing, among other areas. A distinctive feature of NMF is its nonnegativity constraints that allow only additive linear combinations of the data, thus enabling it to learn parts that have distinct physical representations in reality. In this paper, we describe an information-theoretic approach to NMF for signal-dependent noise based on the generalized inverse Gaussian model. Specifically, we propose three novel algorithms in this setting, each based on multiplicative updates and prove monotonicity of updates using the EM algorithm. In addition, we develop algorithm-specific measures to evaluate their goodness-of-fit on data. Our methods are demonstrated using experimental data from electromyography studies as well as simulated data in the extraction of muscle synergies, and compared with existing algorithms for signal-dependent noise. PMID:24684448
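For contrast with the signal-dependent-noise algorithms proposed in the paper, the classic Lee-Seung multiplicative updates for the squared-error objective can be written in a few lines of NumPy; this is a generic reference implementation, not the paper's generalized-inverse-Gaussian variants.

```python
import numpy as np

def nmf_multiplicative(V, k, iters=200, eps=1e-9, seed=0):
    """Classic Lee-Seung multiplicative updates for V ~ WH under squared error.
    (The paper's signal-dependent-noise algorithms use different update rules.)"""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update keeps H nonnegative
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # same for W
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((30, 20)))
W, H = nmf_multiplicative(V, k=5)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```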
Modular Chemical Descriptor Language (MCDL): Stereochemical modules
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gakh, Andrei A; Burnett, Michael N; Trepalin, Sergei V.
2011-01-01
In our previous papers we introduced the Modular Chemical Descriptor Language (MCDL) for providing a linear representation of chemical information. A subsequent development was the MCDL Java Chemical Structure Editor, which is capable of drawing chemical structures from linear representations and generating MCDL descriptors from structures. In this paper we present MCDL modules and accompanying software that incorporate a unique representation of molecular stereochemistry based on Cahn-Ingold-Prelog and Fischer ideas in constructing stereoisomer descriptors. The paper also contains additional discussions regarding canonical representation of stereochemical isomers, and brief algorithm descriptions of the open source LINDES, Java applet, and Open Babel MCDL processing module software packages. Testing of the upgraded MCDL Java Chemical Structure Editor on compounds taken from several large and diverse chemical databases demonstrated satisfactory performance for storage and processing of stereochemical information in MCDL format.
Wang, Yue; Luo, Jin; Hao, Shiying; Xu, Haihua; Shin, Andrew Young; Jin, Bo; Liu, Rui; Deng, Xiaohong; Wang, Lijuan; Zheng, Le; Zhao, Yifan; Zhu, Chunqing; Hu, Zhongkai; Fu, Changlin; Hao, Yanpeng; Zhao, Yingzhen; Jiang, Yunliang; Dai, Dorothy; Culver, Devore S; Alfreds, Shaun T; Todd, Rogow; Stearns, Frank; Sylvester, Karl G; Widen, Eric; Ling, Xuefeng B
2015-12-01
In order to proactively manage congestive heart failure (CHF) patients, an effective CHF case finding algorithm is required to process both structured and unstructured electronic medical records (EMR) to allow complementary and cost-efficient identification of CHF patients. We set out to identify CHF cases from both EMR-codified and natural language processing (NLP) found cases. Using narrative clinical notes from all Maine Health Information Exchange (HIE) patients, the NLP case finding algorithm was retrospectively (July 1, 2012-June 30, 2013) developed with a random subset of HIE-associated facilities, and blind-tested with the remaining facilities. The NLP-based method was integrated into a live HIE population exploration system and validated prospectively (July 1, 2013-June 30, 2014). A total of 18,295 codified CHF patients were included in the Maine HIE. Among the 253,803 subjects without CHF codings, our case finding algorithm prospectively identified 2411 uncodified CHF cases. The positive predictive value (PPV) was 0.914, and 70.1% of these 2411 cases were found to have CHF histories in the clinical notes. A CHF case finding algorithm was developed, tested and prospectively validated. The successful integration of the CHF case finding algorithm into the live Maine HIE system is expected to improve CHF care in Maine. Copyright © 2015. Published by Elsevier Ireland Ltd.
scikit-image: image processing in Python
Schönberger, Johannes L.; Nunez-Iglesias, Juan; Boulogne, François; Warner, Joshua D.; Yager, Neil; Gouillart, Emmanuelle; Yu, Tony
2014-01-01
scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal Modified BSD open source license, provides a well-documented API in the Python programming language, and is developed by an active, international team of collaborators. In this paper we highlight the advantages of open source to achieve the goals of the scikit-image library, and we showcase several real-world image processing applications that use scikit-image. More information can be found on the project homepage, http://scikit-image.org. PMID:25024921
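A minimal usage example with the library's documented API (thresholding and labelling a bundled test image) gives a flavour of the interface; the particular processing chain is illustrative rather than taken from the paper.

```python
from skimage import data, filters, measure

image = data.camera()                      # bundled 8-bit grayscale test image
thresh = filters.threshold_otsu(image)     # global Otsu threshold
binary = image > thresh
labels = measure.label(binary)             # connected-component labelling
print(f"threshold={thresh}, regions={labels.max()}")
```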
A Program That Acquires Language Using Positive and Negative Feedback.
ERIC Educational Resources Information Center
Brand, James
1987-01-01
Describes the language learning program "Acquire," which is a sample of grammar induction. It is a learning algorithm based on a pattern-matching scheme, using both a positive and negative network to reduce overgeneration. Language learning programs may be useful as tutorials for learning the syntax of a foreign language. (Author/LMO)
NASA Technical Reports Server (NTRS)
Trevino, Luis; Johnson, Stephen B.; Patterson, Jonathan; Teare, David
2015-01-01
The engineering development of the National Aeronautics and Space Administration's (NASA) new Space Launch System (SLS) requires cross-discipline teams with extensive knowledge of launch vehicle subsystems, information theory, and autonomous algorithms dealing with all operations from pre-launch through on-orbit operations. The nominal and off-nominal characteristics of SLS's elements and subsystems must be understood and matched with the autonomous algorithm monitoring and mitigation capabilities for accurate control and response to abnormal conditions throughout all vehicle mission flight phases, including precipitating safing actions and crew aborts. This presents a large and complex systems engineering challenge, which is being addressed in part by focusing on the specific subsystems involved in the handling of off-nominal missions and fault tolerance with response management. Using traditional model-based system and software engineering design principles from the Unified Modeling Language (UML) and Systems Modeling Language (SysML), the Mission and Fault Management (M&FM) algorithms for the vehicle are crafted and vetted in Integrated Development Teams (IDTs) composed of multiple development disciplines such as Systems Engineering (SE), Flight Software (FSW), Safety and Mission Assurance (S&MA), and the major subsystems and vehicle elements such as Main Propulsion Systems (MPS), boosters, avionics, Guidance, Navigation, and Control (GNC), Thrust Vector Control (TVC), and liquid engines. These model-based algorithms and their development lifecycle from inception through FSW certification are an important focus of SLS's development effort to further ensure reliable detection and response to off-nominal vehicle states during all phases of vehicle operation from pre-launch through end of flight. To test and validate these M&FM algorithms, a dedicated testbed was developed for full Vehicle Management End-to-End Testing (VMET). For addressing fault management (FM) early in the development lifecycle for the SLS program, NASA formed the M&FM team as part of the Integrated Systems Health Management and Automation Branch under the Spacecraft Vehicle Systems Department at the Marshall Space Flight Center (MSFC). To support the development of the FM algorithms, the VMET developed by the M&FM team provides the ability to integrate the algorithms, perform test cases, and integrate vendor-supplied physics-based launch vehicle (LV) subsystem models. Additionally, the team has developed processes for implementing and validating the M&FM algorithms for concept validation and risk reduction. The flexibility of the VMET capabilities enables thorough testing of the M&FM algorithms by providing configurable suites of both nominal and off-nominal test cases to validate the developed algorithms utilizing actual subsystem models such as MPS, GNC, and others. One of the principal functions of VMET is to validate the M&FM algorithms and substantiate them with performance baselines for each of the target vehicle subsystems in an independent platform exterior to the flight software test and validation processes. In any software development process there is inherent risk in the interpretation and implementation of concepts from requirements and test cases into flight software, compounded by potential human errors throughout the development and regression testing lifecycle.
Risk reduction is addressed by the M&FM group, and in particular by the Analysis Team, which works with other organizations such as S&MA, Structures and Environments, GNC, Orion, the Crew Office, Flight Operations, and Ground Operations to assess the performance of the M&FM algorithms in terms of their ability to reduce Loss of Mission (LOM) and Loss of Crew (LOC) probabilities. In addition, through state machine and diagnostic modeling, analysis efforts investigate a broader suite of failure effects and associated detections and responses to be tested in VMET to ensure reliable failure detection, and confirm that responses do not create additional risks or cause undesired states through interactive dynamic effects with other algorithms and systems. VMET further contributes to risk reduction by prototyping and exercising the M&FM algorithms early in their implementation and without any inherent hindrances such as meeting FSW processor scheduling constraints imposed by the target platform (the ARINC 653 partitioned Operating System), resource limitations, and other factors related to integration with subsystems not directly involved with M&FM, such as telemetry packing and processing. The baseline plan for use of VMET encompasses testing the original M&FM algorithms coded in the same C++ language and state machine architectural concepts as those used by FSW. This enables the development of performance standards and test cases to characterize the M&FM algorithms and sets a benchmark from which to measure their effectiveness and performance in the exterior FSW development and test processes. This paper is outlined in a systematic fashion analogous to a lifecycle process flow for engineering development of algorithms into software and testing. Section I describes the NASA SLS M&FM context, presenting the current infrastructure, leading principles, methods, and participants. Section II defines the testing philosophy of the M&FM algorithms as related to VMET, followed by Section III, which presents the modeling methods of the algorithms to be tested and validated in VMET. Its details are then further presented in Section IV, followed by Section V, which presents integration, test status, and state analysis. Finally, Section VI provides the summary and forward directions, followed by the appendices, which present relevant information on terminology and documentation.
Enhancing Web applications in radiology with Java: estimating MR imaging relaxation times.
Dagher, A P; Fitzpatrick, M; Flanders, A E; Eng, J
1998-01-01
Java is a relatively new programming language that has been used to develop a World Wide Web-based tool for estimating magnetic resonance (MR) imaging relaxation times, thereby demonstrating how Java may be used for Web-based radiology applications beyond improving the user interface of teaching files. A standard processing algorithm coded with Java is downloaded along with the hypertext markup language (HTML) document. The user (client) selects the desired pulse sequence and inputs data obtained from a region of interest on the MR images. The algorithm is used to modify selected MR imaging parameters in an equation that models the phenomenon being evaluated. MR imaging relaxation times are estimated, and confidence intervals and a P value expressing the accuracy of the final results are calculated. Design features such as simplicity, object-oriented programming, and security restrictions allow Java to expand the capabilities of HTML by offering a more versatile user interface that includes dynamic annotations and graphics. Java also allows the client to perform more sophisticated information processing and computation than is usually associated with Web applications. Java is likely to become a standard programming option, and the development of stand-alone Java applications may become more common as Java is integrated into future versions of computer operating systems.
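The same kind of estimation can be sketched outside Java: a mono-exponential T2 decay fitted with SciPy, using hypothetical region-of-interest values. The model, echo times, and signal values below are assumptions for illustration; the paper's applet handles multiple pulse sequences and reports confidence intervals and a P value.

```python
import numpy as np
from scipy.optimize import curve_fit

def t2_model(te, s0, t2):
    """Mono-exponential spin-echo signal model S(TE) = S0 * exp(-TE / T2)."""
    return s0 * np.exp(-te / t2)

# Hypothetical ROI measurements at several echo times (ms)
te = np.array([15.0, 30.0, 60.0, 90.0, 120.0])
signal = np.array([950.0, 820.0, 610.0, 455.0, 340.0])

popt, pcov = curve_fit(t2_model, te, signal, p0=(1000.0, 80.0))
s0_hat, t2_hat = popt
t2_err = np.sqrt(pcov[1, 1])            # 1-sigma uncertainty from the covariance of the fit
print(f"T2 = {t2_hat:.1f} ms +/- {t2_err:.1f} ms (S0 = {s0_hat:.0f})")
```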
Cohen, Kevin Bretonnel; Glass, Benjamin; Greiner, Hansel M.; Holland-Bouley, Katherine; Standridge, Shannon; Arya, Ravindra; Faist, Robert; Morita, Diego; Mangano, Francesco; Connolly, Brian; Glauser, Tracy; Pestian, John
2016-01-01
Objective: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data are comprised of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient’s status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral. PMID:27257386
Olsher, Daniel
2014-10-01
Noise-resistant and nuanced, COGBASE makes 10 million pieces of commonsense data and a host of novel reasoning algorithms available via a family of semantically-driven prior probability distributions. Machine learning, Big Data, natural language understanding/processing, and social AI can draw on COGBASE to determine lexical semantics, infer goals and interests, simulate emotion and affect, calculate document gists and topic models, and link commonsense knowledge to domain models and social, spatial, cultural, and psychological data. COGBASE is especially well suited for social Big Data, which tends to involve highly implicit contexts, cognitive artifacts, difficult-to-parse texts, and deep domain knowledge dependencies. Copyright © 2014 Elsevier Ltd. All rights reserved.
On the Accuracy of Language Trees
Pompei, Simone; Loreto, Vittorio; Tria, Francesca
2011-01-01
Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present-day, incomplete and often noisy information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further, we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally, we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. PMID:21674034
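As a toy illustration of the distance-based setting (not the databases, reconstruction methods, or tree-distance scores used in the study), average-linkage agglomerative clustering applied to a small hypothetical lexical-distance matrix produces a tree in the UPGMA spirit.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical pairwise lexical distances between four languages A-D
langs = ["A", "B", "C", "D"]
D = np.array([[0.0, 0.2, 0.6, 0.7],
              [0.2, 0.0, 0.6, 0.7],
              [0.6, 0.6, 0.0, 0.3],
              [0.7, 0.7, 0.3, 0.0]])

tree = linkage(squareform(D), method="average")   # UPGMA-style agglomeration
print(tree)   # each row: the two clusters merged, their distance, and the new cluster size
```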
Teaching computer interfacing with virtual instruments in an object-oriented language.
Gulotta, M
1995-01-01
LabVIEW is a graphic object-oriented computer language developed to facilitate hardware/software communication. LabVIEW is a complete computer language that can be used like Basic, FORTRAN, or C. In LabVIEW one creates virtual instruments that aesthetically look like real instruments but are controlled by sophisticated computer programs. There are several levels of data acquisition VIs that make it easy to control data flow, and many signal processing and analysis algorithms come with the software as premade VIs. In the classroom, the similarity between virtual and real instruments helps students understand how information is passed between the computer and attached instruments. The software may be used in the absence of hardware so that students can work at home as well as in the classroom. This article demonstrates how LabVIEW can be used to control data flow between computers and instruments, points out important features for signal processing and analysis, and shows how virtual instruments may be used in place of physical instrumentation. Applications of LabVIEW to the teaching laboratory are also discussed, and a plausible course outline is given. PMID:8580361
Teaching computer interfacing with virtual instruments in an object-oriented language.
Gulotta, M
1995-11-01
LabVIEW is a graphic object-oriented computer language developed to facilitate hardware/software communication. LabVIEW is a complete computer language that can be used like Basic, FORTRAN, or C. In LabVIEW one creates virtual instruments that aesthetically look like real instruments but are controlled by sophisticated computer programs. There are several levels of data acquisition VIs that make it easy to control data flow, and many signal processing and analysis algorithms come with the software as premade VIs. In the classroom, the similarity between virtual and real instruments helps students understand how information is passed between the computer and attached instruments. The software may be used in the absence of hardware so that students can work at home as well as in the classroom. This article demonstrates how LabVIEW can be used to control data flow between computers and instruments, points out important features for signal processing and analysis, and shows how virtual instruments may be used in place of physical instrumentation. Applications of LabVIEW to the teaching laboratory are also discussed, and a plausible course outline is given.
Polyglot Programming in Applications Used for Genetic Data Analysis
Nowak, Robert M.
2014-01-01
Applications used for the analysis of genetic data process large volumes of data with complex algorithms. High performance, flexibility, and a user interface with a web browser are required by these solutions, which can be achieved by using multiple programming languages. In this study, I developed a freely available framework for building software to analyze genetic data, which uses C++, Python, JavaScript, and several libraries. This system was used to build a number of genetic data processing applications and it reduced the time and costs of development. PMID:25197633
Polyglot programming in applications used for genetic data analysis.
Nowak, Robert M
2014-01-01
Applications used for the analysis of genetic data process large volumes of data with complex algorithms. High performance, flexibility, and a user interface with a web browser are required by these solutions, which can be achieved by using multiple programming languages. In this study, I developed a freely available framework for building software to analyze genetic data, which uses C++, Python, JavaScript, and several libraries. This system was used to build a number of genetic data processing applications and it reduced the time and costs of development.
Object-oriented software for evaluating measurement uncertainty
NASA Astrophysics Data System (ADS)
Hall, B. D.
2013-05-01
An earlier publication (Hall 2006 Metrologia 43 L56-61) introduced the notion of an uncertain number that can be used in data processing to represent quantity estimates with associated uncertainty. The approach can be automated, allowing data processing algorithms to be decomposed into convenient steps, so that complicated measurement procedures can be handled. This paper illustrates the uncertain-number approach using several simple measurement scenarios and two different software tools. One is an extension library for Microsoft Excel®. The other is a special-purpose calculator using the Python programming language.
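The uncertain-number idea can be sketched with a small value-plus-uncertainty type that applies first-order (GUM-style) propagation. This is a simplified illustration that assumes uncorrelated inputs; it is not the API of the Excel extension or the Python calculator described in the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class UncertainNumber:
    """A value with a standard uncertainty; first-order propagation for
    uncorrelated operands (a simplification of the uncertain-number approach)."""
    value: float
    u: float

    def __add__(self, other):
        return UncertainNumber(self.value + other.value,
                               math.hypot(self.u, other.u))

    def __mul__(self, other):
        v = self.value * other.value
        u = abs(v) * math.hypot(self.u / self.value, other.u / other.value)
        return UncertainNumber(v, u)

# Example: power dissipated in a resistor, P = V * I (hypothetical readings)
V = UncertainNumber(10.0, 0.05)
I = UncertainNumber(2.0, 0.02)
P = V * I
print(f"P = {P.value:.2f} W, u(P) = {P.u:.3f} W")
```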
NASA Astrophysics Data System (ADS)
Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi
2014-03-01
A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was here regarded as a sample selection problem in which a sparse subset of samples is chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm incorporating the resting-state functional language network achieved 83.6% leave-one-out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with k-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm methods, our method yielded better classification performance. Moreover, our results suggest that a dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.
Szlosek, Donald A; Ferrett, Jonathan
2016-01-01
As the number of clinical decision support systems (CDSSs) incorporated into electronic medical records (EMRs) increases, so does the need to evaluate their effectiveness. The use of medical record review and similar manual methods for evaluating decision rules is laborious and inefficient. The authors use machine learning and natural language processing (NLP) algorithms to accurately evaluate a clinical decision support rule through an EMR system, and they compare it against manual evaluation. Modeled after the EMR system EPIC at Maine Medical Center, we developed a dummy data set containing physician notes in free text for 3,621 artificial patient records undergoing a head computed tomography (CT) scan for mild traumatic brain injury after the incorporation of an electronic best practice approach. We validated the accuracy of the Best Practice Advisories (BPA) using three machine learning algorithms, C-Support Vector Classification (SVC), Decision Tree Classifier (DecisionTreeClassifier), and k-nearest neighbors classifier (KNeighborsClassifier), by comparing their accuracy for adjudicating the occurrence of a mild traumatic brain injury against manual review. We then used the best of the three algorithms to evaluate the effectiveness of the BPA, and we compared the algorithm's evaluation of the BPA to that of manual review. The electronic best practice approach was found to have a sensitivity of 98.8 percent (96.83-100.0), specificity of 10.3 percent, PPV = 7.3 percent, and NPV = 99.2 percent when reviewed manually by abstractors. Though all the machine learning algorithms were observed to have a high level of predictive accuracy, the SVC displayed the highest, with a sensitivity of 93.33 percent (92.49-98.84), specificity of 97.62 percent (96.53-98.38), PPV = 50.00, and NPV = 99.83. The SVC algorithm was observed to have a sensitivity of 97.9 percent (94.7-99.86), specificity of 10.30 percent, PPV of 7.25 percent, and NPV of 99.2 percent for evaluating the best practice approach, after accounting for 17 cases (0.66 percent) where the patient records had to be reviewed manually due to the NLP system's inability to capture the proper diagnosis. CDSSs incorporated into EMRs can be evaluated in an automated fashion by using NLP and machine learning techniques.
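A hedged sketch of the comparison workflow follows, using scikit-learn classifiers of the three named families on synthetic stand-in data rather than the study's dummy EMR notes; feature extraction from free text (e.g., TF-IDF vectorization) is assumed to happen beforehand.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced stand-in for the labelled chart-review data
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.9, 0.1], random_state=0)

for name, clf in [("SVC", SVC()),
                  ("DecisionTree", DecisionTreeClassifier(random_state=0)),
                  ("KNN", KNeighborsClassifier())]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="recall")  # recall = sensitivity
    print(f"{name}: mean sensitivity = {scores.mean():.3f}")
```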
Using background knowledge for picture organization and retrieval
NASA Astrophysics Data System (ADS)
Quintana, Yuri
1997-01-01
A picture knowledge base management system is described that is used to represent, organize and retrieve pictures from a frame knowledge base. Experiments with human test subjects were conducted to obtain further descriptions of pictures from news magazines. These descriptions were used to represent the semantic content of pictures in frame representations. A conceptual clustering algorithm is described which organizes pictures not only on observable features, but also on implicit properties derived from the frame representations. The algorithm uses inheritance reasoning to take background knowledge into account in the clustering. The algorithm creates clusters of pictures using a group similarity function that is based on the gestalt theory of picture perception. For each cluster created, a frame is generated which describes the semantic content of the pictures in the cluster. Clustering and retrieval experiments were conducted with and without background knowledge. The paper shows how the use of background knowledge and semantic similarity heuristics improves the speed, precision, and recall of queries processed. The paper concludes with a discussion of how natural language processing can be used to assist in the development of knowledge bases and the processing of user queries.
Automatic theory generation from analyst text files using coherence networks
NASA Astrophysics Data System (ADS)
Shaffer, Steven C.
2014-05-01
This paper describes a three-phase process of extracting knowledge from analyst textual reports. Phase 1 involves performing natural language processing on the source text to extract subject-predicate-object triples. In phase 2, these triples are then fed into a coherence network analysis process, using a genetic algorithm optimization. Finally, the highest-value subnetworks are processed into a semantic network graph for display. Initial work on a well-known data set (a Wikipedia article on Abraham Lincoln) has shown excellent results without any specific tuning. Next, we ran the process on the SYNthetic Counter-INsurgency (SYNCOIN) data set, developed at Penn State, yielding interesting and potentially useful results.
Machining fixture layout optimization using particle swarm optimization algorithm
NASA Astrophysics Data System (ADS)
Dou, Jianping; Wang, Xingsong; Wang, Lei
2011-05-01
Optimization of fixture layout (locator and clamp locations) is critical to reduce geometric error of the workpiece during the machining process. In this paper, the application of the particle swarm optimization (PSO) algorithm is presented to minimize workpiece deformation in the machining region. A PSO-based approach is developed to optimize the fixture layout by integrating the ANSYS Parametric Design Language (APDL) of finite element analysis to compute the objective function for a given fixture layout. A particle library approach is used to decrease the total computation time. The computational experiment on a 2D case shows that the number of function evaluations is decreased by about 96%. A case study illustrates the effectiveness and efficiency of the PSO-based optimization approach.
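A generic global-best PSO minimizer of the kind described can be sketched as follows; here a simple analytic objective stands in for the APDL finite-element deformation analysis used in the paper, and all parameter values are illustrative assumptions.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO. `objective` maps an (n_dims,) vector to a scalar cost;
    in the fixture-layout problem it would wrap a finite-element deformation run."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))      # particle positions
    v = np.zeros_like(x)                                       # particle velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                         # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # inertia + cognitive + social
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Stand-in objective (sphere function) over four layout variables
best_x, best_f = pso_minimize(lambda p: float(np.sum(p**2)), bounds=[(-5, 5)] * 4)
print(best_x, best_f)
```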
SBML-SAT: a systems biology markup language (SBML) based sensitivity analysis tool
Zi, Zhike; Zheng, Yanan; Rundell, Ann E; Klipp, Edda
2008-01-01
Background It has long been recognized that sensitivity analysis plays a key role in modeling and analyzing cellular and biochemical processes. Systems biology markup language (SBML) has become a well-known platform for coding and sharing mathematical models of such processes. However, current SBML compatible software tools are limited in their ability to perform global sensitivity analyses of these models. Results This work introduces a freely downloadable software package, SBML-SAT, which implements algorithms for simulation, steady state analysis, robustness analysis and local and global sensitivity analysis for SBML models. This software tool extends current capabilities through its execution of global sensitivity analyses using multi-parametric sensitivity analysis, partial rank correlation coefficient, Sobol's method, and weighted average of local sensitivity analyses, in addition to its ability to handle systems with discontinuous events and its intuitive graphical user interface. Conclusion SBML-SAT provides the community of systems biologists a new tool for the analysis of their SBML models of biochemical and cellular processes. PMID:18706080
SBML-SAT: a systems biology markup language (SBML) based sensitivity analysis tool.
Zi, Zhike; Zheng, Yanan; Rundell, Ann E; Klipp, Edda
2008-08-15
It has long been recognized that sensitivity analysis plays a key role in modeling and analyzing cellular and biochemical processes. Systems biology markup language (SBML) has become a well-known platform for coding and sharing mathematical models of such processes. However, current SBML compatible software tools are limited in their ability to perform global sensitivity analyses of these models. This work introduces a freely downloadable software package, SBML-SAT, which implements algorithms for simulation, steady state analysis, robustness analysis and local and global sensitivity analysis for SBML models. This software tool extends current capabilities through its execution of global sensitivity analyses using multi-parametric sensitivity analysis, partial rank correlation coefficient, Sobol's method, and weighted average of local sensitivity analyses, in addition to its ability to handle systems with discontinuous events and its intuitive graphical user interface. SBML-SAT provides the community of systems biologists a new tool for the analysis of their SBML models of biochemical and cellular processes.
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms, including decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model, were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
Behind the scenes: A medical natural language processing project.
Wu, Joy T; Dernoncourt, Franck; Gehrmann, Sebastian; Tyler, Patrick D; Moseley, Edward T; Carlson, Eric T; Grant, David W; Li, Yeran; Welt, Jonathan; Celi, Leo Anthony
2018-04-01
Advancement of Artificial Intelligence (AI) capabilities in medicine can help address many pressing problems in healthcare. However, AI research endeavors in healthcare may not be clinically relevant, may have unrealistic expectations, or may not be explicit enough about their limitations. A diverse and well-functioning multidisciplinary team (MDT) can help identify appropriate and achievable AI research agendas in healthcare, and advance medical AI technologies by developing AI algorithms as well as addressing the shortage of appropriately labeled datasets for machine learning. In this paper, our team of engineers, clinicians and machine learning experts share their experience and lessons learned from their two-year-long collaboration on a natural language processing (NLP) research project. We highlight specific challenges encountered in cross-disciplinary teamwork, dataset creation for NLP research, and expectation setting for current medical AI technologies. Copyright © 2017. Published by Elsevier B.V.
Use of graph algorithms in the processing and analysis of images with focus on the biomedical data.
Zdimalova, M; Roznovjak, R; Weismann, P; El Falougy, H; Kubikova, E
2017-01-01
Image segmentation is a well-known problem in the field of image processing. A great number of methods based on different approaches to this issue have been created. One of these approaches utilizes findings from graph theory. Our work focuses on segmentation using shortest paths in a graph. Specifically, we deal with methods of "Intelligent Scissors," which use Dijkstra's algorithm to find the shortest paths. We created new software in the Microsoft Visual Studio 2013 integrated development environment with Visual C++ in the language C++/CLI. We created a Windows forms application with a graphical user interface, using the .NET platform (version 4.5). The program was used for handling and processing the original medical data. The major disadvantage of the "Intelligent Scissors" method is the computation time of Dijkstra's algorithm. However, after the implementation of a more efficient priority queue, this problem could be alleviated. The main advantage we see in this method is its training capability, which enables it to adapt to the particular kind of edge that we need to segment. User involvement has a significant influence on the segmentation process, which greatly helps to achieve high-quality results (Fig. 7, Ref. 13).
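The runtime concern centers on Dijkstra's algorithm and its priority queue; a minimal heap-based sketch over a generic weighted graph (not the authors' C++/CLI implementation) looks like this.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs from `source`; graph maps node -> list of (neighbor, weight).
    A binary-heap priority queue keeps each extraction O(log n), which is the
    efficiency concern raised for Intelligent Scissors."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Tiny example where edge weights stand in for local edge costs between pixels
graph = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.5), ("d", 5.0)], "c": [("d", 1.0)]}
print(dijkstra(graph, "a"))   # {'a': 0.0, 'b': 1.0, 'c': 2.5, 'd': 3.5}
```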
Optimizing DNA assembly based on statistical language modelling.
Fang, Gang; Zhang, Shemin; Dong, Yafei
2017-12-15
By successively assembling genetic parts such as BioBricks according to grammatical models, complex genetic constructs composed of dozens of functional blocks can be built. However, each category of genetic parts usually includes anywhere from a few to many candidate parts. With an increasing quantity of genetic parts, the process of assembling more than a few sets of these parts can be expensive, time consuming and error prone. At the last step of assembly it is difficult to decide which part should be selected. Based on a statistical language model, which is a probability distribution P(S) over strings S that attempts to reflect how frequently a string S occurs as a sentence, the most commonly used parts are selected. Then, a dynamic programming algorithm was designed to figure out the solution of maximum probability. The algorithm optimizes the results of a genetic design based on a grammatical model and finds an optimal solution. In this way, redundant operations can be reduced and the time and cost required for conducting biological experiments can be minimized. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
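One plausible reading of this approach is a Viterbi-style dynamic program that picks one part per category so as to maximize the product of language-model transition probabilities. The sketch below makes that assumption explicit; the part names and probabilities are hypothetical and not taken from the paper.

```python
import math

def best_assembly(categories, bigram_p):
    """categories: list of candidate-part lists, one per assembly slot.
    bigram_p[(a, b)]: probability that part b follows part a (from a language model).
    Returns the maximum-probability sequence via dynamic programming."""
    score = {p: 0.0 for p in categories[0]}            # best log-probability ending in part p
    back = [{} for _ in categories]                    # backpointers for traceback
    for i in range(1, len(categories)):
        new_score = {}
        for b in categories[i]:
            best_a = max(score, key=lambda a: score[a] + math.log(bigram_p.get((a, b), 1e-9)))
            new_score[b] = score[best_a] + math.log(bigram_p.get((best_a, b), 1e-9))
            back[i][b] = best_a
        score = new_score
    last = max(score, key=score.get)                   # trace back the optimal sequence
    seq = [last]
    for i in range(len(categories) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return list(reversed(seq)), math.exp(score[last])

cats = [["pT7", "pLac"], ["RBS1", "RBS2"], ["gfp", "rfp"]]    # hypothetical parts per category
p = {("pT7", "RBS1"): 0.6, ("pLac", "RBS1"): 0.3, ("pT7", "RBS2"): 0.2,
     ("pLac", "RBS2"): 0.5, ("RBS1", "gfp"): 0.7, ("RBS2", "gfp"): 0.4,
     ("RBS1", "rfp"): 0.2, ("RBS2", "rfp"): 0.5}
print(best_assembly(cats, p))   # (['pT7', 'RBS1', 'gfp'], 0.42)
```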
Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John
2016-01-01
Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
NASA Technical Reports Server (NTRS)
Meyer, D.
1985-01-01
A D-Chart is a style of flowchart using control symbols highly appropriate to modern structured programming languages. The intent of a D-Chart is to provide a clear and concise one-for-one mapping of control symbols to high-level language constructs for purposes of design and documentation. The notation lends itself to both high-level and code-level algorithmic description. The various issues that may arise when representing, in D-Chart style, algorithms expressed in the more popular high-level languages are addressed. In particular, the peculiarities of mapping control constructs for Ada, PASCAL, FORTRAN 77, C, PL/I, Jovial J73, HAL/S, and Algol are discussed.
NASA Technical Reports Server (NTRS)
Meyer, D. D.
1985-01-01
A D-Chart is a style of flowchart using control symbols highly appropriate to modern structured programming languages. The intent of a D-Chart is to provide a clear and concise one-for-one mapping of control symbols to high-level language constructs for purposes of design and documentation. The notation lends itself to both high-level and code-level algorithmic description. The various issues that may arise when representing, in D-Chart style, algorithms expressed in the more popular high-level languages are addressed. In particular, the peculiarities of mapping control constructs for Ada, PASCAL, FORTRAN 77, C, PL/I, Jovial J73, HAL/S, and Algol are discussed.
An Expressive and Efficient Language for XML Information Retrieval.
ERIC Educational Resources Information Center
Chinenyanga, Taurai Tapiwa; Kushmerick, Nicholas
2002-01-01
Discusses XML and information retrieval and describes a query language, ELIXIR (expressive and efficient language for XML information retrieval), with a textual similarity operator that can be used for similarity joins. Explains the algorithm for answering ELIXIR queries to generate intermediate relational data. (Author/LRW)
C Language Integrated Production System, Ada Version
NASA Technical Reports Server (NTRS)
Culbert, Chris; Riley, Gary; Savely, Robert T.; Melebeck, Clovis J.; White, Wesley A.; Mcgregor, Terry L.; Ferguson, Melisa; Razavipour, Reza
1992-01-01
CLIPS/Ada provides capabilities of CLIPS v4.3 but uses Ada as source language for CLIPS executable code. Implements forward-chaining rule-based language. Program contains inference engine and language syntax providing framework for construction of expert-system program. Also includes features for debugging application program. Based on Rete algorithm which provides efficient method for performing repeated matching of patterns. Written in Ada.
Modelling and simulating reaction-diffusion systems using coloured Petri nets.
Liu, Fei; Blätke, Mary-Ann; Heiner, Monika; Yang, Ming
2014-10-01
Reaction-diffusion systems often play an important role in systems biology when developmental processes are involved. Traditional methods of modelling and simulating such systems require substantial prior knowledge of mathematics and/or simulation algorithms. Such skills may impose a challenge for biologists, when they are not equally well-trained in mathematics and computer science. Coloured Petri nets as a high-level and graphical language offer an attractive alternative, which is easily approachable. In this paper, we investigate a coloured Petri net framework integrating deterministic, stochastic and hybrid modelling formalisms and corresponding simulation algorithms for the modelling and simulation of reaction-diffusion processes that may be closely coupled with signalling pathways, metabolic reactions and/or gene expression. Such systems often manifest multiscaleness in time, space and/or concentration. We introduce our approach by means of some basic diffusion scenarios, and test it against an established case study, the Brusselator model. Copyright © 2014 Elsevier Ltd. All rights reserved.
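For reference, the well-mixed (non-spatial) Brusselator kinetics used as the case study can be integrated directly with SciPy; this is only the underlying rate-equation model under default illustrative parameters, not the coloured Petri net encoding or the spatial simulation described in the paper.

```python
from scipy.integrate import solve_ivp

def brusselator(t, z, a=1.0, b=3.0):
    """Well-mixed Brusselator rate equations:
       dX/dt = a - (b + 1) X + X^2 Y,   dY/dt = b X - X^2 Y."""
    x, y = z
    return [a - (b + 1) * x + x * x * y, b * x - x * x * y]

sol = solve_ivp(brusselator, t_span=(0.0, 50.0), y0=[1.0, 1.0], max_step=0.05)
print("final state:", sol.y[:, -1])   # for b > 1 + a^2 the system settles into a limit cycle
```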
Sittig, D. F.; Orr, J. A.
1991-01-01
Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
SlideJ: An ImageJ plugin for automated processing of whole slide images.
Della Mea, Vincenzo; Baroni, Giulia L; Pilutti, David; Di Loreto, Carla
2017-01-01
The digital slide, or Whole Slide Image, is a digital image, acquired with specific scanners, that represents a complete tissue sample or cytological specimen at the microscopic level. While Whole Slide Image analysis is recognized among the most interesting opportunities, the typical size of such images (up to gigapixels) can be very demanding in terms of memory requirements. Thus, while algorithms and tools for processing and analysis of single microscopic field images are available, the size of Whole Slide Images makes the direct use of such tools prohibitive or impossible. In this work a plugin for ImageJ, named SlideJ, is proposed with the objective to seamlessly extend the application of image analysis algorithms implemented in ImageJ for single microscopic field images to whole digital slide analysis. The plugin has been complemented by examples of macros in the ImageJ scripting language to demonstrate its use in concrete situations.
SlideJ: An ImageJ plugin for automated processing of whole slide images
Baroni, Giulia L.; Pilutti, David; Di Loreto, Carla
2017-01-01
The digital slide, or Whole Slide Image, is a digital image, acquired with specific scanners, that represents a complete tissue sample or cytological specimen at the microscopic level. While Whole Slide Image analysis is recognized among the most interesting opportunities, the typical size of such images (up to gigapixels) can be very demanding in terms of memory requirements. Thus, while algorithms and tools for processing and analysis of single microscopic field images are available, the size of Whole Slide Images makes the direct use of such tools prohibitive or impossible. In this work a plugin for ImageJ, named SlideJ, is proposed with the objective to seamlessly extend the application of image analysis algorithms implemented in ImageJ for single microscopic field images to whole digital slide analysis. The plugin has been complemented by examples of macros in the ImageJ scripting language to demonstrate its use in concrete situations. PMID:28683129
A high level language for a high performance computer
NASA Technical Reports Server (NTRS)
Perrott, R. H.
1978-01-01
The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers have been modifications of programming languages which were designed many years ago for sequential machines. A new programming language should be developed based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.
In-Trail Procedure (ITP) Algorithm Design
NASA Technical Reports Server (NTRS)
Munoz, Cesar A.; Siminiceanu, Radu I.
2007-01-01
The primary objective of this document is to provide a detailed description of the In-Trail Procedure (ITP) algorithm, which is part of the Airborne Traffic Situational Awareness In-Trail Procedure (ATSA-ITP) application. To this end, the document presents a high level description of the ITP Algorithm and a prototype implementation of this algorithm in the programming language C.
Automated chart review utilizing natural language processing algorithm for asthma predictive index.
Kaur, Harsheen; Sohn, Sunghwan; Wi, Chung-Il; Ryu, Euijung; Park, Miguel A; Bachman, Kay; Kita, Hirohito; Croghan, Ivana; Castro-Rodriguez, Jose A; Voge, Gretchen A; Liu, Hongfang; Juhn, Young J
2018-02-13
Thus far, no algorithms have been developed to automatically extract patients who meet Asthma Predictive Index (API) criteria from electronic health records (EHR). Our objective is to develop and validate a natural language processing (NLP) algorithm to identify patients that meet API criteria. This is a cross-sectional study nested in a birth cohort study in Olmsted County, MN. Asthma status ascertained by manual chart review based on API criteria served as the gold standard. NLP-API was developed on a training cohort (n = 87) and validated on a test cohort (n = 427). Criterion validity was measured by the sensitivity, specificity, positive predictive value and negative predictive value of the NLP algorithm against manual chart review for asthma status. Construct validity was determined by associations of asthma status defined by NLP-API with known risk factors for asthma. Among the eligible 427 subjects of the test cohort, 48% were males and 74% were White. Median age was 5.3 years (interquartile range 3.6-6.8). 35 (8%) had a history of asthma by NLP-API vs. 36 (8%) by abstractor, with 31 identified by both approaches. NLP-API predicted asthma status with sensitivity 86%, specificity 98%, positive predictive value 88%, and negative predictive value 98%. Asthma status by both NLP and manual chart review was significantly associated with known asthma risk factors, such as history of allergic rhinitis, eczema, family history of asthma, and maternal history of smoking during pregnancy (p value < 0.05). Maternal smoking [odds ratio: 4.4, 95% confidence interval 1.8-10.7] was associated with asthma status determined by both NLP-API and abstractor, and the effect sizes were similar between the reviews (4.4 vs 4.2, respectively). NLP-API was able to ascertain asthma status in children by mining the EHR and has the potential to enhance asthma care and research through population management and large-scale studies when identifying children who meet API criteria.
An experimental study of graph connectivity for unsupervised word sense disambiguation.
Navigli, Roberto; Lapata, Mirella
2010-04-01
Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influences WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
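A toy version of the graph-connectivity idea can be sketched with NetworkX centrality measures over a hypothetical sense graph; the paper's graphs are derived from a lexicon and are far larger, so the node names and edges below are assumptions for illustration only.

```python
import networkx as nx

# Toy sense graph: nodes are candidate word senses, edges are lexicon-derived relations
G = nx.Graph()
G.add_edges_from([("bank#1", "money#1"), ("bank#1", "deposit#1"),
                  ("bank#2", "river#1"), ("money#1", "deposit#1")])

degree = nx.degree_centrality(G)   # one simple connectivity measure
pagerank = nx.pagerank(G)          # another, based on random walks

# Pick the "most important" sense of 'bank' under each measure
candidates = ["bank#1", "bank#2"]
print(max(candidates, key=degree.get), max(candidates, key=pagerank.get))
```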
Web-accessible cervigram automatic segmentation tool
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2010-03-01
Uterine cervix image analysis is of great importance to the study of uterine cervix cancer, which is among the leading cancers affecting women worldwide. In this paper, we describe our proof-of-concept, Web-accessible system for automated segmentation of significant tissue regions in uterine cervix images, which also demonstrates our research efforts toward promoting collaboration between engineers and physicians for medical image analysis projects. Our design and implementation unifies the merits of two commonly used languages, MATLAB and Java. It circumvents the heavy workload of recoding the sophisticated segmentation algorithms originally developed in MATLAB into Java while allowing remote users who are not experienced programmers and algorithms developers to apply those processing methods to their own cervicographic images and evaluate the algorithms. Several other practical issues of the systems are also discussed, such as the compression of images and the format of the segmentation results.
Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang
2015-01-01
In this study we have developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. Family history identification precision was consistent across institutions, at 88.9% on the Indiana University (IU) dataset and 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. Family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.
Scientific Programming Using Java: A Remote Sensing Example
NASA Technical Reports Server (NTRS)
Prados, Don; Mohamed, Mohamed A.; Johnson, Michael; Cao, Changyong; Gasser, Jerry
1999-01-01
This paper presents results of a project to port remote sensing code from the C programming language to Java. The advantages and disadvantages of using Java versus C as a scientific programming language in remote sensing applications are discussed. Remote sensing applications deal with voluminous data that require effective memory management, such as buffering operations, when processed. Some of these applications also implement complex computational algorithms, such as Fast Fourier Transform analysis, that are very performance intensive. Factors considered include performance, precision, complexity, rapidity of development, ease of code reuse, ease of maintenance, memory management, and platform independence. Performance results are also presented for radiometric calibration code that uses Java for the graphical user interface and C for the domain model.
Development of a comprehensive software engineering environment
NASA Technical Reports Server (NTRS)
Hartrum, Thomas C.; Lamont, Gary B.
1987-01-01
The generation of a set of tools for the software lifecycle is a recurring theme in the software engineering literature. The development of such tools and their integration into a software development environment is a difficult task because of the magnitude (number of variables) and the complexity (combinatorics) of the software lifecycle process. Development of a global approach was initiated in 1982 as the Software Development Workbench (SDW). Continuing efforts focus on tool development, tool integration, human interfacing, data dictionaries, and testing algorithms. Current efforts emphasize natural language interfaces, expert system software development associates, and distributed environments with Ada as the target language. The current implementation of the SDW is on a VAX-11/780. Other software development tools are being networked through engineering workstations.
Tractography of Association Fibers Associated with Language Processing.
Egger, K; Yang, S; Reisert, M; Kaller, C; Mader, I; Beume, L; Weiller, C; Urbach, H
2015-10-01
Several major association fiber tracts are known to be part of the language processing system. There is evidence that high angular diffusion-based MRI is able to separate these fascicles in a consistent way. In this study, we aimed to test this using a novel whole-brain "global tracking" approach and to assess possible lateralization. Global tracking was performed in six healthy right-handed volunteers for the arcuate fascicle (AF), the medial longitudinal fascicle (MdLF), the inferior fronto-occipital fascicle (IFOF), and the inferior longitudinal fascicle (ILF). These fiber tracts were characterized quantitatively using the number of streamlines (SL) and the mean fractional anisotropy (FA). We were able to characterize the AF, the MdLF, the IFOF, and the ILF consistently in six healthy volunteers using global tracking. A left-sided dominance (lateralization index, LI > 0.2) for the AF was found in all participants. The MdLF showed a left-sided dominance in four participants (one female, three male). Regarding the FA, no lateralization (LI > 0.2) could be shown in any of the fascicles. Using a novel global tracking algorithm, we confirmed that the courses of the primary language-processing-associated fascicles can consistently be differentiated. Additionally, we were able to show a streamline-based left-sided lateralization of the AF in all right-handed healthy subjects.
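The abstract does not spell out how the lateralization index is computed; assuming the standard definition LI = (L - R) / (L + R) on streamline counts, with values above 0.2 read as left dominance, a minimal sketch looks like this (the counts are invented).

# Hedged sketch: lateralization index from left/right streamline counts.
# LI = (L - R) / (L + R); LI > 0.2 is treated as left-sided dominance, mirroring
# the threshold quoted in the abstract. The counts below are hypothetical.
def lateralization_index(left_sl: int, right_sl: int) -> float:
    return (left_sl - right_sl) / (left_sl + right_sl)

li_af = lateralization_index(left_sl=1450, right_sl=820)  # hypothetical AF counts
print(f"AF LI = {li_af:.2f}, left-dominant: {li_af > 0.2}")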
Gautier, Laurent
2010-12-21
Computer languages can be domain-related, and in the case of multidisciplinary projects, knowledge of several languages will be needed in order to quickly implement ideas. Moreover, each computer language has relative strong points, making some languages better suited than others for a given task to be implemented. The Bioconductor project, based on the R language, has become a reference for the numerical processing and statistical analysis of data coming from high-throughput biological assays, providing a rich selection of methods and algorithms to the research community. At the same time, Python has matured as a rich and reliable language for the agile development of prototypes or final implementations, as well as for handling large data sets. The data structures and functions from Bioconductor can be exposed to Python as a regular library. This allows a fully transparent and native use of Bioconductor from Python, without one having to know the R language and with only a small community of translators required to know both. To demonstrate this, we have implemented such Python representations for key infrastructure packages in Bioconductor, letting a Python programmer handle annotation data, microarray data, and next-generation sequencing data. Bioconductor is now not solely reserved to R users. Building a Python application using Bioconductor functionality can be done just as if Bioconductor were a Python package. Moreover, similar principles can be applied to other languages and libraries. Our Python package is available at: http://pypi.python.org/pypi/rpy2-bioconductor-extensions/.
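As a rough illustration of the bridging idea only (not the exact API of the rpy2-bioconductor-extensions package, whose module names are not given in the abstract), plain rpy2 can already expose a Bioconductor package to Python:

# Sketch: calling a Bioconductor package from Python through rpy2.
# Assumes R, rpy2 and the Bioconductor package "Biobase" are installed.
from rpy2.robjects.packages import importr

biobase = importr("Biobase")  # the package's R functions become attributes
# e.g. Biobase's exprs() would then be callable as biobase.exprs(some_eset)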
iSentenizer-μ: multilingual sentence boundary detection model.
Wong, Derek F; Chao, Lidia S; Zeng, Xiaodong
2014-01-01
Sentence boundary detection (SBD) systems are normally quite sensitive to the genre of data that the system is trained on, where genre shifts refer to changes of text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation process restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-μ) for the Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish languages. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ the i(+)Learning algorithm, an incremental tree learning architecture, for constructing the system. iSentenizer-μ, under the incremental learning framework, is adaptable to text of different topics and to Roman-alphabet languages by merging new data into the existing model, learning the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and has been compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.
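iSentenizer-μ itself is not shown here; as a point of reference for the Punkt baseline mentioned above, NLTK's implementation can be run as a minimal sketch (the sample text is invented, and training the tokenizer on in-domain text would normally improve results).

# Sketch of the Punkt baseline, one of the two systems iSentenizer-μ is
# compared against, using NLTK's tokenizer with default parameters.
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = "Dr. Smith arrived at 5 p.m. He brought the report. It was late."
tokenizer = PunktSentenceTokenizer()
for sentence in tokenizer.tokenize(text):
    print(sentence)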
Incorporating advanced language models into the P300 speller using particle filtering
NASA Astrophysics Data System (ADS)
Speier, W.; Arnold, C. W.; Deshpande, A.; Knall, J.; Pouratian, N.
2015-08-01
Objective. The P300 speller is a common brain-computer interface (BCI) application designed to communicate language by detecting event related potentials in a subject’s electroencephalogram signal. Information about the structure of natural language can be valuable for BCI communication, but attempts to use this information have thus far been limited to rudimentary n-gram models. While more sophisticated language models are prevalent in natural language processing literature, current BCI analysis methods based on dynamic programming cannot handle their complexity. Approach. Sampling methods can overcome this complexity by estimating the posterior distribution without searching the entire state space of the model. In this study, we implement sequential importance resampling, a commonly used particle filtering (PF) algorithm, to integrate a probabilistic automaton language model. Main result. This method was first evaluated offline on a dataset of 15 healthy subjects, which showed significant increases in speed and accuracy when compared to standard classification methods as well as a recently published approach using a hidden Markov model (HMM). An online pilot study verified these results, as the average speed and accuracy achieved using the PF method were significantly higher than those achieved using the HMM method. Significance. These findings strongly support the integration of domain-specific knowledge into BCI classification to improve system performance.
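For readers unfamiliar with sequential importance resampling, the core update is short; the following is a generic SIR step in NumPy with a toy likelihood, not the authors' P300 or language-model implementation.

# Generic sequential importance resampling (SIR) step, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, likelihood):
    """Reweight particles by the likelihood of the new observation,
    then resample proportionally to the updated weights."""
    weights = weights * likelihood(particles)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy example: particles are candidate states, the observation favours values near 2.
particles = rng.normal(0.0, 3.0, size=1000)
weights = np.full(1000, 1.0 / 1000)
particles, weights = sir_step(particles, weights,
                              likelihood=lambda x: np.exp(-0.5 * (x - 2.0) ** 2))
print(particles.mean())  # drifts toward ~2 after the update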
Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems.
Huang, Lifu; May, Jonathan; Pan, Xiaoman; Ji, Heng; Ren, Xiang; Han, Jiawei; Zhao, Lin; Hendler, James A
2017-03-01
The ability to automatically recognize and type entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
Increasing the speed of medical image processing in MatLab®
Bister, M; Yap, CS; Ng, KH; Tok, CH
2007-01-01
MatLab® has often been considered an excellent environment for fast algorithm development but is generally perceived as slow and hence not fit for routine medical image processing, where large data sets are now available, e.g., high-resolution CT image sets with typically hundreds of 512x512 slices. Yet, with proper programming practices – vectorization, pre-allocation and specialization – applications in MatLab® can run as fast as in C. In this article, this point is illustrated with fast implementations of bilinear interpolation, watershed segmentation and volume rendering. PMID:21614269
Defence of Foreign Language Teaching in Secondary Schools
ERIC Educational Resources Information Center
Van Passel, F. J. A.
1974-01-01
Shows the necessity of foreign language education for cognitive and attitudinal purposes as well as for utilitarian reasons. Foreign language learning/teaching can be of great educational value when it follows the thread of the logical and psychological steps in the creative/discovery procedure. A learning algorithm is mapped on page 61. See FL…
ZettaBricks: A Language Compiler and Runtime System for Anyscale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amarasinghe, Saman
This grant supported the ZettaBricks and OpenTuner projects. ZettaBricks is a new implicitly parallel language and compiler where defining multiple implementations of multiple algorithms to solve a problem is the natural way of programming. ZettaBricks makes algorithmic choice a first class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The ZettaBricks compiler autotunes programs by making both fine-grained as well as algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, ZettaBricks introduces novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the ZettaBricks compiler is able to tune a program in such a way that delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility of utilizing different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice. OpenTuner is a generalization of the experience gained in building an autotuner for ZettaBricks. OpenTuner is a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully-customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy to use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously; techniques that perform well will dynamically be allocated a larger proportion of tests.
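As a conceptual illustration only (this is not OpenTuner's actual API), the core idea of searching over algorithmic choices and parameters while keeping the best measured configuration can be sketched as a toy random-search autotuner:

# Toy autotuner: random search over an algorithmic choice plus a numeric
# parameter, keeping the fastest configuration. OpenTuner adds ensembles of
# smarter search techniques, result databases, etc.; none of that is shown here.
import random
import time

def run_config(algorithm: str, block_size: int) -> float:
    """Stand-in for compiling/running the program with one configuration;
    returns wall-clock time. Replace with a real measurement in practice."""
    start = time.perf_counter()
    sum(i * i for i in range(10_000 * (1 if algorithm == "direct" else 2)))
    return time.perf_counter() - start + block_size * 1e-6

best = None
for _ in range(50):
    cfg = (random.choice(["direct", "iterative"]), random.choice([16, 32, 64, 128]))
    cost = run_config(*cfg)
    if best is None or cost < best[1]:
        best = (cfg, cost)
print("best configuration:", best)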
A "Hands on" Strategy for Teaching Genetic Algorithms to Undergraduates
ERIC Educational Resources Information Center
Venables, Anne; Tan, Grace
2007-01-01
Genetic algorithms (GAs) are a problem solving strategy that uses stochastic search. Since their introduction (Holland, 1975), GAs have proven to be particularly useful for solving problems that are "intractable" using classical methods. The language of genetic algorithms (GAs) is heavily laced with biological metaphors from evolutionary…
The Porter Stemming Algorithm: Then and Now
ERIC Educational Resources Information Center
Willett, Peter
2006-01-01
Purpose: In 1980, Porter presented a simple algorithm for stemming English language words. This paper summarises the main features of the algorithm, and highlights its role not just in modern information retrieval research, but also in a range of related subject domains. Design/methodology/approach: Review of literature and research involving use…
Automated Speech Rate Measurement in Dysarthria
ERIC Educational Resources Information Center
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-01-01
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Z.; Pike, R.W.; Hertwig, T.A.
An effective approach for source reduction in chemical plants has been demonstrated using on-line optimization with flowsheeting (ASPEN PLUS) for process optimization and parameter estimation and the Tjao-Biegler algorithm implemented in a mathematical programming language (GAMS/MINOS) for data reconciliation and gross error detection. Results for a Monsanto sulfuric acid plant with a Bailey distributed control system showed a 25% reduction in the sulfur dioxide emissions and a 17% improvement in the profit over the current operating conditions. Details of the methods used are described.
Syntactic structures in languages and biology.
Horn, David
2008-08-01
Both natural languages and cell biology make use of one-dimensional encryption. Their investigation calls for syntactic deciphering of the text and semantic understanding of the resulting structures. Here we discuss recently published algorithms that allow for such searches: automatic distillation of structure (ADIOS) that is successful in discovering syntactic structures in linguistic texts and its motif extraction (MEX) component that can be used for uncovering motifs in DNA and protein sequences. The underlying principles of these syntactic algorithms and some of their results will be described.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Priimak, Dmitri
2014-12-01
We present a finite difference numerical algorithm for solving two dimensional spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time dependent electric and constant magnetic fields. The algorithm is implemented both in C language targeted to CPU and in CUDA C language targeted to commodity NVidia GPU. We compare performances and merits of one implementation versus another and discuss various software optimisation techniques.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schwartz, J T
1975-06-01
A summary of work during the past several years on SETL, a new programming language drawing its dictions and basic concepts from the mathematical theory of sets, is presented. The work was started with the idea that a programming language modeled after an appropriate version of the formal language of mathematics might allow a programming style with some of the succinctness of mathematics, and that this might ultimately enable one to express and experiment with more complex algorithms than are now within reach. Part I discusses the general approach followed in the work. Part II focuses directly on the details of the SETL language as it is now defined. It describes the facilities of SETL, includes short libraries of miscellaneous and of code optimization algorithms illustrating the use of SETL, and gives a detailed description of the manner in which the set-theoretic primitives provided by SETL are currently implemented. (RWR)
Novel structures for Discrete Hartley Transform based on first-order moments
NASA Astrophysics Data System (ADS)
Xiong, Jun; Zheng, Wenjuan; Wang, Hao; Liu, Jianguo
2018-03-01
Discrete Hartley Transform (DHT) is an important tool in digital signal processing. In the present paper, the DHT is first transformed into a first-order moments-based form, and then a new fast algorithm is proposed to calculate the first-order moments without multiplication. Based on the algorithm theory, the corresponding hardware architecture for the DHT is proposed, which contains only shift operations and additions, with no need for multipliers or large memory. To verify its availability and effectiveness, the proposed design is implemented in a hardware description language and synthesized by Synopsys Design Compiler with a 0.18-μm SMIC library. A series of experiments have proved that the proposed architecture has better performance in terms of the product of hardware consumption and computation time.
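The hardware architecture itself cannot be reproduced here, but the transform it computes can be cross-checked in software; a minimal NumPy sketch uses the identity DHT(x) = Re(FFT(x)) - Im(FFT(x)) for real input as a reference against the direct cas-kernel definition. This is the standard DHT, not the authors' moments-based algorithm.

# Reference Discrete Hartley Transform (not the moments-based design above).
import numpy as np

def dht_direct(x):
    n = len(x)
    k = np.arange(n)
    arg = 2.0 * np.pi * np.outer(k, k) / n
    cas = np.cos(arg) + np.sin(arg)   # the Hartley "cas" kernel
    return cas @ x

def dht_via_fft(x):
    X = np.fft.fft(x)
    return X.real - X.imag            # valid for real-valued input

x = np.random.default_rng(1).normal(size=8)
assert np.allclose(dht_direct(x), dht_via_fft(x))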
NASA Technical Reports Server (NTRS)
Hruska, S. I.; Dalke, A.; Ferguson, J. J.; Lacher, R. C.
1991-01-01
Rule-based expert systems may be structurally and functionally mapped onto a special class of neural networks called expert networks. This mapping lends itself to adaptation of connectionist learning strategies for the expert networks. A parsing algorithm to translate C Language Integrated Production System (CLIPS) rules into a network of interconnected assertion and operation nodes has been developed. The translation of CLIPS rules to an expert network and back again is illustrated. Measures of uncertainty similar to those used in MYCIN-like systems are introduced into the CLIPS system, and techniques for combining and firing nodes in the network based on rule-firing with these certainty factors in the expert system are presented. Several learning algorithms are under study which automate the process of attaching certainty factors to rules.
Accelerating image recognition on mobile devices using GPGPU
NASA Astrophysics Data System (ADS)
Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku
2011-01-01
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. Graphics Processing Units are very well suited for parallel processing, and the addition of programmable stages and high precision arithmetic provides opportunities to implement complete, energy-efficient algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on Local Binary Pattern (LBP) features and makes use of the GPU in the pre-processing and feature extraction phase. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
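For orientation, the Local Binary Pattern feature named above can be written in a few lines of NumPy; this is the basic 8-neighbour LBP on the CPU, purely illustrative and unrelated to the OpenGL ES 2.0 shader implementation described in the paper.

# Basic 8-neighbour Local Binary Pattern (illustrative CPU version only).
import numpy as np

def lbp8(image: np.ndarray) -> np.ndarray:
    """Each interior pixel gets an 8-bit code: one bit per neighbour that is
    >= the centre pixel, ordered clockwise from the top-left neighbour."""
    c = image[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = image[1 + dy:image.shape[0] - 1 + dy,
                          1 + dx:image.shape[1] - 1 + dx]
        code += (neighbour >= c).astype(int) << bit
    return code

img = np.random.default_rng(2).integers(0, 256, size=(6, 6), dtype=np.uint8)
print(lbp8(img))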
RACER: Effective Race Detection Using AspectJ
NASA Technical Reports Server (NTRS)
Bodden, Eric; Havelund, Klaus
2008-01-01
Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers in detecting concurrent programming errors such as deadlocks, livelocks, starvation and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock() and maybeShared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared amongst multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we present a new algorithm which we call RACER, an adaptation of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the AspectBench Compiler, implemented the RACER algorithm using this language extension and then applied the algorithm to the NASA K9 Rover Executive. Our experiments proved our implementation very effective. In the Rover Executive, RACER finds 70 data races. Only one of these races was previously known. We further applied the algorithm to two other multi-threaded programs written by computer science researchers, in which we found races as well.
A Statistical-Physics Approach to Language Acquisition and Language Change
NASA Astrophysics Data System (ADS)
Cassandro, Marzio; Collet, Pierre; Galves, Antonio; Galves, Charlotte
1999-02-01
The aim of this paper is to explain why Statistical Physics can help understanding two related linguistic questions. The first question is how to model first language acquisition by a child. The second question is how language change proceeds in time. Our approach is based on a Gibbsian model for the interface between syntax and prosody. We also present a simulated annealing model of language acquisition, which extends the Triggering Learning Algorithm recently introduced in the linguistic literature.
Automating software design system DESTA
NASA Technical Reports Server (NTRS)
Lovitsky, Vladimir A.; Pearce, Patricia D.
1992-01-01
'DESTA' is the acronym for the Dialogue Evolutionary Synthesizer of Turnkey Algorithms by means of a natural language (Russian or English) functional specification of algorithms or software being developed. DESTA represents the computer-aided and/or automatic artificial intelligence 'forgiving' system which provides users with software tools support for algorithm and/or structured program development. The DESTA system is intended to provide support for the higher levels and earlier stages of engineering design of software in contrast to conventional Computer Aided Design (CAD) systems which provide low level tools for use at a stage when the major planning and structuring decisions have already been taken. DESTA is a knowledge-intensive system. The main features of the knowledge are procedures, functions, modules, operating system commands, batch files, their natural language specifications, and their interlinks. The specific domain for the DESTA system is a high level programming language like Turbo Pascal 6.0. The DESTA system is operational and runs on an IBM PC computer.
Zhang, Xiaoyang; Xue, Lei; Zhang, Zhi; Zhang, Yiwen
2016-01-01
Children's health problems have long attracted the attention of parents and of society as a whole, and child language development is an active research topic. Studies have found that appropriate early intervention by guardians can effectively promote children's language and cognitive development, and that this development can be analyzed quantitatively. Intervention supported by artificial intelligence technology has a clear effect on autism spectrum disorders in children. This paper presents a speech signal analysis system for children, with preprocessing of the speaker's speech signal, subsequent calculation of the word counts in the speech of guardians and children, and other characteristic parameters or indicators (e.g., the number of recognizable syllables and the continuity of the language). With these quantitative analysis tools and parameters, we can evaluate and analyze the quality of children's language and cognitive ability objectively and quantitatively, providing a basis for decision-making criteria for parents. Thereby, they can adopt appropriate measures for children to promote the development of children's language and cognitive status. In this paper, based on existing studies of children's language development, we put forward several automatically measurable indicators that influence the formation of children's language. The experimental results show that, after preprocessing (including signal enhancement and speech activity detection), both the divergence algorithm results and the subsequent word counts are quite satisfactory compared with the actual situation.
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units
NASA Astrophysics Data System (ADS)
Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon
2010-10-01
Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation, which consists of tree walks and evaluation of interaction lists, is carried out on the GPU. In this way, a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s are achieved. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for the CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor of 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
Cellular automata-based modelling and simulation of biofilm structure on multi-core computers.
Skoneczny, Szymon
2015-01-01
The article presents a mathematical model of biofilm growth for aerobic biodegradation of a toxic carbonaceous substrate. Modelling of biofilm growth has fundamental significance in numerous processes of biotechnology and in the mathematical modelling of bioreactors. The process following double-substrate kinetics with substrate inhibition proceeding in a biofilm has not been modelled so far by means of cellular automata. Each process in the proposed model, i.e. diffusion of substrates, uptake of substrates, growth and decay of microorganisms and biofilm detachment, is simulated in a discrete manner. It was shown that for a flat biofilm of constant thickness, the results of the presented model agree with those of a continuous model. The primary outcome of the study was to propose a mathematical model of biofilm growth; however, a considerable amount of focus was also placed on the development of efficient algorithms for its solution. Two parallel algorithms were created, differing in the way computations are distributed. Computer programs were created using the OpenMP Application Programming Interface for the C++ programming language. Simulations of biofilm growth were performed on three high-performance computers. Speed-up coefficients of the computer programs were compared. Both algorithms enabled a significant reduction of computation time. This is important, inter alia, in the modelling and simulation of bioreactor dynamics.
Computer vision camera with embedded FPGA processing
NASA Astrophysics Data System (ADS)
Lecerf, Antoine; Ouellet, Denis; Arias-Estrada, Miguel
2000-03-01
Traditional computer vision is based on a camera-computer system in which the image understanding algorithms are embedded in the computer. To circumvent the computational load of vision algorithms, low-level processing and imaging hardware can be integrated in a single compact module where a dedicated architecture is implemented. This paper presents a Computer Vision Camera based on an open architecture implemented in an FPGA. The system is targeted to real-time computer vision tasks where low level processing and feature extraction tasks can be implemented in the FPGA device. The camera integrates a CMOS image sensor, an FPGA device, two memory banks, and an embedded PC for communication and control tasks. The FPGA device is a medium size one equivalent to 25,000 logic gates. The device is connected to two high speed memory banks, an IS interface, and an imager interface. The camera can be accessed for architecture programming, data transfer, and control through an Ethernet link from a remote computer. A hardware architecture can be defined in a Hardware Description Language (like VHDL), simulated and synthesized into digital structures that can be programmed into the FPGA and tested on the camera. The architecture of a classical multi-scale edge detection algorithm based on a Laplacian of Gaussian convolution has been developed to show the capabilities of the system.
Generation of Referring Expressions: Assessing the Incremental Algorithm
ERIC Educational Resources Information Center
van Deemter, Kees; Gatt, Albert; van der Sluis, Ielka; Power, Richard
2012-01-01
A substantial amount of recent work in natural language generation has focused on the generation of "one-shot" referring expressions whose only aim is to identify a target referent. Dale and Reiter's Incremental Algorithm (IA) is often thought to be the best algorithm for maximizing the similarity to referring expressions produced by people. We…
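For readers outside natural language generation, the Incremental Algorithm being assessed is easy to state; the following is a bare-bones Python sketch of Dale and Reiter's procedure (fixed attribute preference order, no basic-level or head-noun refinements), with an invented toy domain.

# Bare-bones sketch of Dale & Reiter's Incremental Algorithm.
# Attributes are tried in a fixed preference order; a value is kept only if it
# rules out at least one remaining distractor; stop when none are left.
def incremental_algorithm(target, distractors, preference_order):
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target[attr]
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
        if not remaining:
            break
    return description, remaining  # non-empty remaining => no distinguishing description

target = {"type": "chair", "colour": "red", "size": "small"}
distractors = [{"type": "chair", "colour": "blue", "size": "small"},
               {"type": "table", "colour": "red", "size": "large"}]
print(incremental_algorithm(target, distractors, ["type", "colour", "size"]))
# -> ({'type': 'chair', 'colour': 'red'}, [])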
NASA Astrophysics Data System (ADS)
Laban, Shaban; El-Desouky, Aly
2013-04-01
The monitoring of real-time systems is a challenging and complicated process. There is therefore a continuous need to improve the monitoring process through the use of new intelligent techniques and algorithms for detecting exceptions and anomalous behaviours and for generating the necessary alerts during the workflow monitoring of such systems. Interval-based or period-based theorems have been discussed, analysed, and used by many researchers in Artificial Intelligence (AI), philosophy, and linguistics. As explained by Allen, there are 13 relations between any two intervals. There have also been many studies of interval-based temporal reasoning and logics over the past decades. Interval-based theorems can be used for monitoring real-time interval-based data processing. However, increasing the number of processed intervals makes the implementation of such theorems a complex and time-consuming process, as the number of relationships between the intervals grows exponentially. To overcome this problem, this paper presents a Rule-based Interval State Machine Algorithm (RISMA) for processing, monitoring, and analysing the behaviour of interval-based data received from real-time sensors. The proposed intelligent algorithm uses the Interval State Machine (ISM) approach to model any number of interval-based data into well-defined states, as well as to infer over them. An interval-based state transition model and methodology are presented to identify the relationships between the different states of the proposed algorithm. By using such a model, the unlimited number of relationships between such large numbers of intervals can be reduced to only 18 direct relationships using the proposed well-defined states. For testing the proposed algorithm, the necessary inference rules and code have been designed and applied to the continuous data received in near real time from the stations of the International Monitoring System (IMS) by the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). The CLIPS expert system shell has been used as the main rule engine for implementing the algorithm rules. The Python programming language and the module "PyCLIPS" are used for building the necessary code for the algorithm implementation. More than 1.7 million intervals constituting the Concise List of Frames (CLF) from 20 different seismic stations have been used for evaluating the proposed algorithm and for evaluating station behaviour and performance. The initial results showed that the proposed algorithm can help in better understanding the operation and performance of those stations. Different important information, such as alerts and some station performance parameters, can be derived from the proposed algorithm. For IMS interval-based data, at any period of time it is possible to analyse station behaviour, determine missing data, generate necessary alerts, and measure some station performance attributes. The details of the proposed algorithm, its methodology, implementation, experimental results, advantages, and limitations are presented. Finally, future directions and recommendations are discussed.
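To make the interval machinery concrete, a small helper that classifies the Allen relation between two closed intervals is sketched below; this is generic interval logic, not the RISMA state model or its CLIPS rules.

# Classify the Allen relation between two intervals (s1, e1) and (s2, e2).
# Generic helper for illustration; RISMA's 18 well-defined states are not
# reproduced here.
def allen_relation(s1, e1, s2, e2):
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if (s1, e1) == (s2, e2):
        return "equal"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    return "overlaps" if s1 < s2 else "overlapped-by"

print(allen_relation(0, 5, 5, 9))  # meets
print(allen_relation(2, 4, 1, 8))  # during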
Simple, Scalable, Script-based, Science Processor for Measurements - Data Mining Edition (S4PM-DME)
NASA Astrophysics Data System (ADS)
Pham, L. B.; Eng, E. K.; Lynnes, C. S.; Berrick, S. W.; Vollmer, B. E.
2005-12-01
The S4PM-DME is the Goddard Earth Sciences Distributed Active Archive Center's (GES DAAC) web-based data mining environment. The S4PM-DME replaces the Near-line Archive Data Mining (NADM) system with a better web environment and a richer set of production rules. S4PM-DME enables registered users to submit and execute custom data mining algorithms. The S4PM-DME system uses the GES DAAC-developed Simple Scalable Script-based Science Processor for Measurements (S4PM) to automate tasks and perform the actual data processing. A web interface allows the user to access the S4PM-DME system. The user first develops personalized data mining algorithms on his/her home platform and then uploads them to the S4PM-DME system. Algorithms in the C and FORTRAN languages are currently supported. The user-developed algorithm is automatically audited for any potential security problems before it is installed within the S4PM-DME system and made available to the user. Once the algorithm has been installed, the user can promote the algorithm to the "operational" environment. From here the user can search and order the data available in the GES DAAC archive for his/her science algorithm. The user can also set up a processing subscription. The subscription will automatically process new data as it becomes available in the GES DAAC archive. The generated mined data products are then made available for FTP pickup. The benefits of using S4PM-DME are 1) to decrease the downloading time it typically takes a user to transfer GES DAAC data to his/her system, thus off-loading heavy network traffic, 2) to free up the load on the user's system, and 3) to utilize the rich and abundant ocean and atmosphere data from the MODIS and AIRS instruments available from the GES DAAC.
THREAD: A programming environment for interactive planning-level robotics applications
NASA Technical Reports Server (NTRS)
Beahan, John J., Jr.
1989-01-01
The THREAD programming language, which was developed to meet the needs of researchers developing robotics applications that perform such tasks as grasp and trajectory design, sensor data analysis, and interfacing with external subsystems in order to perform servo-level control of manipulators and real-time sensing, is discussed. The philosophy behind THREAD, the issues which entered into its design, and the features of the language are discussed from the viewpoint of researchers who want to develop algorithms in a simulation environment, and from that of those who want to implement physical robotics systems. The detailed functions of the many complex robotics algorithms and tools which are part of the language are not explained, but an overall impression of their capability is given.
Brodic, Darko; Milivojevic, Dragan R.; Milivojevic, Zoran N.
2011-01-01
The paper introduces a testing framework for the evaluation and validation of text line segmentation algorithms. Text line segmentation represents the key action for correct optical character recognition. Many of the tests for the evaluation of text line segmentation algorithms deal with text databases as reference templates. Because of the mismatch, a reliable testing framework is required. Hence, a new approach to a comprehensive experimental framework for the evaluation of text line segmentation algorithms is proposed. It consists of synthetic multi-line text samples as well as real handwritten text. Although the tests are mutually independent, the results are cross-linked. The proposed method can be used for different types of scripts and languages. Furthermore, two different procedures for the evaluation of algorithm efficiency, based on the obtained error type classification, are proposed. The first is based on the segmentation line error description, while the second one incorporates well-known signal detection theory. Each of them has different capabilities and conveniences, but they can be used as supplements to make the evaluation process efficient. Overall, the proposed procedure based on the segmentation line error description has some advantages, characterized by five measures that describe the measurement procedures. PMID:22164106
NASA Astrophysics Data System (ADS)
Min, Min; Wu, Chunqiang; Li, Chuan; Liu, Hui; Xu, Na; Wu, Xiao; Chen, Lin; Wang, Fu; Sun, Fenglin; Qin, Danyu; Wang, Xi; Li, Bo; Zheng, Zhaojun; Cao, Guangzhen; Dong, Lixin
2017-08-01
Fengyun-4A (FY-4A), the first of the Chinese next-generation geostationary meteorological satellites, launched in 2016, offers several advances over the FY-2: more spectral bands, faster imaging, and infrared hyperspectral measurements. To support the major objective of developing the prototypes of the FY-4 science algorithms, two science product algorithm testbeds, for imagers and sounders, have been developed by the scientists in the FY-4 Algorithm Working Group (AWG). Both testbeds, written in the FORTRAN and C programming languages for Linux or UNIX systems, have been tested successfully using Intel/g compilers. Some important FY-4 science products, including cloud mask, cloud properties, and temperature profiles, have been retrieved successfully using a proxy imager, the Himawari-8/Advanced Himawari Imager (AHI), and sounder data obtained from the Atmospheric InfraRed Sounder, thus demonstrating their robustness. In addition, in early 2016 the FY-4 AWG developed, based on the imager testbed, a near real-time processing system for Himawari-8/AHI data for use by Chinese weather forecasters. Consequently, robust and flexible science product algorithm testbeds have provided essential and productive tools for popularizing FY-4 data and developing substantial improvements in FY-4 products.
Computerized scoring algorithms for the Autobiographical Memory Test.
Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip
2018-02-01
Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive response format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performance in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
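The abstract does not disclose the exact features or model settings; a generic scikit-learn pipeline of the same flavour (TF-IDF features, linear support vector machine, decision values as graded specificity scores) would look roughly like this, with made-up training examples and no claim to match the authors' models.

# Generic sketch of an SVM-based AMT scorer (not the authors' trained models).
# decision_function values play the role of the graded "decision values"
# described above; the training data here is invented and far too small.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

memories = ["the day I graduated last June in Wellington",
            "every summer we went to the beach",
            "I am usually happy with my friends"]
labels = [1, 0, 0]  # 1 = specific memory, 0 = nonspecific

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(memories, labels)
print(model.decision_function(["the morning my sister phoned me about the exam"]))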
Chaotic Traversal (CHAT): Very Large Graphs Traversal Using Chaotic Dynamics
NASA Astrophysics Data System (ADS)
Changaival, Boonyarit; Rosalie, Martin; Danoy, Grégoire; Lavangnananda, Kittichai; Bouvry, Pascal
2017-12-01
Graph traversal algorithms find applications in various fields such as routing problems, natural language processing or even database querying. The exploration can be considered a first stepping stone towards knowledge extraction from the graph, which is now a popular topic. Classical solutions such as Breadth First Search (BFS) and Depth First Search (DFS) require huge amounts of memory for exploring very large graphs. In this research, we present a novel memoryless graph traversal algorithm, Chaotic Traversal (CHAT), which integrates chaotic dynamics to traverse large unknown graphs via the Lozi map and the Rössler system. To compare the effects of various dynamics on our algorithm, we present an original way to explore a parameter space using a bifurcation diagram with respect to the topological structure of attractors. The resulting algorithm is efficient and undemanding of resources, and is therefore very suitable for partial traversal of very large and/or unknown environment graphs. CHAT's performance using the Lozi map is shown to be superior to that of the commonly known random walk, in terms of the number of nodes visited (coverage percentage) and computation time, where the environment is unknown and memory usage is restricted.
SIGNUM: A Matlab, TIN-based landscape evolution model
NASA Astrophysics Data System (ADS)
Refice, A.; Giachetta, E.; Capolongo, D.
2012-08-01
Several numerical landscape evolution models (LEMs) have been developed to date, and many are available as open source codes. Most are written in efficient programming languages such as Fortran or C, but often require additional coding effort to plug into more user-friendly data analysis and/or visualization tools to ease interpretation and scientific insight. In this paper, we present an effort to port a common core of accepted physical principles governing landscape evolution directly into a high-level language and data analysis environment such as Matlab. SIGNUM (an acronym for Simple Integrated Geomorphological Numerical Model) is an independent and self-contained Matlab, TIN-based landscape evolution model, built to simulate topography development at various space and time scales. SIGNUM is presently capable of simulating hillslope processes such as linear and nonlinear diffusion, fluvial incision into bedrock, spatially varying surface uplift (which can be used to simulate changes in base level, thrusting and faulting), as well as effects of climate changes. Although based on accepted and well-known processes and algorithms in its present version, it is built with a modular structure, which allows the simulated physical processes to be easily modified and upgraded to suit virtually any user's needs. The code is conceived as an open-source project, and is thus an ideal tool for both research and didactic purposes, thanks to the high-level nature of the Matlab environment and its popularity among the scientific community. In this paper the simulation code is presented together with some simple examples of surface evolution, and guidelines for the development of new modules and algorithms are proposed.
NASA Astrophysics Data System (ADS)
Rahman, Nurul Hidayah Ab; Abdullah, Nurul Azma; Hamid, Isredza Rahmi A.; Wen, Chuah Chai; Jelani, Mohamad Shafiqur Rahman Mohd
2017-10-01
Closed-Circuit TV (CCTV) systems are one of the technologies in the surveillance field that address the problem of detection and monitoring by providing extra features such as email alerts or motion detection. However, detection and alerting of the admin in a CCTV system may be complicated by the difficulty of integrating the main program with an external Application Programming Interface (API). In this study, a pixel processing algorithm is applied because of its efficiency, and SMS alerting is added as an alternative solution for users who opt out of the email alert system or have no Internet connection. A CCTV system with SMS alert (CMDSA) was developed using an evolutionary prototyping methodology. The system interface was implemented using Microsoft Visual Studio, while the backend components, namely the database and the application code, were implemented on an SQLite database and in the C# programming language, respectively. The main modules of CMDSA are motion detection, capturing and saving video, image processing and Short Message Service (SMS) alert functions. The system is thereby able to reduce processing time, making the detection process faster, to reduce the space and memory used to run the program, and to alert the system admin instantly.
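The abstract does not detail the pixel processing algorithm; a common minimal form is frame differencing, sketched here in NumPy as an assumption about how such motion detection typically works (the actual CMDSA algorithm and its SMS/email dispatch are not reproduced).

# Hedged sketch: frame-differencing motion detection of the kind a CCTV
# pixel-processing stage might use. Thresholds and frames are invented.
import numpy as np

def motion_detected(prev_gray, curr_gray, pixel_thresh=25, area_thresh=0.01):
    """Flag motion when enough pixels change by more than pixel_thresh."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    changed_fraction = (diff > pixel_thresh).mean()
    return changed_fraction > area_thresh

rng = np.random.default_rng(3)
frame1 = rng.integers(0, 256, size=(240, 320), dtype=np.uint8)
frame2 = frame1.copy()
frame2[100:140, 150:200] = 255  # simulate a bright object entering the scene
print(motion_detected(frame1, frame2))  # True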
Knowledge-Driven Event Extraction in Russian: Corpus-Based Linguistic Resources
Solovyev, Valery; Ivanov, Vladimir
2016-01-01
Automatic event extraction from text is an important step in knowledge acquisition and knowledge base population. Manual work in the development of an extraction system is indispensable, either in corpus annotation or in the creation of vocabularies and patterns for a knowledge-based system. Recent works have focused on the adaptation of existing systems (for extraction from English texts) to new domains. Event extraction in other languages has scarcely been studied, due to the lack of resources and algorithms necessary for natural language processing. In this paper we define a set of linguistic resources that are necessary in the development of a knowledge-based event extraction system for Russian: a vocabulary of subordination models, a vocabulary of event triggers, and a vocabulary of Frame Elements that are basic building blocks for semantic patterns. We propose a set of methods for the creation of such vocabularies in Russian and other languages using the Google Books NGram Corpus. The methods are evaluated in the development of an event extraction system for Russian. PMID:26955386
Development of a prototype multi-processing interactive software invocation system
NASA Technical Reports Server (NTRS)
Berman, W. J.
1983-01-01
The Interactive Software Invocation System (NASA-ISIS) was first ported to the M68000 microcomputer and then rewritten in the programming language Path Pascal. Path Pascal is a significantly enhanced derivative of Pascal, allowing concurrent algorithms to be expressed using the simple and elegant concept of Path Expressions. The primary result of this contract was to verify the viability of Path Pascal as a systems development language. The NASA-ISIS implementation using Path Pascal is a prototype of a large, interactive system in Path Pascal. As such, it is an excellent demonstration of the feasibility of using Path Pascal to write even more extensive systems. It is hoped that future efforts will build upon this research and, ultimately, that a full Path Pascal/ISIS Operating System (PPIOS) might be developed.
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
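Parasail's vectorized kernels are not reproduced here; for reference, the scalar Smith-Waterman recurrence that those kernels accelerate can be written directly. This sketch uses a linear gap penalty for brevity, whereas parasail implements affine gaps and SIMD striped/scan layouts.

# Scalar Smith-Waterman local alignment score (reference recurrence only;
# parasail's SIMD kernels and affine-gap scoring are not shown).
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # match/mismatch
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("GATTACA", "GCATGCU"))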
Automatic Debugging Support for UML Designs
NASA Technical Reports Server (NTRS)
Schumann, Johann; Swanson, Keith (Technical Monitor)
2001-01-01
Design of large software systems requires rigorous application of software engineering methods covering all phases of the software process. Debugging during the early design phases is extremely important, because late bug-fixes are expensive. In this paper, we describe an approach which facilitates debugging of UML requirements and designs. The Unified Modeling Language (UML) is a set of notations for the object-oriented design of a software system. We have developed an algorithm which translates requirement specifications in the form of annotated sequence diagrams into structured statecharts. This algorithm detects conflicts between sequence diagrams and inconsistencies in the domain knowledge. After synthesizing statecharts from sequence diagrams, these statecharts usually are subject to manual modification and refinement. By using the "backward" direction of our synthesis algorithm, we are able to map modifications made to the statechart back into the requirements (sequence diagrams) and check for conflicts there. Fed back to the user, conflicts detected by our algorithm are the basis for deductive debugging of requirements and the domain theory in very early development stages. Our approach allows us to generate explanations of why there is a conflict and which parts of the specifications are affected.
Basic test framework for the evaluation of text line segmentation and text parameter extraction.
Brodić, Darko; Milivojević, Dragan R; Milivojević, Zoran
2010-01-01
Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in the measurement and evaluation of text segmentation algorithm quality, a basic set of measurement methods is required. Currently, there is no commonly accepted one, and algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate, and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross-linked. In the end, its suitability for different types of letters and languages as well as its adaptability are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
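To make the evaluation target concrete, the sketch below shows a common projection-profile baseline for text line segmentation; it is an illustrative algorithm the proposed framework could score, not the authors' method, and it assumes a binarized page image with text pixels set to 1.

```python
import numpy as np

def segment_lines(binary_image, min_pixels=1):
    """Return (top, bottom) row ranges of text lines via a horizontal projection profile."""
    profile = binary_image.sum(axis=1)          # ink pixels per row
    lines, in_line, start = [], False, 0
    for row, count in enumerate(profile):
        if count >= min_pixels and not in_line:
            in_line, start = True, row          # a text line begins
        elif count < min_pixels and in_line:
            in_line = False
            lines.append((start, row))          # a text line ends
    if in_line:
        lines.append((start, len(profile)))
    return lines

page = np.zeros((10, 20), dtype=int)
page[2:4, :] = 1                                # two synthetic text lines
page[6:8, :] = 1
print(segment_lines(page))                      # [(2, 4), (6, 8)]
```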
Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo
2014-01-01
Online knowledge resources such as Medline can address most clinicians’ patient care information needs. Yet, significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related. Comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results Precision and recall for identifying comparative studies were both 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point-of-care decision-making. PMID:23920677
An early illness recognition framework using a temporal Smith Waterman algorithm and NLP.
Hajihashemi, Zahra; Popescu, Mihail
2013-01-01
In this paper we propose a framework for detecting health patterns based on non-wearable sensor sequence similarity and natural language processing (NLP). In TigerPlace, an aging-in-place facility in Columbia, MO, we deployed 47 sensor networks together with a nursing electronic health record (EHR) system to provide early illness recognition. The proposed framework utilizes sensor sequence similarity and NLP on EHR nursing comments to automatically notify the physician when health problems are detected. The reported methodology is inspired by genomic sequence annotation using similarity algorithms such as Smith-Waterman (SW). Similarly, for each sensor sequence, we associate health concepts extracted from the nursing notes using MetaMap, an NLP tool provided by the Unified Medical Language System (UMLS). Since sensor sequences, unlike genomic ones, have an associated time dimension, we propose a temporal variant of SW (TSW) to account for time. The main challenges presented by our framework are finding the most suitable time sequence similarity and the aggregation of the retrieved UMLS concepts. On a pilot dataset from three TigerPlace residents, with a total of 1685 sensor days and 626 nursing records, we obtained an average precision of 0.64 and a recall of 0.37.
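As background for the temporal variant, the following is a minimal sketch of the classic Smith-Waterman recurrence on symbol sequences; the time-aware scoring of the TSW algorithm is not reproduced here, and the scoring parameters and toy inputs are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Classic local alignment score between two symbol sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Toy "sensor event" strings standing in for the paper's sensor sequences
print(smith_waterman("bath-kitchen-bed", "bath-bed-kitchen"))
```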
Hollis, Geoff
2018-04-01
Best-worst scaling is a judgment format in which participants are presented with a set of items and have to choose the superior and inferior items in the set. Best-worst scaling generates a large quantity of information per judgment because each judgment allows for inferences about the rank value of all unjudged items. This property of best-worst scaling makes it a promising judgment format for research in psychology and natural language processing concerned with estimating the semantic properties of tens of thousands of words. A variety of different scoring algorithms have been devised in the previous literature on best-worst scaling. However, due to problems of computational efficiency, these scoring algorithms cannot be applied efficiently to cases in which thousands of items need to be scored. New algorithms are presented here for converting responses from best-worst scaling into item scores for thousands of items (many-item scoring problems). These scoring algorithms are validated through simulation and empirical experiments, and considerations related to noise, the underlying distribution of true values, and trial design are identified that can affect the relative quality of the derived item scores. The newly introduced scoring algorithms consistently outperformed scoring algorithms used in the previous literature on scoring many-item best-worst data.
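For orientation, the snippet below sketches the simplest best-worst scoring rule (best count minus worst count, normalized by appearances); the paper's new many-item algorithms go beyond this baseline, and the example trials are invented.

```python
from collections import defaultdict

def value_scores(trials):
    """trials: iterable of (items_shown, best_item, worst_item) tuples."""
    best, worst, seen = defaultdict(int), defaultdict(int), defaultdict(int)
    for items, b, w in trials:
        for item in items:
            seen[item] += 1
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

trials = [(("calm", "happy", "angry", "sad"), "happy", "sad"),
          (("calm", "happy", "angry", "bored"), "happy", "angry")]
print(value_scores(trials))   # higher score = judged "best" more often
```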
Zhu, Vivienne J; Walker, Tina D; Warren, Robert W; Jenny, Peggy B; Meystre, Stephane; Lenert, Leslie A
2017-01-01
Quality reporting that relies on coded administrative data alone may not completely and accurately depict providers’ performance. To assess this concern with a test case, we developed and evaluated a natural language processing (NLP) approach to identify falls risk screenings documented in clinical notes of patients without coded falls risk screening data. Extracting information from 1,558 clinical notes (mainly progress notes) from 144 eligible patients, we generated a lexicon of 38 keywords relevant to falls risk screening, 26 terms for pre-negation, and 35 terms for post-negation. The NLP algorithm identified 62 (out of the 144) patients whose falls risk screening was documented only in clinical notes and not coded. Manual review confirmed 59 patients as true positives and 77 patients as true negatives. Our NLP approach scored 0.92 for precision, 0.95 for recall, and 0.93 for F-measure. These results support the concept of utilizing NLP to enhance healthcare quality reporting. PMID:29854264
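A hedged sketch of the lexicon-plus-negation idea described above is shown below; the keyword and negation terms are invented stand-ins, not the study's 38-keyword, 26-term, and 35-term lists.

```python
import re

KEYWORDS = ["fall risk", "morse fall scale", "fall precautions"]   # invented examples
PRE_NEGATION = ["denies", "no history of", "not assessed for"]
POST_NEGATION = ["not performed", "deferred"]

def mentions_falls_screening(note, window=5):
    text = note.lower()
    for kw in KEYWORDS:
        for match in re.finditer(re.escape(kw), text):
            before = " ".join(text[:match.start()].split()[-window:])
            after = " ".join(text[match.end():].split()[:window])
            negated = (any(neg in before for neg in PRE_NEGATION) or
                       any(neg in after for neg in POST_NEGATION))
            if not negated:
                return True                    # positive, non-negated mention found
    return False

print(mentions_falls_screening("Morse Fall Scale completed; score 35."))  # True
print(mentions_falls_screening("Patient denies fall risk assessment."))   # False
```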
Exploiting current-generation graphics hardware for synthetic-scene generation
NASA Astrophysics Data System (ADS)
Tanner, Michael A.; Keen, Wayne A.
2010-04-01
Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores on current NVIDIA GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. To take advantage of this potential requires algorithm implementation that is structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) will be presented. Included in this paper will be various language tradeoffs between conventional shader programming, Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), including performance trades and possible pathways for future tool development.
Goss, Foster R.; Plasek, Joseph M.; Lau, Jason J.; Seger, Diane L.; Chang, Frank Y.; Zhou, Li
2014-01-01
Emergency department (ED) visits due to allergic reactions are common. Allergy information is often recorded in free-text provider notes; however, this domain has not yet been widely studied by the natural language processing (NLP) community. We developed an allergy module built on the MTERMS NLP system to identify and encode food, drug, and environmental allergies and allergic reactions. The module included updates to our lexicon using standard terminologies, and novel disambiguation algorithms. We developed an annotation schema and annotated 400 ED notes that served as a gold standard for comparison to MTERMS output. MTERMS achieved an F-measure of 87.6% for the detection of allergen names and no known allergies, 90% for identifying true reactions in each allergy statement where true allergens were also identified, and 69% for linking reactions to their allergen. These preliminary results demonstrate the feasibility of using NLP to extract and encode allergy information from clinical notes. PMID:25954363
Erraguntla, Madhav; Zapletal, Josef; Lawley, Mark
2017-12-01
The impact of infectious disease on human populations is a function of many factors including environmental conditions, vector dynamics, transmission mechanics, social and cultural behaviors, and public policy. A comprehensive framework for disease management must fully connect the complete disease lifecycle, including emergence from reservoir populations, zoonotic vector transmission, and impact on human societies. The Framework for Infectious Disease Analysis is a software environment and conceptual architecture for data integration, situational awareness, visualization, prediction, and intervention assessment. Framework for Infectious Disease Analysis automatically collects biosurveillance data using natural language processing, integrates structured and unstructured data from multiple sources, applies advanced machine learning, and uses multi-modeling for analyzing disease dynamics and testing interventions in complex, heterogeneous populations. In the illustrative case studies, natural language processing from social media, news feeds, and websites was used for information extraction, biosurveillance, and situation awareness. Classification machine learning algorithms (support vector machines, random forests, and boosting) were used for disease predictions.
Crowley, Rebecca S; Castine, Melissa; Mitchell, Kevin; Chavan, Girish; McSherry, Tara; Feldman, Michael
2010-01-01
The authors report on the development of the Cancer Tissue Information Extraction System (caTIES)--an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.
A Framework for Sentiment Analysis Implementation of Indonesian Language Tweet on Twitter
NASA Astrophysics Data System (ADS)
Asniar; Aditya, B. R.
2017-01-01
Sentiment analysis is the process of understanding, extracting, and processing textual data automatically to obtain information. Sentiment analysis can be used to see opinion on an issue and identify a response to something. Millions of digital records are still not being used to provide useful information, especially for government. In government, sentiment analysis can be used to monitor work programs, such as those of the Government of Bandung City, through social media data. The analysis can be used quickly as a tool to see the public response to the work programs, so the next strategic steps can be taken. This paper adopts the Support Vector Machine as a supervised algorithm for sentiment analysis. It presents a framework for sentiment analysis implementation of Indonesian-language tweets on Twitter for the work programs of the Government of Bandung City. The results of this paper can be a reference for decision making in local government.
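A hedged sketch of the SVM step with scikit-learn is given below; the Indonesian example tweets and labels are invented, and the paper's actual preprocessing and training data are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["pelayanan program kota sangat baik",        # positive (invented)
          "program pemerintah mengecewakan sekali",     # negative (invented)
          "terima kasih atas perbaikan jalan",          # positive (invented)
          "macet makin parah tidak ada solusi"]         # negative (invented)
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())   # bag-of-words features + linear SVM
model.fit(tweets, labels)
print(model.predict(["terima kasih program kota baik"]))
```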
ASIC implementation of recursive scaled discrete cosine transform algorithm
NASA Astrophysics Data System (ADS)
On, Bill N.; Narasimhan, Sam; Huang, Victor K.
1994-05-01
A program to implement the Recursive Scaled Discrete Cosine Transform (DCT) algorithm as proposed by H. S. Hou has been undertaken at the Institute of Microelectronics. Implementation of the design was done using a top-down design methodology with VHDL (VHSIC Hardware Description Language) for chip modeling. When the VHDL simulation has been satisfactorily completed, the design is synthesized into gates using a synthesis tool. The architecture of the design consists of two processing units together with a memory module for data storage and transpose. Each processing unit is composed of four pipelined stages which allow the internal clock to run at one-eighth (1/8) the speed of the pixel clock. Each stage operates on eight pixels in parallel. As the data flow through each stage, various adders and multipliers transform them into the desired coefficients. The Scaled IDCT was implemented in a similar fashion with the adders and multipliers rearranged to perform the inverse DCT algorithm. The chip has been verified using Field Programmable Gate Array devices. The design is operational. The combination of fewer required multiplications and a pipelined architecture gives Hou's Recursive Scaled DCT good potential for achieving high performance at low cost in a Very Large Scale Integration implementation.
Object/rule integration in CLIPS. [C Language Integrated Production System
NASA Technical Reports Server (NTRS)
Donnell, Brian L.
1993-01-01
This paper gives a brief overview of the C Language Integrated Production System (CLIPS) with a focus on the object-oriented features. The advantages of an object data representation over the traditional working memory element (WME), i.e., facts, are discussed, and the implementation of the Rete inference algorithm in CLIPS is presented in detail. A few methods for achieving pattern-matching on objects with the current inference engine are given, and finally, the paper examines the modifications necessary to the Rete algorithm to allow direct object pattern-matching.
NASA Astrophysics Data System (ADS)
Fan, Hong; Zhu, Anfeng; Zhang, Weixia
2015-12-01
In order to meet the need for rapid positioning of 12315 complaints, and aiming at the natural language expression of telephone complaints, a semantic retrieval framework is proposed based on natural language parsing and geographical-name ontology reasoning. Within this framework, a search-result ranking and recommendation algorithm is proposed that considers both geographic-name conceptual similarity and spatial geometric relation similarity. The experiments show that this method can assist the operator in quickly finding the location of 12315 complaints, increasing customer satisfaction with the industry and commerce administration.
PDF text classification to leverage information extraction from publication reports.
Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha
2016-06-01
Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with machine learning classifiers, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
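The sieve idea (ordered rules, first match wins, default category last) can be sketched as follows; the individual rules here are invented stand-ins for the paper's sieves.

```python
import re

def classify_pdf_text(snippet):
    sieves = [
        ("METADATA", lambda s: bool(re.search(r"doi:|©|received.*accepted", s, re.I))),
        ("TITLE", lambda s: len(s.split()) < 20 and s.istitle()),
        ("SEMISTRUCTURE", lambda s: s.count("|") > 2 or s.count("\t") > 2),
        ("ABSTRACT", lambda s: s.lower().startswith("abstract")),
    ]
    for label, rule in sieves:          # first sieve that fires assigns the category
        if rule(snippet):
            return label
    return "BODYTEXT"                   # default when no sieve fires

print(classify_pdf_text("Abstract: We conducted a randomized trial of..."))  # ABSTRACT
print(classify_pdf_text("doi:10.1000/xyz123 © 2016 Elsevier"))               # METADATA
```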
Scheduling language and algorithm development study. Volume 1: Study summary and overview
NASA Technical Reports Server (NTRS)
1974-01-01
A high level computer programming language and a program library were developed to be used in writing programs for scheduling complex systems such as the space transportation system. The objectives and requirements of the study are summarized, and unique features of the specified language and program library are described and related to those objectives and requirements.
Gaur, Pallavi; Chaturvedi, Anoop
2017-07-22
The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research, and more particularly regarding exosomal miRNAs, has led much bioinformatics-based research to come into existence. The information on clustering patterns and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered, miRNAs within exosomes. Along with obtaining the clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates on the usefulness of machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and the 'MEME suite'. The machine learning algorithms for the aforementioned objectives were applied successfully. This work elaborated the utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With this information, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes, which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives users more flexibility to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
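One plausible way to cluster short RNA sequences with standard machine learning tools is sketched below (k-mer frequency features plus k-means); the feature choice and the toy sequences are assumptions, not the authors' exact pipeline.

```python
from itertools import product
import numpy as np
from sklearn.cluster import KMeans

def kmer_features(seq, k=3, alphabet="ACGU"):
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[km] for km in kmers]

mirnas = ["UGAGGUAGUAGGUUGUAUAGUU",   # let-7-like (toy)
          "UGAGGUAGUAGGUUGUGUGGUU",
          "UAGCUUAUCAGACUGAUGUUGA",   # miR-21-like (toy)
          "UAGCUUAUCAGACUGAUGUUGC"]
X = np.array([kmer_features(s) for s in mirnas])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # similar sequences are expected to share a cluster label
```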
Speier, William; Fried, Itzhak; Pouratian, Nader
2013-07-01
The P300 speller is a system designed to restore communication to patients with advanced neuromuscular disorders. This study was designed to explore the potential improvement from using electrocorticography (ECoG) compared to the more traditional usage of electroencephalography (EEG). We tested the P300 speller on two epilepsy patients with temporary subdural electrode arrays over the occipital and temporal lobes, respectively. We then performed offline analysis to determine the accuracy and bit rate of the system, integrated spectral features into the classifier, and used a natural language processing (NLP) algorithm to further improve the results. The subject with the occipital grid achieved an accuracy of 82.77% and a bit rate of 41.02, which improved to 96.31% and 49.47, respectively, using a language model and spectral features. The temporal grid patient achieved an accuracy of 59.03% and a bit rate of 18.26, with an improvement to 75.81% and 27.05, respectively, using a language model and spectral features. Spatial analysis of the individual electrodes showed best performance using signals generated and recorded near the occipital pole. Using ECoG and integrating language information and spectral features can improve the bit rate of a P300 speller system. This improvement is sensitive to the electrode placement and likely depends on visually evoked potentials. This study shows that there can be an improvement in BCI performance when using ECoG, but that it is sensitive to the electrode location. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Ocean Models and Proper Orthogonal Decomposition
NASA Astrophysics Data System (ADS)
Salas-de-Leon, D. A.
2007-05-01
The increasing computational developments and the better understanding of mathematical and physical systems have resulted in an increasing number of ocean models. Long ago, modelers were like a secret organization and recognized each other by using secret codes and languages that only a select group of people was able to recognize and understand. Access to computational systems was limited: on one hand, equipment and computer time were expensive and restricted, and on the other hand, they required advanced computational languages that not everybody wanted to learn. Nowadays most college freshmen own a personal computer (PC or laptop), and/or have access to more sophisticated computational systems than those available for research in the early 80's. This resource availability has resulted in much greater access to all kinds of models. Today computer speed and time and the algorithms do not seem to be a problem, even though some models take days to run on small computational systems. Almost every oceanographic institution has its own model; what is more, in the same institution from one office to the next there are different models for the same phenomena, developed by different research members. The results do not differ substantially, since the equations are the same and the solving algorithms are similar. The algorithms, and the grids constructed with them, can be found in textbooks and/or on the internet. Every year more sophisticated models are constructed. The Proper Orthogonal Decomposition is a technique that reduces the number of variables to be solved while keeping the model properties, which makes it a very useful tool for diminishing the processes that have to be solved using "small" computational systems, making sophisticated models available to a greater community.
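Since the abstract only names the technique, here is a minimal sketch of how Proper Orthogonal Decomposition is usually computed, via the SVD of a snapshot matrix; the synthetic field below is an invented stand-in for ocean model output.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 50)            # 50 snapshots in time
x = np.linspace(0, 1, 200)                   # 200 spatial points
snapshots = (np.outer(np.sin(2 * np.pi * x), np.cos(t)) +
             0.3 * np.outer(np.sin(6 * np.pi * x), np.sin(3 * t)))

U, s, Vt = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
n_modes = int(np.searchsorted(energy, 0.99) + 1)   # modes needed for 99% of the energy
pod_basis = U[:, :n_modes]
print(n_modes, pod_basis.shape)              # a few modes stand in for all 50 snapshots
```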
Experiments in concept modeling for radiographic image reports.
Bell, D S; Pattison-Gordon, E; Greenes, R A
1994-01-01
OBJECTIVE: Development of methods for building concept models to support structured data entry and image retrieval in chest radiography. DESIGN: An organizing model for chest-radiographic reporting was built by analyzing manually a set of natural-language chest-radiograph reports. During model building, clinician-informaticians judged alternative conceptual structures according to four criteria: content of clinically relevant detail, provision for semantic constraints, provision for canonical forms, and simplicity. The organizing model was applied in representing three sample reports in their entirety. To explore the potential for automatic model discovery, the representation of one sample report was compared with the noun phrases derived from the same report by the CLARIT natural-language processing system. RESULTS: The organizing model for chest-radiographic reporting consists of 62 concept types and 17 relations, arranged in an inheritance network. The broadest types in the model include finding, anatomic locus, procedure, attribute, and status. Diagnoses are modeled as a subtype of finding. Representing three sample reports in their entirety added 79 narrower concept types. Some CLARIT noun phrases suggested valid associations among subtypes of finding, status, and anatomic locus. CONCLUSIONS: A manual modeling process utilizing explicitly stated criteria for making modeling decisions produced an organizing model that showed consistency in early testing. A combination of top-down and bottom-up modeling was required. Natural-language processing may inform model building, but algorithms that would replace manual modeling were not discovered. Further progress in modeling will require methods for objective model evaluation and tools for formalizing the model-building process. PMID:7719807
Design Of Computer Based Test Using The Unified Modeling Language
NASA Astrophysics Data System (ADS)
Tedyyana, Agus; Danuri; Lidyawati
2017-12-01
The admission selection of Politeknik Negeri Bengkalis through interest and talent search (PMDK), the joint selection admission test for state polytechnics (SB-UMPN), and the independent track (UM-Polbeng) were conducted using Paper-Based Tests (PBT). The Paper-Based Test model has some weaknesses: it wastes paper, questions can leak to the public, and test result data can be manipulated. This research aimed to create a Computer-Based Test (CBT) model using the Unified Modeling Language (UML), consisting of use case diagrams, activity diagrams, and sequence diagrams. During the design of the application, attention was paid to protecting the test questions before they are shown, through an encryption and decryption process; the RSA cryptography algorithm was used for this. Questions drawn from the question bank were then randomized using the Fisher-Yates shuffle method. The network architecture used in the Computer-Based Test application was a client-server model on a Local Area Network (LAN). The result of the design was a Computer-Based Test application for admission selection at Politeknik Negeri Bengkalis.
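The question randomization step mentioned above is the classic Fisher-Yates shuffle; a short sketch (with an invented question bank) follows.

```python
import random

def fisher_yates_shuffle(items):
    shuffled = list(items)
    for i in range(len(shuffled) - 1, 0, -1):
        j = random.randint(0, i)                         # pick from the not-yet-fixed prefix
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled

question_bank = [f"Question {n}" for n in range(1, 11)]  # invented question bank
print(fisher_yates_shuffle(question_bank))
```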
Representing and computing regular languages on massively parallel networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M.I.; O'Sullivan, J.A.; Boysam, B.
1991-01-01
This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first establishes the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibbs representations with potentials whose number of minima grows at precisely the exponential rate at which the language of deterministically constrained sequences grows. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determine the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.
NASA Astrophysics Data System (ADS)
Frust, Tobias; Wagner, Michael; Stephan, Jan; Juckeland, Guido; Bieberle, André
2017-10-01
Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes based on the principles of electron beam scanning. A typical application case for this technique is the study of multiphase flows, that is, flows of mixtures of substances such as gas-liquid flows in pipelines or chemical reactors. At Helmholtz-Zentrum Dresden-Rossendorf (HZDR) a number of such tomography scanners are operated. Currently, there are two main points limiting their application in some fields. First, after each CT scan sequence the data of the radiation detector must be downloaded from the scanner to a data processing machine. Second, the current data processing is time-consuming compared to the CT scan sequence interval. To enable online observations or use this technique to control actuators in real time, a modular and scalable data processing tool has been developed, consisting of user-definable stages working independently together in a so-called data processing pipeline that keeps up with the CT scanner's maximal frame rate of up to 8 kHz. The newly developed data processing stages are freely programmable and combinable. In order to achieve the highest processing performance, all relevant data processing steps required for a standard slice image reconstruction were individually implemented in separate stages using Graphics Processing Units (GPUs) and NVIDIA's CUDA programming language. Data processing performance tests on different high-end GPUs (Tesla K20c, GeForce GTX 1080, Tesla P100) showed excellent performance. Program Files doi:http://dx.doi.org/10.17632/65sx747rvm.1 Licensing provisions: LGPLv3 Programming language: C++/CUDA Supplementary material: Test data set, used for the performance analysis. Nature of problem: Ultrafast computed tomography is performed with a scan rate of up to 8 kHz. To obtain cross-sectional images from projection data, computer-based image reconstruction algorithms must be applied. The objective of the presented program is to reconstruct a data stream of around 1.3 GB s-1 in a minimum time period. Thus, the program allows one to go into new fields of application and to use in the future even more compute-intensive algorithms, especially for data post-processing, to improve the quality of data analysis. Solution method: The program solves the given problem using a two-step process: first, by a generic, expandable and widely applicable template library implementing the streaming paradigm (GLADOS); second, by optimized processing stages for ultrafast computed tomography implementing the required algorithms in a performance-oriented way using CUDA (RISA). Thereby, task-parallelism between the processing stages as well as data parallelism within each processing stage is realized.
Analytical learning and term-rewriting systems
NASA Technical Reports Server (NTRS)
Laird, Philip; Gamble, Evan
1990-01-01
Analytical learning is a set of machine learning techniques for revising the representation of a theory based on a small set of examples of that theory. When the representation of the theory is correct and complete but perhaps inefficient, an important objective of such analysis is to improve the computational efficiency of the representation. Several algorithms with this purpose have been suggested, most of which are closely tied to a first-order logical language and are variants of goal regression, such as the familiar explanation-based generalization (EBG) procedure. But because predicate calculus is a poor representation for some domains, these learning algorithms are extended to apply to other computational models. It is shown that the goal regression technique applies to a large family of programming languages, all based on a kind of term rewriting system. Included in this family are three language families of importance to artificial intelligence: logic programming, such as Prolog; lambda calculus, such as LISP; and combinator-based languages, such as FP. A new analytical learning algorithm, AL-2, is exhibited that learns from success but is otherwise quite different from EBG. These results suggest that term rewriting systems are a good framework for analytical learning research in general, and that further research should be directed toward developing new techniques.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Meeting medical terminology needs--the Ontology-Enhanced Medical Concept Mapper.
Leroy, G; Chen, H
2001-12-01
This paper describes the development and testing of the Medical Concept Mapper, a tool designed to facilitate access to online medical information sources by providing users with appropriate medical search terms for their personal queries. Our system is valuable for patients whose knowledge of medical vocabularies is inadequate to find the desired information, and for medical experts who search for information outside their field of expertise. The Medical Concept Mapper maps synonyms and semantically related concepts to a user's query. The system is unique because it integrates our natural language processing tool, i.e., the Arizona (AZ) Noun Phraser, with human-created ontologies, the Unified Medical Language System (UMLS) and WordNet, and our computer generated Concept Space, into one system. Our unique contribution results from combining the UMLS Semantic Net with Concept Space in our deep semantic parsing (DSP) algorithm. This algorithm establishes a medical query context based on the UMLS Semantic Net, which allows Concept Space terms to be filtered so as to isolate related terms relevant to the query. We performed two user studies in which Medical Concept Mapper terms were compared against human experts' terms. We conclude that the AZ Noun Phraser is well suited to extract medical phrases from user queries, that WordNet is not well suited to provide strictly medical synonyms, that the UMLS Metathesaurus is well suited to provide medical synonyms, and that Concept Space is well suited to provide related medical terms, especially when these terms are limited by our DSP algorithm.
Trivedi, Hari; Mesterhazy, Joseph; Laguna, Benjamin; Vu, Thienkhai; Sohn, Jae Ho
2018-04-01
Magnetic resonance imaging (MRI) protocoling can be time- and resource-intensive, and protocols can often be suboptimal dependent upon the expertise or preferences of the protocoling radiologist. Providing a best-practice recommendation for an MRI protocol has the potential to improve efficiency and decrease the likelihood of a suboptimal or erroneous study. The goal of this study was to develop and validate a machine learning-based natural language classifier that can automatically assign the use of intravenous contrast for musculoskeletal MRI protocols based upon the free-text clinical indication of the study, thereby improving efficiency of the protocoling radiologist and potentially decreasing errors. We utilized a deep learning-based natural language classification system from IBM Watson, a question-answering supercomputer that gained fame after challenging the best human players on Jeopardy! in 2011. We compared this solution to a series of traditional machine learning-based natural language processing techniques that utilize a term-document frequency matrix. Each classifier was trained with 1240 MRI protocols plus their respective clinical indications and validated with a test set of 280. Ground truth of contrast assignment was obtained from the clinical record. For evaluation of inter-reader agreement, a blinded second reader radiologist analyzed all cases and determined contrast assignment based on only the free-text clinical indication. In the test set, Watson demonstrated overall accuracy of 83.2% when compared to the original protocol. This was similar to the overall accuracy of 80.2% achieved by an ensemble of eight traditional machine learning algorithms based on a term-document matrix. When compared to the second reader's contrast assignment, Watson achieved 88.6% agreement. When evaluating only the subset of cases where the original protocol and second reader were concordant (n = 251), agreement climbed further to 90.0%. The classifier was relatively robust to spelling and grammatical errors, which were frequent. Implementation of this automated MR contrast determination system as a clinical decision support tool may save considerable time and effort of the radiologist while potentially decreasing error rates, and require no change in order entry or workflow.
Constructing Concept Schemes From Astronomical Telegrams Via Natural Language Clustering
NASA Astrophysics Data System (ADS)
Graham, Matthew; Zhang, M.; Djorgovski, S. G.; Donalek, C.; Drake, A. J.; Mahabal, A.
2012-01-01
The rapidly emerging field of time domain astronomy is one of the most exciting and vibrant new research frontiers, ranging in scientific scope from studies of the Solar System to extreme relativistic astrophysics and cosmology. It is being enabled by a new generation of large synoptic digital sky surveys - LSST, PanStarrs, CRTS - that cover large areas of sky repeatedly, looking for transient objects and phenomena. One of the biggest challenges facing these is the automated classification of transient events, a process that needs machine-processible astronomical knowledge. Semantic technologies enable the formal representation of concepts and relations within a particular domain. ATELs (http://www.astronomerstelegram.org) are a commonly-used means for reporting and commenting upon new astronomical observations of transient sources (supernovae, stellar outbursts, blazar flares, etc). However, they are loose and unstructured and employ scientific natural language for description: this makes automated processing of them - a necessity within the next decade with petascale data rates - a challenge. Nevertheless they represent a potentially rich corpus of information that could lead to new and valuable insights into transient phenomena. This project lies in the cutting-edge field of astrosemantics, a branch of astroinformatics, which applies semantic technologies to astronomy. The ATELs have been used to develop an appropriate concept scheme - a representation of the information they contain - for transient astronomy using hierarchical clustering of processed natural language. This allows us to automatically organize ATELs based on the vocabulary used. We conclude that we can use simple algorithms to process and extract meaning from astronomical textual data.
OpenCL-based vicinity computation for 3D multiresolution mesh compression
NASA Astrophysics Data System (ADS)
Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri
2017-03-01
3D multiresolution mesh compression systems are still widely addressed in many domains. These systems increasingly require volumetric data to be processed in real time. Therefore, the performance is becoming constrained by material resource usage and the overall computational time. In this paper, our contribution lies entirely in computing, in real time, the triangle neighborhood of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of this latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in terms of computational complexity. For that reason, this work exploits the GPU to accelerate the computation using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
Implementation of total focusing method for phased array ultrasonic imaging on FPGA
NASA Astrophysics Data System (ADS)
Guo, JianQiang; Li, Xi; Gao, Xiaorong; Wang, Zeyong; Zhao, Quanke
2015-02-01
This paper describes a multi-FPGA imaging system dedicated to real-time imaging using the Total Focusing Method (TFM) and Full Matrix Capture (FMC). The system was entirely described using the Verilog HDL language and implemented on an Altera Stratix IV GX FPGA development board. The algorithm proceeds as follows: establish a coordinate system for the image and divide it into a grid; calculate the complete acoustic path length between each transmitting and receiving array element and a focus point, and transform it into an index value; use that index to look up sound pressure values from ROM and superimpose them to obtain the pixel value of one focus point; and compute the pixel values of all focus points to form the final image. The imaging results show that this algorithm achieves a high SNR for defect imaging, and the FPGA's parallel processing capability provides high-speed performance, so the system delivers a complete and well-performing imaging interface.
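The delay-and-sum core of TFM over FMC data can be sketched in NumPy as follows (in contrast to the paper's Verilog implementation); the array geometry, wave speed, and sampling rate are invented for illustration.

```python
import numpy as np

def tfm_image(fmc, element_x, grid_x, grid_z, c=5900.0, fs=100e6):
    """fmc[tx, rx, t]: full matrix capture A-scans; returns image[z, x]."""
    n_el, _, n_t = fmc.shape
    image = np.zeros((len(grid_z), len(grid_x)))
    for iz, z in enumerate(grid_z):
        for ix, x in enumerate(grid_x):
            d = np.sqrt((element_x - x) ** 2 + z ** 2)     # element-to-pixel distances
            for tx in range(n_el):
                for rx in range(n_el):
                    idx = int((d[tx] + d[rx]) / c * fs)    # round-trip time-of-flight sample
                    if idx < n_t:
                        image[iz, ix] += fmc[tx, rx, idx]  # superimpose sound pressure values
    return np.abs(image)

fmc = np.random.randn(8, 8, 2000)                          # toy 8-element FMC data
elements = np.linspace(-3.5e-3, 3.5e-3, 8)                 # 1 mm pitch linear array
img = tfm_image(fmc, elements, np.linspace(-5e-3, 5e-3, 21), np.linspace(1e-3, 2e-2, 20))
print(img.shape)
```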
Artificial Intelligence in Medical Practice: The Question to the Answer?
Miller, D Douglas; Brown, Eric W
2018-02-01
Computer science advances and ultra-fast computing speeds find artificial intelligence (AI) broadly benefitting modern society: forecasting weather, recognizing faces, detecting fraud, and deciphering genomics. AI's future role in medical practice remains an unanswered question. Machines (computers) learn to detect patterns not decipherable using biostatistics by processing massive datasets (big data) through layered mathematical models (algorithms). Correcting algorithm mistakes (training) adds to AI predictive model confidence. AI is being successfully applied for image analysis in radiology, pathology, and dermatology, with diagnostic speed exceeding, and accuracy paralleling, medical experts. While diagnostic confidence never reaches 100%, combining machines plus physicians reliably enhances system performance. Cognitive programs are impacting medical practice by applying natural language processing to read the rapidly expanding scientific literature and collate years of diverse electronic medical records. In this and other ways, AI may optimize the care trajectory of chronic disease patients, suggest precision therapies for complex illnesses, reduce medical errors, and improve subject enrollment into clinical trials. Copyright © 2018 Elsevier Inc. All rights reserved.
Language study on Spliced Semigraph using Folding techniques
NASA Astrophysics Data System (ADS)
Thiagarajan, K.; Padmashree, J.
2018-04-01
In this paper, we propose an algorithm to identify cut vertices and cut edges of an n-Cut Spliced Semigraph, splice the n-Cut Spliced Semigraph using cut vertices, cut edges, or a combination of a cut vertex and a cut edge, and apply a sequence of foldings to the spliced semigraph to obtain the semigraph quadruple η(S)=(2, 1, 1, 1). We observed that splicing and folding using both cut vertices and cut edges is applicable only for an n-Cut Spliced Semigraph where n > 2. Also, we transformed the spliced semigraph into a tree structure and studied the language for the semigraph with n+2 vertices and n+1 semivertices using the Depth First Edge Sequence algorithm, obtaining a language structure over sequences of the alphabet ‘a’ and ‘b’.
Procedural Quantum Programming
NASA Astrophysics Data System (ADS)
Ömer, Bernhard
2002-09-01
While classical computing science has developed a variety of methods and programming languages around the concept of the universal computer, the typical description of quantum algorithms still uses a purely mathematical, non-constructive formalism which makes no difference between a hydrogen atom and a quantum computer. This paper investigates how the concept of procedural programming languages, the most widely used classical formalism for describing and implementing algorithms, can be adapted to the field of quantum computing, and how non-classical features like the reversibility of unitary transformations, the non-observability of quantum states, or the lack of copy and erase operations can be reflected semantically. It introduces the key concepts of procedural quantum programming (hybrid target architecture, operator hierarchy, quantum data types, memory management, etc.) and presents the experimental language QCL, which implements these principles.
Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang
2018-01-01
In this study we have developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6%, and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance. PMID:26262122
Data Analysis with Graphical Models: Software Tools
NASA Technical Reports Server (NTRS)
Buntine, Wray L.
1994-01-01
Probabilistic graphical models (directed and undirected Markov fields, and combined in chain graphs) are used widely in expert systems, image processing and other areas as a framework for representing and reasoning with probabilities. They come with corresponding algorithms for performing probabilistic inference. This paper discusses an extension to these models by Spiegelhalter and Gilks, plates, used to graphically model the notion of a sample. This offers a graphical specification language for representing data analysis problems. When combined with general methods for statistical inference, this also offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper outlines the framework and then presents some basic tools for the task: a graphical version of the Pitman-Koopman Theorem for the exponential family, problem decomposition, and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
A comparison between physicians and computer algorithms for form CMS-2728 data reporting.
Malas, Mohammed Said; Wish, Jay; Moorthi, Ranjani; Grannis, Shaun; Dexter, Paul; Duke, Jon; Moe, Sharon
2017-01-01
The CMS-2728 form (Medical Evidence Report) assesses 23 comorbidities chosen to reflect poor outcomes and increased mortality risk. Previous studies have questioned the validity of physician reporting on form CMS-2728. We hypothesize that reporting of comorbidities by computer algorithms identifies more comorbidities than physician completion and, therefore, is more reflective of the underlying disease burden. We collected data from CMS-2728 forms for all 296 patients who had an incident ESRD diagnosis and received chronic dialysis from 2005 through 2014 at Indiana University outpatient dialysis centers. We analyzed patients' data from electronic medical record systems that collated information from multiple health care sources. Previously utilized algorithms or natural language processing were used to extract data on 10 comorbidities for a period of up to 10 years prior to ESRD incidence. These algorithms incorporate billing codes, prescriptions, and other relevant elements. We compared the presence or unchecked status of these comorbidities on the forms to the presence or absence according to the algorithms. Computer algorithms reported more comorbidities than physician completion of the forms. This remained true when decreasing the data span to one year and using only a single health center source. The algorithms' determinations were well accepted by a physician panel. Importantly, algorithm use significantly increased the expected deaths and lowered the standardized mortality ratios. Using computer algorithms showed superior identification of comorbidities for form CMS-2728 and altered standardized mortality ratios. Adapting similar algorithms in available EMR systems may offer more thorough evaluation of comorbidities and improve quality reporting. © 2016 International Society for Hemodialysis.
Zhang, Xiaoyang; Xue, Lei; Zhang, Zhi; Zhang, Yiwen
2016-01-01
Background: Children's health problems have long attracted much attention from parents and society as a whole, and child language development is a particularly active research topic. Experts and scholars have found that appropriate intervention by guardians at an early stage can effectively promote the development of children's language and cognitive abilities, and that this development can be analyzed quantitatively. Artificial intelligence technology has also shown a clear effect in interventions for children with autism spectrum disorders. Objective and Methods: This paper presents a speech signal analysis system for children, with preprocessing of the speaker's speech signal, subsequent calculation of counts in the speech of guardians and children, and other characteristic parameters or indicators (e.g., cognizable syllable number, the continuity of the language). Results: With these quantitative analysis tools and parameters, we can evaluate and analyze the quality of children's language and cognitive ability objectively and quantitatively, providing a basis for decision-making criteria for parents. Thereby, they can adopt appropriate measures to promote the development of children's language and cognitive status. Conclusion: In this paper, according to existing studies of children's language development, we put forward several indicators for the automatic measurement of language development that influence the formation of children's language. The experimental results show that after pretreatment (including signal enhancement and speech activity detection), both the divergence algorithm calculation results and the subsequent word counts are quite satisfactory compared with the actual situation. PMID:27583037
Formal language constrained path problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, C.; Jacob, R.; Marathe, M.
1997-07-08
In many path finding problems arising in practice, certain patterns of edge/vertex labels in the labeled graph being traversed are allowed/preferred, while others are disallowed. Motivated by such applications as intermodal transportation planning, the authors investigate the complexity of finding feasible paths in a labeled network, where the mode choice for each traveler is specified by a formal language. The main contributions of this paper include the following: (1) the authors show that the problem of finding a shortest path between a source and destination for a traveler whose mode choice is specified as a context free language is solvable efficiently in polynomial time; when the mode choice is specified as a regular language they provide algorithms with improved space and time bounds; (2) in contrast, they show that the problem of finding simple paths between a source and a given destination is NP-hard, even when restricted to very simple regular expressions and/or very simple graphs; (3) for the class of treewidth bounded graphs, they show that (i) the problem of finding a regular language constrained simple path between source and a destination is solvable in polynomial time and (ii) the extension to finding context free language constrained simple paths is NP-complete. Several extensions of these results are presented in the context of finding shortest paths with additional constraints. These results significantly extend the results in [MW95]. As a corollary of the results, they obtain a polynomial time algorithm for the BEST k-SIMILAR PATH problem studied in [SJB97]. The previous best algorithm was given by [SJB97] and takes exponential time in the worst case.
NASA Technical Reports Server (NTRS)
Chamberlain, R. A.; Cornick, D. E.; Flater, J. F.; Odoherty, R. J.; Peterson, F. M.; Ramsey, H. R.; Willoughby, J. K.
1974-01-01
The capabilities of the specified scheduling language and the program module library are outlined. The summary is written with the potential user in mind and, therefore, provides maximum insight into how the capabilities will be helpful in writing scheduling programs. Simple examples and illustrations are provided to assist the potential user in applying the capabilities to his problem.
Laireiter, Anton Rupert
2017-01-01
Background In recent years, the assessment of mental disorders has become more and more personalized. Modern advancements such as Internet-enabled mobile phones and increased computing capacity make it possible to tap sources of information that have long been unavailable to mental health practitioners. Objective Software packages that combine algorithm-based treatment planning, process monitoring, and outcome monitoring are scarce. The objective of this study was to assess whether the DynAMo Web application can fill this gap by providing a software solution that can be used by both researchers to conduct state-of-the-art psychotherapy process research and clinicians to plan treatments and monitor psychotherapeutic processes. Methods In this paper, we report on the current state of a Web application that can be used for assessing the temporal structure of mental disorders using information on their temporal and synchronous associations. A treatment planning algorithm automatically interprets the data and delivers priority scores of symptoms to practitioners. The application is also capable of monitoring psychotherapeutic processes during therapy and of monitoring treatment outcomes. This application was developed using the R programming language (R Core Team, Vienna) and the Shiny Web application framework (RStudio, Inc, Boston). It is made entirely from open-source software packages and thus is easily extensible. Results The capabilities of the proposed application are demonstrated. Case illustrations are provided to exemplify its usefulness in clinical practice. Conclusions With the broad availability of Internet-enabled mobile phones and similar devices, collecting data on psychopathology and psychotherapeutic processes has become easier than ever. The proposed application is a valuable tool for capturing, processing, and visualizing these data. The combination of dynamic assessment and process- and outcome monitoring has the potential to improve the efficacy and effectiveness of psychotherapy. PMID:28729233
Using MaxCompiler for the high level synthesis of trigger algorithms
NASA Astrophysics Data System (ADS)
Summers, S.; Rose, A.; Sanders, P.
2017-02-01
Firmware for FPGA trigger applications at the CMS experiment is conventionally written using hardware description languages such as Verilog and VHDL. MaxCompiler is an alternative, Java based, tool for developing FPGA applications which uses a higher level of abstraction from the hardware than a hardware description language. An implementation of the jet and energy sum algorithms for the CMS Level-1 calorimeter trigger has been written using MaxCompiler to benchmark against the VHDL implementation in terms of accuracy, latency, resource usage, and code size. A Kalman Filter track fitting algorithm has been developed using MaxCompiler for a proposed CMS Level-1 track trigger for the High-Luminosity LHC upgrade. The design achieves a low resource usage, and has a latency of 187.5 ns per iteration.
Multilingual event extraction for epidemic detection.
Lejeune, Gaël; Brixtel, Romain; Doucet, Antoine; Lucas, Nadine
2015-10-01
This paper presents a multilingual news surveillance system applied to tele-epidemiology. It has been shown that multilingual approaches improve timeliness in detection of epidemic events across the globe, eliminating the wait for local news to be translated into major languages. We present here a system to extract epidemic events in potentially any language, provided a Wikipedia seed for common disease names exists. The Daniel system presented herein relies on properties that are common to news writing (the journalistic genre), the most useful being repetition and saliency. Wikipedia is used to screen common disease names to be matched with repeated character strings. Language variations, such as declensions, are handled by processing text at the character level, rather than at the word level. This additionally makes it possible to handle various writing systems in a similar fashion. As no multilingual ground truth existed to evaluate the Daniel system, we built a multilingual corpus from the Web, and collected annotations from native speakers of Chinese, English, Greek, Polish and Russian, with no connection or interest in the Daniel system. This data set is available online freely, and can be used for the evaluation of other event extraction systems. Experiments for 5 languages out of 17 tested are detailed in this paper: Chinese, English, Greek, Polish and Russian. The Daniel system achieves an average F-measure of 82% in these 5 languages. It reaches 87% on BEcorpus, the state-of-the-art corpus in English, slightly below top-performing systems, which are tailored with numerous language-specific resources. The consistent performance of Daniel on multiple languages is an important contribution to the reactivity and the coverage of epidemiological event detection systems. Most event extraction systems rely on extensive resources that are language-specific. While their sophistication yields excellent results (over 90% precision and recall), it restricts their coverage in terms of languages and geographic areas. In contrast, in order to detect epidemic events in any language, the Daniel system only requires a list of a few hundred disease names and locations, which can actually be acquired automatically. The system can perform consistently well on any language, with precision and recall around 82% on average, according to this paper's evaluation. Daniel's character-based approach is especially interesting for morphologically rich and low-resourced languages. Because it exploits few resources and relies on state-of-the-art string matching algorithms, Daniel can process thousands of documents per minute on a simple laptop. In the context of epidemic surveillance, reactivity and geographic coverage are of primary importance, since no one knows where the next event will strike, and therefore in what vernacular language it will first be reported. By being able to process any language, the Daniel system offers unique coverage for poorly endowed languages, and can complement state-of-the-art techniques for major languages. Copyright © 2015 Elsevier B.V. All rights reserved.
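The sketch below is a toy, greatly simplified illustration of the character-level idea described above: substrings repeated between the head and the body of an article are matched against a small disease-name list. The heuristics here (a fixed head ratio, a minimum substring length, simple containment tests) are assumptions for illustration, not the Daniel system's actual rules.

```python
# Toy character-level matching of repeated strings against a disease-name list,
# in the spirit of the repetition/saliency idea described above.
def common_substrings(a, b, min_len=4):
    """All substrings of length >= min_len shared by strings a and b."""
    found = set()
    for i in range(len(a) - min_len + 1):
        for j in range(min_len, len(a) - i + 1):
            s = a[i:i + j]
            if s in b:
                found.add(s)
            else:
                break  # longer extensions of a non-match cannot match either
    return found

def detect_diseases(article, disease_names, head_ratio=0.2):
    """Report disease names whose characters repeat between the article's
       head (title / first sentences) and its body."""
    cut = max(1, int(len(article) * head_ratio))
    head, body = article[:cut].lower(), article[cut:].lower()
    repeated = common_substrings(head, body)
    return {name for name in disease_names
            if any(s in name.lower() or name.lower() in s for s in repeated)}

article = ("Cholera outbreak reported in the capital. "
           "Health officials confirmed dozens of cholera cases this week "
           "and warned that cholera may spread to nearby districts.")
print(detect_diseases(article, ["Cholera", "Measles", "Dengue"]))  # {'Cholera'}
```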
Flexible language constructs for large parallel programs
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and a discussion of some of the critical implementation details are given.
Automated Item Generation with Recurrent Neural Networks.
von Davier, Matthias
2018-03-12
Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
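A minimal sketch of the kind of character-level recurrent language model alluded to above is shown below, using PyTorch. The three-sentence toy "item bank", the network size, and the sampling temperature are invented for illustration; a real application would train on a large corpus of test items.

```python
# Minimal character-level recurrent language model (GRU) trained on a tiny
# invented item bank, then sampled to "generate" a new item-like string.
import torch
import torch.nn as nn

items = ["Which number completes the sequence 2, 4, 8, ?",
         "Which word is a synonym of rapid?",
         "Which shape has exactly three sides?"]
text = "\n".join(items)
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, 32)
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, h=None):
        out, h = self.rnn(self.embed(x), h)
        return self.head(out), h

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()
data = torch.tensor([stoi[c] for c in text]).unsqueeze(0)  # shape (1, T)

for step in range(200):  # teacher forcing: predict each next character
    logits, _ = model(data[:, :-1])
    loss = loss_fn(logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# sample a new "item" character by character
x, h, out = data[:, :1], None, []
for _ in range(60):
    logits, h = model(x, h)
    probs = torch.softmax(logits[0, -1] / 0.8, dim=0)  # temperature 0.8
    x = torch.multinomial(probs, 1).unsqueeze(0)
    out.append(chars[x.item()])
print("".join(out))
```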
NASA Astrophysics Data System (ADS)
Boulicaut, Jean-Francois; Jeudy, Baptiste
Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers this to be essentially a querying process. It is enabled by a query language which can deal with either raw data or the patterns which hold in the data. Mining patterns turns out to be the so-called inductive query evaluation process, for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints, and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints, and it points out current directions of research as well.
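The sketch below illustrates the constraint-pushing idea on a toy example: a levelwise Apriori-style search in which two anti-monotonic constraints, a minimum support and a maximum total price, prune candidates as they are generated. The transactions, prices, and thresholds are invented; this is a generic illustration rather than any specific algorithm surveyed in the chapter.

```python
# Levelwise itemset mining that pushes two anti-monotonic constraints
# (minimum support, maximum total price) into the search.
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}, {"a", "b", "c"}]
price = {"a": 3, "b": 2, "c": 1, "d": 5}

def support(itemset):
    return sum(itemset <= t for t in transactions)

def satisfies(itemset, min_sup, max_price):
    # both constraints are anti-monotonic: if an itemset violates one,
    # so do all of its supersets, which justifies the pruning below
    return support(itemset) >= min_sup and sum(price[i] for i in itemset) <= max_price

def mine(min_sup=2, max_price=6):
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]), min_sup, max_price)]
    result = list(level)
    while level:
        prev = set(level)
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        # classic Apriori pruning: every subset one item smaller must have survived
        level = [c for c in candidates
                 if all(c - {i} in prev for i in c) and satisfies(c, min_sup, max_price)]
        result.extend(level)
    return result

for s in mine():
    print(set(s), "support =", support(s), "price =", sum(price[i] for i in s))
```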
PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.
Kirby, Jacqueline C; Speltz, Peter; Rasmussen, Luke V; Basford, Melissa; Gottesman, Omri; Peissig, Peggy L; Pacheco, Jennifer A; Tromp, Gerard; Pathak, Jyotishman; Carrell, David S; Ellis, Stephen B; Lingren, Todd; Thompson, Will K; Savova, Guergana; Haines, Jonathan; Roden, Dan M; Harris, Paul A; Denny, Joshua C
2016-11-01
Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Automatable algorithms to identify nonmedical opioid use using electronic data: a systematic review.
Canan, Chelsea; Polinski, Jennifer M; Alexander, G Caleb; Kowal, Mary K; Brennan, Troyen A; Shrank, William H
2017-11-01
Improved methods to identify nonmedical opioid use can help direct health care resources to individuals who need them. Automated algorithms that use large databases of electronic health care claims or records for surveillance are a potential means to achieve this goal. In this systematic review, we reviewed the utility, attempts at validation, and application of such algorithms to detect nonmedical opioid use. We searched PubMed and Embase for articles describing automatable algorithms that used electronic health care claims or records to identify patients or prescribers with likely nonmedical opioid use. We assessed algorithm development, validation, and performance characteristics and the settings where they were applied. Study variability precluded a meta-analysis. Of 15 included algorithms, 10 targeted patients, 2 targeted providers, 2 targeted both, and 1 identified medications with high abuse potential. Most patient-focused algorithms (67%) used prescription drug claims and/or medical claims, with diagnosis codes of substance abuse and/or dependence as the reference standard. Eleven algorithms were developed via regression modeling. Four used natural language processing, data mining, audit analysis, or factor analysis. Automated algorithms can facilitate population-level surveillance. However, there is no true gold standard for determining nonmedical opioid use. Users must recognize the implications of identifying false positives and, conversely, false negatives. Few algorithms have been applied in real-world settings. Automated algorithms may facilitate identification of patients and/or providers most likely to need more intensive screening and/or intervention for nonmedical opioid use. Additional implementation research in real-world settings would clarify their utility. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability
Kirby, Jacqueline C; Speltz, Peter; Rasmussen, Luke V; Basford, Melissa; Gottesman, Omri; Peissig, Peggy L; Pacheco, Jennifer A; Tromp, Gerard; Pathak, Jyotishman; Carrell, David S; Ellis, Stephen B; Lingren, Todd; Thompson, Will K; Savova, Guergana; Haines, Jonathan; Roden, Dan M; Harris, Paul A
2016-01-01
Objective Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. Results As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). Discussion These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. Conclusion By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data. PMID:27026615
NASA Astrophysics Data System (ADS)
Zackay, Barak; Ofek, Eran O.
2017-01-01
Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the “fast dispersion measure transform” algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of 2N_f N_t + N_t N_Δ log_2(N_f), where N_f, N_t, and N_Δ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm's computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.
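For orientation, the sketch below implements plain brute-force incoherent dedispersion, the O(N_f N_t N_Δ) baseline that the fast dispersion measure transform accelerates; it is not the FDMT itself. The dispersion delay uses the standard cold-plasma relation, and the array sizes and injected pulse are invented.

```python
# Brute-force incoherent dedispersion: shift each frequency channel by its
# dispersion delay and sum over channels, for every trial DM.
import numpy as np

def dedisperse(dynamic_spectrum, freqs_mhz, dms, dt_s):
    """dynamic_spectrum: (N_f, N_t) array; returns a (N_dm, N_t) array."""
    n_f, n_t = dynamic_spectrum.shape
    out = np.zeros((len(dms), n_t))
    f_ref = freqs_mhz.max()
    for d, dm in enumerate(dms):
        # delay (s) relative to the highest-frequency channel, f in MHz
        delays = 4.148808e3 * dm * (freqs_mhz**-2 - f_ref**-2)
        shifts = np.round(delays / dt_s).astype(int)
        for ch in range(n_f):
            # np.roll wraps around the edges; adequate for this toy example
            out[d] += np.roll(dynamic_spectrum[ch], -shifts[ch])
    return out

# toy example: a pulse at DM = 50 pc cm^-3 buried in noise
rng = np.random.default_rng(0)
freqs = np.linspace(1200.0, 1500.0, 64)      # MHz
dt = 1e-3                                     # seconds per time bin
spec = rng.normal(0, 1, (64, 4096))
true_shifts = np.round(4.148808e3 * 50 * (freqs**-2 - freqs.max()**-2) / dt).astype(int)
spec[np.arange(64), 1000 + true_shifts] += 8.0   # inject the dispersed pulse
dm_trials = np.arange(0, 101, 2.0)
series = dedisperse(spec, freqs, dm_trials, dt)
print("best DM trial:", dm_trials[np.argmax(series.max(axis=1))])   # ~50
```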
Real-time processing of radar return on a parallel computer
NASA Technical Reports Server (NTRS)
Aalfs, David D.
1992-01-01
NASA is working with the FAA to demonstrate the feasibility of pulse Doppler radar as a candidate airborne sensor to detect low altitude windshears. The need to provide the pilot with timely information about possible hazards has motivated a demand for real-time processing of a radar return. Investigated here is parallel processing as a means of accommodating the high data rates required. A PC based parallel computer, called the transputer, is used to investigate issues in real time concurrent processing of radar signals. A transputer network is made up of an array of single instruction stream processors that can be networked in a variety of ways. They are easily reconfigured and software development is largely independent of the particular network topology. The performance of the transputer is evaluated in light of the computational requirements. A number of algorithms have been implemented on the transputers in OCCAM, a language specially designed for parallel processing. These include signal processing algorithms such as the Fast Fourier Transform (FFT), pulse-pair, and autoregressive modelling, as well as routing software to support concurrency. The most computationally intensive task is estimating the spectrum. Two approaches have been taken on this problem, the first and most conventional of which is to use the FFT. By using table look-ups for the basis function and other optimizing techniques, an algorithm has been developed that is sufficient for real time. The other approach is to model the signal as an autoregressive process and estimate the spectrum based on the model coefficients. This technique is attractive because it does not suffer from the spectral leakage problem inherent in the FFT. Benchmark tests indicate that autoregressive modeling is feasible in real time.
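The pulse-pair estimator mentioned above is particularly compact: the mean Doppler frequency follows from the phase of the lag-one autocorrelation of the complex return. The sketch below illustrates it on simulated data; the wavelength, pulse repetition time, and sign convention are assumptions for illustration, not parameters of the system described in the report.

```python
# Pulse-pair estimate of mean radial velocity from complex radar samples.
import numpy as np

def pulse_pair_velocity(iq, prt_s, wavelength_m):
    """iq: complex samples from one range gate, one sample per pulse."""
    r1 = np.mean(np.conj(iq[:-1]) * iq[1:])         # lag-1 autocorrelation
    f_doppler = np.angle(r1) / (2 * np.pi * prt_s)   # mean Doppler frequency (Hz)
    return wavelength_m * f_doppler / 2.0            # radial velocity (m/s)

# simulate a return with a 200 Hz Doppler shift plus noise
rng = np.random.default_rng(1)
prt, lam, f_d = 1e-3, 0.1, 200.0                     # 1 ms PRT, 10 cm wavelength
n = np.arange(128)
iq = np.exp(2j * np.pi * f_d * n * prt) \
     + 0.3 * (rng.normal(size=128) + 1j * rng.normal(size=128))
print(round(pulse_pair_velocity(iq, prt, lam), 2), "m/s")   # close to 10.0
```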
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Saini, Subhash (Technical Monitor)
1998-01-01
Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Preferences in Data Production Planning
NASA Technical Reports Server (NTRS)
Golden, Keith; Brafman, Ronen; Pang, Wanlin
2005-01-01
This paper discusses the data production problem, which consists of transforming a set of (initial) input data into a set of (goal) output data. There are typically many choices among input data and processing algorithms, each leading to significantly different end products. To discriminate among these choices, the planner supports an input language that provides a number of constructs for specifying user preferences over data (and plan) properties. We discuss these preference constructs, how we handle them to guide search, and additional challenges in the area of preference management that this important application domain offers.
Application of parallelized software architecture to an autonomous ground vehicle
NASA Astrophysics Data System (ADS)
Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam
2011-01-01
This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were made to the power system, reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities including sensor interfacing, data processing, path planning, navigation algorithms and motor control. This inefficient approach led to poor software performance and made it difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks - motor control, navigation, sensor data collection, etc. - into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used last year, two frames can be acquired and processed in 70 ms. With all these improvements, Q placed 2nd in the autonomous challenge.
Anniversary Paper: Image processing and manipulation through the pages of Medical Physics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Armato, Samuel G. III; Ginneken, Bram van; Image Sciences Institute, University Medical Center Utrecht, Heidelberglaan 100, Room Q0S.459, 3584 CX Utrecht
The language of radiology has gradually evolved from "the film" (the foundation of radiology since Wilhelm Roentgen's 1895 discovery of x-rays) to "the image," an electronic manifestation of a radiologic examination that exists within the bits and bytes of a computer. Rather than simply storing and displaying radiologic images in a static manner, the computational power of the computer may be used to enhance a radiologist's ability to visually extract information from the image through image processing and image manipulation algorithms. Image processing tools provide a broad spectrum of opportunities for image enhancement. Gray-level manipulations such as histogram equalization, spatial alterations such as geometric distortion correction, preprocessing operations such as edge enhancement, and enhanced radiography techniques such as temporal subtraction provide powerful methods to improve the diagnostic quality of an image or to enhance structures of interest within an image. Furthermore, these image processing algorithms provide the building blocks of more advanced computer vision methods. The prominent role of medical physicists and the AAPM in the advancement of medical image processing methods, and in the establishment of the "image" as the fundamental entity in radiology and radiation oncology, has been captured in 35 volumes of Medical Physics.
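As a concrete example of the gray-level manipulations mentioned above, the sketch below performs global histogram equalization with NumPy on a synthetic low-contrast image; the image and gray-level range are invented for illustration.

```python
# Global histogram equalization: map gray levels through the normalized
# cumulative histogram so a narrow intensity range spreads over [0, 255].
import numpy as np

def histogram_equalize(image, levels=256):
    hist, _ = np.histogram(image.ravel(), bins=levels, range=(0, levels))
    cdf = hist.cumsum().astype(float)
    cdf_min = cdf[cdf > 0].min()
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1)),
                  0, levels - 1).astype(np.uint8)       # lookup table
    return lut[image]

rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(128, 128)).astype(np.uint8)  # narrow range
equalized = histogram_equalize(low_contrast)
print(low_contrast.min(), low_contrast.max())   # ~100 139
print(equalized.min(), equalized.max())         # spread toward 0 255
```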
Parallelising a molecular dynamics algorithm on a multi-processor workstation
NASA Astrophysics Data System (ADS)
Müller-Plathe, Florian
1990-12-01
The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
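The sketch below shows, in Python rather than Fortran 77, the serial construction of a Verlet neighbour list with a skin radius under the minimum-image convention, the kind of kernel whose non-bonded force loop the paper distributes across processors. The box size, cutoff, skin, and coordinates are invented.

```python
# Build a Verlet neighbour list for particles in a cubic periodic box.
import numpy as np

def build_neighbour_list(pos, box, cutoff, skin=0.3):
    """Return, for each particle, the indices of particles within cutoff+skin,
       using the minimum-image convention."""
    n = len(pos)
    r_list2 = (cutoff + skin) ** 2
    neighbours = [[] for _ in range(n)]
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)                       # minimum image
        close = np.where(np.sum(d * d, axis=1) < r_list2)[0] + i + 1
        for j in close:
            neighbours[i].append(j)
            neighbours[j].append(i)
    return neighbours

rng = np.random.default_rng(0)
box = 10.0
positions = rng.uniform(0.0, box, size=(200, 3))
nl = build_neighbour_list(positions, box, cutoff=2.5)
print("average neighbours per particle:", sum(len(v) for v in nl) / len(nl))
```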
Second Language Writing Classification System Based on Word-Alignment Distribution
ERIC Educational Resources Information Center
Kotani, Katsunori; Yoshimi, Takehiko
2010-01-01
The present paper introduces an automatic classification system for assisting second language (L2) writing evaluation. This system, which classifies sentences written by L2 learners as either native speaker-like or learner-like sentences, is constructed by machine learning algorithms using word-alignment distributions as classification features…
NASA Technical Reports Server (NTRS)
Finley, Gail T.
1988-01-01
This report covers the study of the relational database implementation in the NASCAD computer program system. The existing system is used primarily for computer aided design. Attention is also directed to a hidden-surface algorithm for final drawing output.
Dill: an algorithm and a symbolic software package for doing classical supersymmetry calculations
NASA Astrophysics Data System (ADS)
Luc̆ić, Vladan
1995-11-01
An algorithm is presented that formalizes different steps in a classical Supersymmetric (SUSY) calculation. Based on the algorithm, Dill, a symbolic software package that can perform the calculations, is developed in the Mathematica programming language. While the algorithm is quite general, the package is created for the 4-D, N = 1 model. Nevertheless, with little modification, the package could be used for other SUSY models. The package has been tested and some of the results are presented.
Algorithmic Classification of Five Characteristic Types of Paraphasias.
Fergadiotis, Gerasimos; Gorman, Kyle; Bedrick, Steven
2016-12-01
This study was intended to evaluate a series of algorithms developed to perform automatic classification of paraphasic errors (formal, semantic, mixed, neologistic, and unrelated errors). We analyzed 7,111 paraphasias from the Moss Aphasia Psycholinguistics Project Database (Mirman et al., 2010) and evaluated the classification accuracy of 3 automated tools. First, we used frequency norms from the SUBTLEXus database (Brysbaert & New, 2009) to differentiate nonword errors and real-word productions. Then we implemented a phonological-similarity algorithm to identify phonologically related real-word errors. Last, we assessed the performance of a semantic-similarity criterion that was based on word2vec (Mikolov, Yih, & Zweig, 2013). Overall, the algorithmic classification replicated human scoring for the major categories of paraphasias studied with high accuracy. The tool that was based on the SUBTLEXus frequency norms was more than 97% accurate in making lexicality judgments. The phonological-similarity criterion was approximately 91% accurate, and the overall classification accuracy of the semantic classifier ranged from 86% to 90%. Overall, the results highlight the potential of tools from the field of natural language processing for the development of highly reliable, cost-effective diagnostic tools suitable for collecting high-quality measurement data for research and clinical purposes.
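The sketch below is a toy rendering of the three-stage decision described above: a lexicality check against frequency norms, a phonological-similarity check, and a semantic-similarity check. The tiny lexicon, the crude letter-overlap heuristic, the threshold, and the three-dimensional "word vectors" are invented stand-ins for SUBTLEXus, the phonological-similarity algorithm, and word2vec.

```python
# Toy three-stage paraphasia classifier: lexicality -> phonological -> semantic.
import numpy as np

lexicon = {"cat", "dog", "hat", "house", "apple"}          # stand-in frequency norms
vectors = {"cat": np.array([0.9, 0.1, 0.0]),               # stand-in word embeddings
           "dog": np.array([0.8, 0.2, 0.1]),
           "hat": np.array([0.1, 0.9, 0.0]),
           "house": np.array([0.2, 0.1, 0.9]),
           "apple": np.array([0.0, 0.8, 0.3])}

def shares_phonology(target, response):
    """Crude proxy for phonological relatedness: same first letter, or more than
       half of the shorter word's letters match position by position."""
    if target[0] == response[0]:
        return True
    common = sum(a == b for a, b in zip(target, response))
    return common > min(len(target), len(response)) / 2

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(target, response, sem_threshold=0.7):
    if response not in lexicon:
        return "neologistic"                               # fails the lexicality check
    phon = shares_phonology(target, response)
    sem = cosine(vectors[target], vectors[response]) >= sem_threshold
    if phon and sem:
        return "mixed"
    if phon:
        return "formal"
    if sem:
        return "semantic"
    return "unrelated"

for target, response in [("cat", "dog"), ("cat", "hat"), ("cat", "gat"), ("cat", "house")]:
    print(target, "->", response, ":", classify(target, response))
```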
The efficiency of geophysical adjoint codes generated by automatic differentiation tools
NASA Astrophysics Data System (ADS)
Vlasenko, A. V.; Köhl, A.; Stammer, D.
2016-02-01
The accuracy of numerical models that describe complex physical or chemical processes depends on the choice of model parameters. Estimating an optimal set of parameters by optimization algorithms requires knowledge of the sensitivity of the process of interest to model parameters. Typically the sensitivity computation involves differentiation of the model, which can be performed by applying algorithmic differentiation (AD) tools to the underlying numerical code. However, existing AD tools differ substantially in design, legibility and computational efficiency. In this study we show that, for geophysical data assimilation problems of varying complexity, the performance of adjoint codes generated by the existing AD tools (i) Open_AD, (ii) Tapenade, (iii) NAGWare and (iv) Transformation of Algorithms in Fortran (TAF) can be vastly different. Based on simple test problems, we evaluate the efficiency of each AD tool with respect to computational speed, accuracy of the adjoint, the efficiency of memory usage, and the capability of each AD tool to handle modern FORTRAN 90-95 elements such as structures and pointers, which are new elements that either combine groups of variables or provide aliases to memory addresses, respectively. We show that, while operator overloading tools are the only ones suitable for modern codes written in object-oriented programming languages, their computational efficiency lags behind source transformation by orders of magnitude, rendering the application of these modern tools to practical assimilation problems prohibitive. In contrast, the application of source transformation tools appears to be the most efficient choice, allowing handling even large geophysical data assimilation problems. However, they can only be applied to numerical models written in earlier generations of programming languages. Our study indicates that applying existing AD tools to realistic geophysical problems faces limitations that urgently need to be solved to allow the continuous use of AD tools for solving geophysical problems on modern computer architectures.
Lowekamp, Bradley C; Chen, David T; Ibáñez, Luis; Blezek, Daniel
2013-01-01
SimpleITK is a new interface to the Insight Segmentation and Registration Toolkit (ITK) designed to facilitate rapid prototyping, education and scientific activities via high level programming languages. ITK is a templated C++ library of image processing algorithms and frameworks for biomedical and other applications, and it was designed to be generic, flexible and extensible. Initially, ITK provided a direct wrapping interface to languages such as Python and Tcl through the WrapITK system. Unlike WrapITK, which exposed ITK's complex templated interface, SimpleITK was designed to provide an easy to use and simplified interface to ITK's algorithms. It includes procedural methods, hides ITK's demand driven pipeline, and provides a template-less layer. Also SimpleITK provides practical conveniences such as binary distribution packages and overloaded operators. Our user-friendly design goals dictated a departure from the direct interface wrapping approach of WrapITK, toward a new facade class structure that only exposes the required functionality, hiding ITK's extensive template use. Internally SimpleITK utilizes a manual description of each filter with code-generation and advanced C++ meta-programming to provide the higher-level interface, bringing the capabilities of ITK to a wider audience. SimpleITK is licensed as open source software library under the Apache License Version 2.0 and more information about downloading it can be found at http://www.simpleitk.org.
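A brief example of the procedural, template-free style that SimpleITK exposes is shown below. The file names are placeholders; the calls shown (ReadImage, SmoothingRecursiveGaussian, WriteImage, GetArrayFromImage) are part of SimpleITK's documented procedural interface.

```python
# Procedural SimpleITK usage: read, smooth, write, and inspect as a NumPy array.
import SimpleITK as sitk

image = sitk.ReadImage("input_volume.nii.gz")            # any ITK-supported format
smoothed = sitk.SmoothingRecursiveGaussian(image, 2.0)   # one-line Gaussian smoothing
sitk.WriteImage(smoothed, "smoothed_volume.nii.gz")

# images interoperate with NumPy for further scripting
array = sitk.GetArrayFromImage(smoothed)
print(array.shape, array.mean())
```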
A biometric authentication model using hand gesture images.
Fong, Simon; Zhuang, Yan; Fister, Iztok; Fister, Iztok
2013-10-30
A novel hand biometric authentication method based on measurements of the user's stationary hand gestures of hand sign language is proposed. The measurement of hand gestures could be sequentially acquired by a low-cost video camera. There could possibly be another level of contextual information associated with these hand signs that can be used in biometric authentication. As an analogue, instead of typing a password 'iloveu' in text, which is relatively vulnerable over a communication network, a signer can encode a biometric password using a sequence of hand signs, 'i', 'l', 'o', 'v', 'e', and 'u'. Subsequently the features from the hand gesture images, which are integrally fuzzy in nature, are extracted and recognized by a classification model that decides whether this signer is who he claims to be, by examining his hand shape and the postures used in making those signs. It is believed that everybody has certain slight but unique behavioral characteristics in sign language, as well as in hand shape composition. Simple and efficient image processing algorithms are used in hand sign recognition, including intensity profiling, color histogram and dimensionality analysis, coupled with several popular machine learning algorithms. Computer simulation is conducted for investigating the efficacy of this novel biometric authentication model, which shows up to 93.75% recognition accuracy.
Arruti, Andoni; Cearreta, Idoia; Álvarez, Aitor; Lazkano, Elena; Sierra, Basilio
2014-01-01
Study of emotions in human–computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish languages using different methods for feature selection. The RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to search for the most relevant feature subset. The three-phase approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean emotion recognition rate of 80.05% in Basque and 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker-dependent features is proposed for both languages and new perspectives are suggested. PMID:25279686
Automated detection of hospital outbreaks: A systematic review of methods.
Leclère, Brice; Buckeridge, David L; Boëlle, Pierre-Yves; Astagneau, Pascal; Lepelletier, Didier
2017-01-01
Several automated algorithms for epidemiological surveillance in hospitals have been proposed. However, the usefulness of these methods to detect nosocomial outbreaks remains unclear. The goal of this review was to describe outbreak detection algorithms that have been tested within hospitals, consider how they were evaluated, and synthesize their results. We developed a search query using keywords associated with hospital outbreak detection and searched the MEDLINE database. To ensure the highest sensitivity, no limitations were initially imposed on publication languages and dates, although we subsequently excluded studies published before 2000. Every study that described a method to detect outbreaks within hospitals was included, without any exclusion based on study design. Additional studies were identified through citations in retrieved studies. Twenty-nine studies were included. The detection algorithms were grouped into 5 categories: simple thresholds (n = 6), statistical process control (n = 12), scan statistics (n = 6), traditional statistical models (n = 6), and data mining methods (n = 4). The evaluation of the algorithms was often solely descriptive (n = 15), but more complex epidemiological criteria were also investigated (n = 10). The performance measures varied widely between studies: e.g., the sensitivity of an algorithm in a real world setting could vary between 17 and 100%. Even if outbreak detection algorithms are useful complementary tools for traditional surveillance, the heterogeneity in results among published studies does not support quantitative synthesis of their performance. A standardized framework should be followed when evaluating outbreak detection methods to allow comparison of algorithms across studies and synthesis of results.
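As an illustration of the statistical-process-control family, the most common category in the review, the sketch below applies a simple c-chart to weekly counts of a nosocomial pathogen: weeks exceeding the baseline mean plus three Poisson standard deviations raise an alarm. The counts, baseline window, and simulated outbreak are invented.

```python
# c-chart style outbreak detection on weekly infection counts.
import numpy as np

def c_chart_alarms(counts, baseline_weeks=26, sigma=3.0):
    """Flag weeks whose count exceeds baseline mean + sigma * sqrt(mean),
       the usual upper control limit for Poisson-distributed counts."""
    counts = np.asarray(counts, dtype=float)
    c_bar = counts[:baseline_weeks].mean()
    ucl = c_bar + sigma * np.sqrt(c_bar)
    alarms = [week for week in range(baseline_weeks, len(counts)) if counts[week] > ucl]
    return alarms, ucl

rng = np.random.default_rng(3)
weekly = rng.poisson(2.0, size=52)      # one year of in-control weeks
weekly[40:43] += 8                      # simulated three-week outbreak
alarms, ucl = c_chart_alarms(weekly)
print("upper control limit:", round(ucl, 2), "alarm weeks:", alarms)  # expect weeks 40-42
```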
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithms as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate this algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs are establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
An Algorithm for Controlled Integration of Sound and Text.
ERIC Educational Resources Information Center
Wohlert, Harry S.; McCormick, Martin
1985-01-01
A serious drawback in introducing sound into computer programs for teaching foreign language speech has been the lack of an algorithm to turn off the cassette recorder immediately to keep screen text and audio in synchronization. This article describes a program which solves that problem. (SED)
Post interaural neural net-based vowel recognition
NASA Astrophysics Data System (ADS)
Jouny, Ismail I.
2001-10-01
Interaural head related transfer functions are used to process speech signatures prior to neural net based recognition. Data representing the head related transfer function of a dummy has been collected at MIT and made available on the Internet. This data is used to pre-process vowel signatures to mimic the effects of human ear on speech perception. Signatures representing various vowels of the English language are then presented to a multi-layer perceptron trained using the back propagation algorithm for recognition purposes. The focus in this paper is to assess the effects of human interaural system on vowel recognition performance particularly when using a classification system that mimics the human brain such as a neural net.
Eigensolver for a Sparse, Large Hermitian Matrix
NASA Technical Reports Server (NTRS)
Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris
2003-01-01
A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
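For readers who want to experiment with the same family of methods, the sketch below finds a few extremal eigenvalues of a large sparse Hermitian matrix with SciPy's eigsh, a serial wrapper around the same ARPACK machinery that PARPACK parallelizes; it is not the JPL code. The random tridiagonal Hermitian test matrix is invented.

```python
# A few eigenvalues of a large sparse Hermitian matrix via ARPACK (Lanczos).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 100_000
rng = np.random.default_rng(0)
main = rng.normal(size=n)
off = rng.normal(size=n - 1) + 1j * rng.normal(size=n - 1)
# sparse Hermitian (tridiagonal) matrix: real diagonal, conjugate off-diagonals
H = sp.diags([np.conj(off), main, off], offsets=[-1, 0, 1], format="csr")

# six eigenvalues of largest magnitude, via implicitly restarted Lanczos
vals = eigsh(H, k=6, which="LM", return_eigenvectors=False)
print(np.sort(vals))
```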
Correia, J R C C C; Martins, C J A P
2017-10-01
Topological defects unavoidably form at symmetry breaking phase transitions in the early universe. To probe the parameter space of theoretical models and set tighter experimental constraints (exploiting the recent advances in astrophysical observations), one requires more and more demanding simulations, and therefore more hardware resources and computation time. Improving the speed and efficiency of existing codes is essential. Here we present a general purpose graphics-processing-unit implementation of the canonical Press-Ryden-Spergel algorithm for the evolution of cosmological domain wall networks. This is ported to the Open Computing Language standard, and as a consequence significant speedups are achieved both in two-dimensional (2D) and 3D simulations.
Interfaces and Integration of Medical Image Analysis Frameworks: Challenges and Opportunities.
Covington, Kelsie; McCreedy, Evan S; Chen, Min; Carass, Aaron; Aucoin, Nicole; Landman, Bennett A
2010-05-25
Clinical research with medical imaging typically involves large-scale data analysis with interdependent software toolsets tied together in a processing workflow. Numerous, complementary platforms are available, but these are not readily compatible in terms of workflows or data formats. Both image scientists and clinical investigators could benefit from using the framework that is the most natural fit to the specific problem at hand, but pragmatic choices often dictate that a compromise platform is used for collaboration. Manual merging of platforms through carefully tuned scripts has been effective, but it is exceptionally time consuming and is not feasible for large-scale integration efforts. Hence, the benefits of innovation are constrained by platform dependence. Removing this constraint via integration of algorithms from one framework into another is the focus of this work. We propose and demonstrate a light-weight interface system to expose parameters across platforms and provide seamless integration. In this initial effort, we focus on four platforms: Medical Image Analysis and Visualization (MIPAV), Java Image Science Toolkit (JIST), command line tools, and 3D Slicer. We explore three case studies: (1) providing a system for MIPAV to expose internal algorithms and utilize these algorithms within JIST, (2) exposing JIST modules through a self-documenting command line interface for inclusion in scripting environments, and (3) detecting and using JIST modules in 3D Slicer. We review the challenges and opportunities for light-weight software integration both within a development language (e.g., Java in MIPAV and JIST) and across languages (e.g., C/C++ in 3D Slicer and shell in command line tools).
Design Report for Low Power Acoustic Detector
2013-08-01
The report describes a very high speed integrated circuit (VHSIC) hardware description language (VHDL) implementation of both the HED and DCD detectors, covering the hardware design, the target detection algorithm design in both MATLAB and VHDL, and typical performance results.
NASA Technical Reports Server (NTRS)
Morrell, R. A.; Odoherty, R. J.; Ramsey, H. R.; Reynolds, C. C.; Willoughby, J. K.; Working, R. D.
1975-01-01
Data and analyses related to a variety of algorithms for solving typical large-scale scheduling and resource allocation problems are presented. The capabilities and deficiencies of various alternative problem solving strategies are discussed from the viewpoint of computer system design.
Computational provenance in hydrologic science: a snow mapping example.
Dozier, Jeff; Frew, James
2009-03-13
Computational provenance--a record of the antecedents and processing history of digital information--is key to properly documenting computer-based scientific research. To support investigations in hydrologic science, we produce the daily fractional snow-covered area from NASA's moderate-resolution imaging spectroradiometer (MODIS). From the MODIS reflectance data in seven wavelengths, we estimate the fraction of each 500 m pixel that snow covers. The daily products have data gaps and errors because of cloud cover and sensor viewing geometry, so we interpolate and smooth to produce our best estimate of the daily snow cover. To manage the data, we have developed the Earth System Science Server (ES3), a software environment for data-intensive Earth science, with unique capabilities for automatically and transparently capturing and managing the provenance of arbitrary computations. Transparent acquisition avoids the scientists having to express their computations in specific languages or schemas in order for provenance to be acquired and maintained. ES3 models provenance as relationships between processes and their input and output files. It is particularly suited to capturing the provenance of an evolving algorithm whose components span multiple languages and execution environments.
NASA Astrophysics Data System (ADS)
Steiger, Damian S.; Haener, Thomas; Troyer, Matthias
Quantum computers promise to transform our notions of computation by offering a completely new paradigm. A high level quantum programming language and optimizing compilers are essential components to achieve scalable quantum computation. In order to address this, we introduce the ProjectQ software framework - an open source effort to support both theorists and experimentalists by providing intuitive tools to implement and run quantum algorithms. Here, we present our ProjectQ quantum compiler, which compiles a quantum algorithm from our high-level Python-embedded language down to low-level quantum gates available on the target system. We demonstrate how this compiler can be used to control actual hardware and to run high-performance simulations.
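A minimal example of the kind of high-level, Python-embedded program the ProjectQ compiler consumes is shown below; it follows the framework's standard introductory pattern of allocating a qubit, applying a Hadamard gate, and measuring. The back-end here is the default simulator.

```python
# Minimal ProjectQ program: a fair quantum coin flip on the default simulator.
from projectq import MainEngine
from projectq.ops import H, Measure

eng = MainEngine()              # compiles to the default simulator back-end
qubit = eng.allocate_qubit()
H | qubit                       # put the qubit into an equal superposition
Measure | qubit                 # measure in the computational basis
eng.flush()                     # flush the gate stream to the back-end
print("Measured:", int(qubit))  # 0 or 1, each with probability 1/2
```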
Automatic Thread-Level Parallelization in the Chombo AMR Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christen, Matthias; Keen, Noel; Ligocki, Terry
2011-05-26
The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in the ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
A Specification for a Godunov-type Eulerian 2-D Hydrocode, Revision 0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nystrom, William D; Robey, Jonathan M
2012-05-01
The purpose of this code specification is to describe an algorithm for solving the Euler equations of hydrodynamics in a 2D rectangular region in sufficient detail to allow a software developer to produce an implementation on their target platform using their programming language of choice without requiring detailed knowledge and experience in the field of computational fluid dynamics. It should be possible for a software developer who is proficient in the programming language of choice and is knowledgeable of the target hardware to produce an efficient implementation of this specification if they also possess a thorough working knowledge of parallel programming and have some experience in scientific programming using fields and meshes. On modern architectures, it will be important to focus on issues related to the exploitation of the fine grain parallelism and data locality present in this algorithm. This specification aims to make that task easier by presenting the essential details of the algorithm in a systematic and language neutral manner while also avoiding the inclusion of implementation details that would likely be specific to a particular type of programming paradigm or platform architecture.
Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers
NASA Astrophysics Data System (ADS)
Caballero Morales, Santiago Omar; Cox, Stephen J.
2009-12-01
Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
An empirical generative framework for computational modeling of language acquisition.
Waterfall, Heidi R; Sandbank, Ben; Onnis, Luca; Edelman, Shimon
2010-06-01
This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of generative grammars from raw CHILDES data and give an account of the generative performance of the acquired grammars. Next, we summarize findings from recent longitudinal and experimental work that suggests how certain statistically prominent structural properties of child-directed speech may facilitate language acquisition. We then present a series of new analyses of CHILDES data indicating that the desired properties are indeed present in realistic child-directed speech corpora. Finally, we suggest how our computational results, behavioral findings, and corpus-based insights can be integrated into a next-generation model aimed at meeting the four requirements of our modeling framework.
Darwin v. 2.0: an interpreted computer language for the biosciences.
Gonnet, G H; Hallett, M T; Korostensky, C; Bernardin, L
2000-02-01
We announce the availability of the second release of Darwin v. 2.0, an interpreted computer language especially tailored to researchers in the biosciences. The system is a general tool applicable to a wide range of problems. This second release improves Darwin version 1.6 in several ways: it now contains (1) a larger set of libraries touching most of the classical problems from computational biology (pairwise alignment, all versus all alignments, tree construction, multiple sequence alignment), (2) an expanded set of general purpose algorithms (search algorithms for discrete problems, matrix decomposition routines, complex/long integer arithmetic operations), (3) an improved language with a cleaner syntax, (4) better on-line help, and (5) a number of fixes to user-reported bugs. Darwin is made available for most operating systems free of charge from the Computational Biochemistry Research Group (CBRG), reachable at http://chrg.inf.ethz.ch. darwin@inf.ethz.ch
NASA Astrophysics Data System (ADS)
Jorge, L. S.; Bonifacio, D. A. B.; DeWitt, Don; Miyaoka, R. S.
2016-12-01
Continuous scintillator-based detectors have been considered as a competitive and cheaper approach than highly pixelated discrete crystal positron emission tomography (PET) detectors, despite the need for algorithms to estimate 3D gamma interaction position. In this work, we report on the implementation of a positioning algorithm to estimate the 3D interaction position in a continuous crystal PET detector using a Field Programmable Gate Array (FPGA). The evaluated method is the Statistics-Based Processing (SBP) technique that requires light response function and event position characterization. An algorithm has been implemented using the Verilog language and evaluated using a data acquisition board that contains an Altera Stratix III FPGA. The 3D SBP algorithm was previously successfully implemented on a Stratix II FPGA using simulated data and a different module design. In this work, improvements were made to the FPGA coding of the 3D positioning algorithm, reducing the total memory usage to around 34%. Further the algorithm was evaluated using experimental data from a continuous miniature crystal element (cMiCE) detector module. Using our new implementation, average FWHM (Full Width at Half Maximum) for the whole block is 1.71±0.01 mm, 1.70±0.01 mm and 1.632±0.005 mm for x, y and z directions, respectively. Using a pipelined architecture, the FPGA is able to process 245,000 events per second for interactions inside of the central area of the detector that represents 64% of the total block area. The weighted average of the event rate by regional area (corner, border and central regions) is about 198,000 events per second. This event rate is greater than the maximum expected coincidence rate for any given detector module in future PET systems using the cMiCE detector design.
CERES: A new cerebellum lobule segmentation method.
Romero, Jose E; Coupé, Pierrick; Giraud, Rémi; Ta, Vinh-Thong; Fonov, Vladimir; Park, Min Tae M; Chakravarty, M Mallar; Voineskos, Aristotle N; Manjón, Jose V
2017-02-15
The human cerebellum is involved in language, motor tasks and cognitive processes such as attention or emotional processing. Therefore, an automatic and accurate segmentation method is highly desirable to measure and understand the cerebellum role in normal and pathological brain development. In this work, we propose a patch-based multi-atlas segmentation tool called CERES (CEREbellum Segmentation) that is able to automatically parcellate the cerebellum lobules. The proposed method works with standard resolution magnetic resonance T1-weighted images and uses the Optimized PatchMatch algorithm to speed up the patch matching process. The proposed method was compared with related recent state-of-the-art methods showing competitive results in both accuracy (average DICE of 0.7729) and execution time (around 5 minutes). Copyright © 2016 Elsevier Inc. All rights reserved.
Natural language processing in an intelligent writing strategy tutoring system.
McNamara, Danielle S; Crossley, Scott A; Roscoe, Rod
2013-06-01
The Writing Pal is an intelligent tutoring system that provides writing strategy training. A large part of its artificial intelligence resides in the natural language processing algorithms to assess essay quality and guide feedback to students. Because writing is often highly nuanced and subjective, the development of these algorithms must consider a broad array of linguistic, rhetorical, and contextual features. This study assesses the potential for computational indices to predict human ratings of essay quality. Past studies have demonstrated that linguistic indices related to lexical diversity, word frequency, and syntactic complexity are significant predictors of human judgments of essay quality but that indices of cohesion are not. The present study extends prior work by including a larger data sample and an expanded set of indices to assess new lexical, syntactic, cohesion, rhetorical, and reading ease indices. Three models were assessed. The model reported by McNamara, Crossley, and McCarthy (Written Communication 27:57-86, 2010) including three indices of lexical diversity, word frequency, and syntactic complexity accounted for only 6% of the variance in the larger data set. A regression model including the full set of indices examined in prior studies of writing predicted 38% of the variance in human scores of essay quality with 91% adjacent accuracy (i.e., within 1 point). A regression model that also included new indices related to rhetoric and cohesion predicted 44% of the variance with 94% adjacent accuracy. The new indices increased accuracy but, more importantly, afford the means to provide more meaningful feedback in the context of a writing tutoring system.
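The kind of model the study evaluates can be sketched as a regression from linguistic indices to human essay scores, evaluated both by variance explained and by adjacent accuracy (within one point). The indices and scores below are simulated stand-ins, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
# Hypothetical indices: lexical diversity, word frequency, syntactic complexity,
# plus invented rhetorical/cohesion features.
X = rng.normal(size=(n, 5))
true_w = np.array([0.8, -0.6, 0.5, 0.4, 0.3])
scores = np.clip(np.round(3.5 + X @ true_w + rng.normal(0, 0.8, n)), 1, 6)

X_tr, X_te, y_tr, y_te = train_test_split(X, scores, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)

r2 = model.score(X_te, y_te)                            # variance explained
adjacent = np.mean(np.abs(np.round(pred) - y_te) <= 1)  # within 1 point of the human score
print(f"R^2 = {r2:.2f}, adjacent accuracy = {adjacent:.2%}")
```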
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single trials; the number of clusters is determined in a principled manner; and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
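The bootstrap step for choosing the number of clusters can be sketched as follows. This simplified version uses plain k-means instead of the GA-seeded variant and the adjusted Rand index as the agreement measure, so it only approximates the Stability Index described above; the toy features stand in for single-trial ERP data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def stability(X, k, n_boot=20, seed=0):
    """Mean agreement between clusterings of bootstrap samples and the full data."""
    rng = np.random.default_rng(seed)
    ref = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))              # bootstrap resample
        boot = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx])
        scores.append(adjusted_rand_score(ref[idx], boot))
    return float(np.mean(scores))

# Toy "single-trial ERP features": three well-separated groups in 10-D.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(m, 1.0, size=(60, 10)) for m in (0, 4, 8)])
for k in (2, 3, 4, 5):
    print(k, round(stability(X, k), 3))                    # expect the highest score at k = 3
```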
Brown, Andrew D; Marotta, Thomas R
2017-02-01
Incorrect imaging protocol selection can contribute to increased healthcare cost and waste. To help healthcare providers improve the quality and safety of medical imaging services, we developed and evaluated three natural language processing (NLP) models to determine whether NLP techniques could be employed to aid in clinical decision support for protocoling and prioritization of magnetic resonance imaging (MRI) brain examinations. To test the feasibility of using an NLP model to support clinical decision making for MRI brain examinations, we designed three different medical imaging prediction tasks, each with a unique outcome: selecting an examination protocol, evaluating the need for contrast administration, and determining priority. We created three models for each prediction task, each using a different classification algorithm (random forest, support vector machine, or k-nearest neighbor) to predict outcomes based on the narrative clinical indications and demographic data associated with 13,982 MRI brain examinations performed from January 1, 2013 to June 30, 2015. Test datasets were used to calculate the accuracy, sensitivity and specificity, predictive values, and the area under the curve. Our optimal results show an accuracy of 82.9%, 83.0%, and 88.2% for the protocol selection, contrast administration, and prioritization tasks, respectively, demonstrating that predictive algorithms can be used to aid in clinical decision support for examination protocoling. NLP models developed from the narrative clinical information provided by referring clinicians and demographic data are feasible methods to predict the protocol and priority of MRI brain examinations. Copyright © 2017 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
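One of the three prediction tasks can be sketched as a text-classification pipeline over the narrative indication, for example with TF-IDF features and a random forest. The indications and protocol labels below are invented, and the demographic features used in the study are omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented narrative indications and protocol labels, for illustration only.
indications = [
    "new onset seizure, rule out mass lesion",
    "chronic headache, worsening over months",
    "follow up of known multiple sclerosis",
    "acute stroke symptoms, left sided weakness",
    "pituitary adenoma surveillance",
    "memory loss, rule out neurodegeneration",
]
protocols = ["tumor", "routine", "ms", "stroke", "pituitary", "dementia"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(indications, protocols)
# With real training data, the predicted label would drive the suggested protocol.
print(clf.predict(["rule out mass lesion after first seizure"]))
```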
NASA Technical Reports Server (NTRS)
Cochran, D. R.; Ishikawa, M. K.; Paulson, R. E.; Ramsey, H. R.
1975-01-01
A user guide for the Programming Language for Allocation and Network Scheduling (PLANS) is presented. Information is included for the construction of PLANS programs. The basic philosophy of PLANS is discussed, and access and update reference techniques are described along with the use of tree structures.
Utilizing a language model to improve online dynamic data collection in P300 spellers.
Mainsah, Boyla O; Colwell, Kenneth A; Collins, Leslie M; Throckmorton, Chandra S
2014-07-01
P300 spellers provide a means of communication for individuals with severe physical limitations, especially those with locked-in syndrome, such as amyotrophic lateral sclerosis. However, P300 speller use is still limited by relatively low communication rates due to the multiple data measurements that are required to improve the signal-to-noise ratio of event-related potentials for increased accuracy. Therefore, the amount of data collection has competing effects on accuracy and spelling speed. Adaptively varying the amount of data collection prior to character selection has been shown to improve spelling accuracy and speed. The goal of this study was to optimize a previously developed dynamic stopping algorithm that uses a Bayesian approach to control data collection by incorporating a priori knowledge via a language model. Participants (n = 17) completed online spelling tasks using the dynamic stopping algorithm, with and without a language model. The addition of the language model resulted in improved participant performance from a mean theoretical bit rate of 46.12 bits/min at 88.89% accuracy to 54.42 bits/min at 90.36% accuracy.
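A much-simplified simulation of the idea, Bayesian evidence accumulation seeded by a language-model prior with data collection stopping once the posterior is confident, is sketched below. It flashes one character at a time rather than rows and columns, and all distributions, thresholds, and priors are illustrative rather than the authors' parameters.

```python
import numpy as np

def dynamic_stop(prior, target, p_attend=(1.0, 1.0), p_ignore=(0.0, 1.0),
                 threshold=0.9, max_flashes=120, seed=0):
    """Toy Bayesian dynamic stopping for a P300 speller (one flashed character per round).

    prior  : language-model prior over the candidate characters
    target : index of the attended character (used only to simulate classifier scores)
    p_*    : (mean, std) of the classifier score for attended / non-attended flashes
    """
    rng = np.random.default_rng(seed)
    post = np.array(prior, dtype=float)
    for n in range(1, max_flashes + 1):
        flashed = rng.integers(len(post))                  # character flashed this round
        score = rng.normal(*(p_attend if flashed == target else p_ignore))
        # Likelihood of the score given "candidate c is the target": attended model if
        # c was the flashed character, non-attended model otherwise.
        like = np.where(np.arange(len(post)) == flashed,
                        np.exp(-(score - p_attend[0]) ** 2 / (2 * p_attend[1] ** 2)),
                        np.exp(-(score - p_ignore[0]) ** 2 / (2 * p_ignore[1] ** 2)))
        post *= like
        post /= post.sum()
        if post.max() >= threshold:                        # stop collecting data early
            return int(post.argmax()), n
    return int(post.argmax()), max_flashes

prior = np.array([0.5, 0.2, 0.2, 0.1])     # e.g., a bigram LM favours character 0
print(dynamic_stop(prior, target=0))        # typically stops early on the likely character
```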
Flexible Language Constructs for Large Parallel Programs
Rosing, Matt; Schnabel, Robert
1994-01-01
The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.
Toward a molecular programming language for algorithmic self-assembly
NASA Astrophysics Data System (ADS)
Patitz, Matthew John
Self-assembly is the process whereby relatively simple components autonomously combine to form more complex objects. Nature exhibits self-assembly to form everything from microscopic crystals to living cells to galaxies. With a desire to both form increasingly sophisticated products and to understand the basic components of living systems, scientists have developed and studied artificial self-assembling systems. One such framework is the Tile Assembly Model introduced by Erik Winfree in 1998. In this model, simple two-dimensional square 'tiles' are designed so that they self-assemble into desired shapes. The work in this thesis consists of a series of results which build toward the future goal of designing an abstracted, high-level programming language for designing the molecular components of self-assembling systems which can perform powerful computations and form into intricate structures. The first two sets of results demonstrate self-assembling systems which perform infinite series of computations that characterize computably enumerable and decidable languages, and exhibit tools for algorithmically generating the necessary sets of tiles. In the next chapter, methods for generating tile sets which self-assemble into complicated shapes, namely a class of discrete self-similar fractal structures, are presented. Next, a software package for graphically designing tile sets, simulating their self-assembly, and debugging designed systems is discussed. Finally, a high-level programming language which abstracts much of the complexity and tedium of designing such systems, while preventing many of the common errors, is presented. The summation of this body of work presents a broad coverage of the spectrum of desired outputs from artificial self-assembling systems and a progression in the sophistication of tools used to design them. By creating a broader and deeper set of modular tools for designing self-assembling systems, we hope to increase the complexity which is attainable. These tools provide a solid foundation for future work in both the Tile Assembly Model and explorations into more advanced models.
An Efficient Universal Trajectory Language
NASA Technical Reports Server (NTRS)
Hagen, George E.; Guerreiro, Nelson M.; Maddalon, Jeffrey M.; Butler, Ricky W.
2017-01-01
The Efficient Universal Trajectory Language (EUTL) is a language for specifying and representing trajectories for Air Traffic Management (ATM) concepts such as Trajectory-Based Operations (TBO). In these concepts, the communication of a trajectory between an aircraft and ground automation is fundamental. Historically, this trajectory exchange has not been done, leading to trajectory definitions that have been centered around particular application domains and, therefore, are not well suited for TBO applications. The EUTL trajectory language has been defined in the Prototype Verification System (PVS) formal specification language, which provides an operational semantics for the EUTL language. The hope is that EUTL will provide a foundation for mathematically verified algorithms that manipulate trajectories. Additionally, the EUTL language provides well-defined methods to unambiguously determine position and velocity information between the reported trajectory points. In this paper, we present the EUTL trajectory language in mathematical detail.
Hu, Hongmei; Krasoulis, Agamemnon; Lutman, Mark; Bleeck, Stefan
2013-10-14
Cochlear implants (CIs) require efficient speech processing to maximize information transmission to the brain, especially in noise. A novel CI processing strategy was proposed in our previous studies, in which sparsity-constrained non-negative matrix factorization (NMF) was applied to the envelope matrix in order to improve the CI performance in noisy environments. It showed that the algorithm needs to be adaptive, rather than fixed, in order to adjust to acoustical conditions and individual characteristics. Here, we explore the benefit of a system that allows the user to adjust the signal processing in real time according to their individual listening needs and their individual hearing capabilities. In this system, which is based on MATLAB®, SIMULINK® and the xPC Target™ environment, the input/output (I/O) boards are interfaced between the SIMULINK blocks and the CI stimulation system, such that the output can be controlled successfully in the manner of a hardware-in-the-loop (HIL) simulation, hence offering a convenient way to implement a real time signal processing module that does not require any low level language. The sparsity constrained parameter of the algorithm was adapted online subjectively during an experiment with normal-hearing subjects and noise vocoded speech simulation. Results show that subjects chose different parameter values according to their own intelligibility preferences, indicating that adaptive real time algorithms are beneficial to fully explore subjective preferences. We conclude that the adaptive real time systems are beneficial for the experimental design, and such systems allow one to conduct psychophysical experiments with high ecological validity.
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics.
Levitsky, Lev I; Ivanov, Mark V; Lobas, Anna A; Bubis, Julia A; Tarasova, Irina A; Solovyeva, Elizaveta M; Pridatchenko, Marina L; Gorshkov, Mikhail V
2018-06-18
We present an open-source, extensible search engine for shotgun proteomics. Implemented in Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with the state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.
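The scoring idea can be illustrated with a simplified X!Tandem-style hyperscore, the sum of matched fragment intensities combined with factorials of the matched b- and y-ion counts. This is not IdentiPy's actual code or API; the spectrum, fragment masses, and tolerance are invented.

```python
import math

def hyperscore(spectrum, b_ions, y_ions, tol=0.02):
    """Simplified X!Tandem-style hyperscore for one candidate peptide.

    spectrum       : list of (m/z, intensity) peaks
    b_ions, y_ions : theoretical fragment m/z values
    """
    def matched(theoretical):
        total, count = 0.0, 0
        for mz in theoretical:
            hits = [i for (m, i) in spectrum if abs(m - mz) <= tol]
            if hits:
                total += max(hits)
                count += 1
        return total, count

    b_sum, nb = matched(b_ions)
    y_sum, ny = matched(y_ions)
    dot = b_sum + y_sum
    if dot == 0:
        return 0.0
    # Reported in log10 form so the factorials do not overflow for long peptides.
    return math.log10(dot) + math.log10(math.factorial(nb)) + math.log10(math.factorial(ny))

spectrum = [(175.119, 300.0), (262.151, 120.0), (333.188, 80.0), (488.260, 50.0)]
print(hyperscore(spectrum, b_ions=[175.119, 262.151], y_ions=[333.188]))
```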
Development of Educational Support System for Algorithm using Flowchart
NASA Astrophysics Data System (ADS)
Ohchi, Masashi; Aoki, Noriyuki; Furukawa, Tatsuya; Takayama, Kanta
Recently, information technology has become indispensable for business and industrial development. However, the shortage of software developers has become a social problem. To solve this problem, it is necessary to develop and deploy an environment for learning algorithms and programming languages. In this paper, we describe an algorithm study support system for programmers based on flowcharts. Since the proposed system uses a graphical user interface (GUI), it becomes easier for a programmer to understand the algorithms in programs.
Advanced End-to-end Simulation for On-board Processing (AESOP)
NASA Technical Reports Server (NTRS)
Mazer, Alan S.
1994-01-01
Developers of data compression algorithms typically use their own software together with commercial packages to implement, evaluate and demonstrate their work. While convenient for an individual developer, this approach makes it difficult to build on or use another's work without intimate knowledge of each component. When several people or groups work on different parts of the same problem, the larger view can be lost. What's needed is a simple piece of software to stand in the gap and link together the efforts of different people, enabling them to build on each other's work, and providing a base for engineers and scientists to evaluate the parts as a cohesive whole and make design decisions. AESOP (Advanced End-to-end Simulation for On-board Processing) attempts to meet this need by providing a graphical interface to a developer-selected set of algorithms, interfacing with compiled code and standalone programs, as well as procedures written in the IDL and PV-Wave command languages. As a proof of concept, AESOP is outfitted with several data compression algorithms integrating previous work on different processors (AT&T DSP32C, TI TMS320C30, SPARC). The user can specify at run-time the processor on which individual parts of the compression should run. Compressed data is then fed through simulated transmission and uncompression to evaluate the effects of compression parameters, noise and error correction algorithms. The following sections describe AESOP in detail. Section 2 describes fundamental goals for usability. Section 3 describes the implementation. Sections 4 through 5 describe how to add new functionality to the system and present the existing data compression algorithms. Sections 6 and 7 discuss portability and future work.
NASA Astrophysics Data System (ADS)
Huang, Yan; Wang, Zhihui
2015-12-01
With the development of FPGA, DSP Builder is widely applied to design system-level algorithms. The CL multi-wavelet algorithm is more advanced and effective than scalar wavelets for signal decomposition. Thus, a system of CL multi-wavelets based on DSP Builder is designed for the first time in this paper. The system mainly contains three parts: a pre-filtering subsystem, a one-level decomposition subsystem and a two-level decomposition subsystem. It can be converted into the hardware description language VHDL by the Signal Compiler block that can be used in Quartus II. Analysis of the energy indicator shows that this system outperforms the Daubechies wavelet in signal decomposition. Furthermore, it has proved to be suitable for the implementation of signal fusion based on SoPC hardware, and it will become a solid foundation in this new field.
Bottom-Up Evaluation of Twig Join Pattern Queries in XML Document Databases
NASA Astrophysics Data System (ADS)
Chen, Yangjun
Since the extensible markup language XML emerged as a new standard for information representation and exchange on the Internet, the problem of storing, indexing, and querying XML documents has been among the major issues of database research. In this paper, we study twig pattern matching and discuss a new algorithm for processing ordered twig pattern queries. The time complexity of the algorithm is bounded by O(|D|·|Q| + |T|·leaf_Q) and its space overhead by O(leaf_T·leaf_Q), where T stands for a document tree, Q for a twig pattern, and D is the largest data stream associated with a node q of Q, which contains the database nodes that match the node predicate at q. leaf_T (resp. leaf_Q) denotes the number of leaf nodes of T (resp. Q). In addition, the algorithm can be adapted to an indexing environment with XB-trees being used.
Text-based Analytics for Biosurveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Charles, Lauren E.; Smith, William P.; Rounds, Jeremiah
The ability to prevent, mitigate, or control a biological threat depends on how quickly the threat is identified and characterized. Ensuring the timely delivery of data and analytics is an essential aspect of providing adequate situational awareness in the face of a disease outbreak. This chapter outlines an analytic pipeline for supporting an advanced early warning system that can integrate multiple data sources and provide situational awareness of potential and occurring disease situations. The pipeline includes real-time automated data analysis founded on natural language processing (NLP), semantic concept matching, and machine learning techniques, to enrich content with metadata related to biosurveillance. Online news articles are presented as an example use case for the pipeline, but the processes can be generalized to any textual data. In this chapter, the mechanics of a streaming pipeline are briefly discussed as well as the major steps required to provide targeted situational awareness. The text-based analytic pipeline includes various processing steps as well as identifying article relevance to biosurveillance (e.g., relevance algorithm) and article feature extraction (who, what, where, why, how, and when).
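The article-relevance step of such a streaming pipeline can be sketched with an online text classifier that is updated batch by batch. The snippets, labels, and model choice below are illustrative assumptions, not the pipeline's actual components.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Invented article snippets labelled relevant (1) / not relevant (0) to biosurveillance.
batch_texts = [
    "cluster of avian influenza cases reported near poultry farms",
    "city council approves new budget for road maintenance",
    "hospital reports unusual rise in febrile respiratory illness",
    "local team wins regional football championship",
]
batch_labels = [1, 0, 1, 0]

vec = HashingVectorizer(n_features=2**18, alternate_sign=False)  # stateless, stream-friendly
clf = SGDClassifier(random_state=0)

# partial_fit lets the model be updated as each new batch of articles streams in.
clf.partial_fit(vec.transform(batch_texts), batch_labels, classes=[0, 1])
print(clf.predict(vec.transform(["outbreak of measles among unvaccinated children"])))
```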
NASA Astrophysics Data System (ADS)
Sych, Robert; Nakariakov, Valery; Anfinogentov, Sergey
Wavelet analysis is well suited for investigating waves and oscillations in the solar atmosphere that are limited in both time and frequency. We have developed an algorithm to detect these waves using the Pixelized Wavelet Filtration (PWF) method. This method provides information about the presence of propagating and non-propagating waves in the observed data (image cubes) and localizes them precisely in time as well as in space. We tested the algorithm and found that the results of coronal wave detection are consistent with those obtained by visual inspection. For fast exploration of the data cube, we additionally applied the previously developed PeriodMap analysis. This method is based on the Fast Fourier Transform and allows one, at an initial stage, to quickly look for "hot" regions with peak harmonic oscillations and to determine the spatial distribution at the significant harmonics. We propose splitting the detection procedure for coronal waves into two parts: in the first part, we apply the PeriodMap analysis (fast preparation), and in the second part, we use the information about the spatial distribution of oscillation sources to apply the PWF method (slow preparation). There are two possible modes of working with the data: automatic and hands-on operation. First, we use multiple PWF analysis to prepare narrowband maps in frequency subbands (spaced by a factor of two) and/or harmonic PWF analysis for separate harmonics in a spectrum. Second, we manually select the necessary spectral subband and temporal interval and then construct narrowband maps. For practical implementation of the proposed methods, we have developed a remote data processing system at the Institute of Solar-Terrestrial Physics, Irkutsk. The system is based on the data processing server http://pwf.iszf.irk.ru. The main aim of this resource is the calculation, via remote access through a local network and/or the Internet, of narrowband maps of wave sources both in the whole spectral band and at significant harmonics. In addition, we can obtain the temporal dynamics (MPEG files) of the main oscillation characteristics: amplitude, power and phase as functions of the spatio-temporal coordinates. For periodogram mapping of data cubes as a pre-analysis method, we prepare colour maps in which each pixel's colour corresponds to the frequency of the power-spectrum maximum. The computer system is based on ION scripts, the IDL and PHP languages, and the Apache web server. The IDL ION scripts are used to prepare and configure network requests at the central data server, with subsequent connection to the IDL runtime software and graphic output to the FTP server and screen. The web pages are constructed using PHP.
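The PeriodMap pre-analysis reduces to a per-pixel FFT of the image cube followed by mapping the frequency of the power-spectrum maximum. A compact numpy rendition on a synthetic cube is sketched below; the cadence and cube size are arbitrary.

```python
import numpy as np

def period_map(cube, dt):
    """Frequency of the power-spectrum maximum for every pixel of a (t, y, x) cube."""
    nt = cube.shape[0]
    freqs = np.fft.rfftfreq(nt, d=dt)
    spec = np.abs(np.fft.rfft(cube - cube.mean(axis=0), axis=0)) ** 2
    peak = spec[1:].argmax(axis=0) + 1          # skip the zero-frequency bin
    return freqs[peak]                          # (y, x) map of dominant frequencies

# Synthetic cube: a 3-minute oscillation in one corner, 5-minute elsewhere.
t = np.arange(512) * 10.0                       # 10 s cadence
cube = np.sin(2 * np.pi * t / 300.0)[:, None, None] * np.ones((512, 32, 32))
cube[:, :8, :8] = np.sin(2 * np.pi * t / 180.0)[:, None, None]
fmap = period_map(cube, dt=10.0)
print(1.0 / fmap[0, 0], 1.0 / fmap[-1, -1])     # roughly 180 s and 300 s periods
```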
Executable medical guidelines with Arden Syntax-Applications in dermatology and obstetrics.
Seitinger, Alexander; Rappelsberger, Andrea; Leitich, Harald; Binder, Michael; Adlassnig, Klaus-Peter
2016-08-12
Clinical decision support systems (CDSSs) are being developed to assist physicians in processing extensive data and new knowledge based on recent scientific advances. Structured medical knowledge in the form of clinical alerts or reminder rules, decision trees or tables, clinical protocols or practice guidelines, score algorithms, and others, constitute the core of CDSSs. Several medical knowledge representation and guideline languages have been developed for the formal computerized definition of such knowledge. One of these languages is Arden Syntax for Medical Logic Systems, an International Health Level Seven (HL7) standard whose development started in 1989. Its latest version is 2.10, which was presented in 2014. In the present report we discuss Arden Syntax as a modern medical knowledge representation and processing language, and show that this language is not only well suited to define clinical alerts, reminders, and recommendations, but can also be used to implement and process computerized medical practice guidelines. This section describes how contemporary software such as Java, server software, web-services, XML, is used to implement CDSSs based on Arden Syntax. Special emphasis is given to clinical decision support (CDS) that employs practice guidelines as its clinical knowledge base. Two guideline-based applications using Arden Syntax for medical knowledge representation and processing were developed. The first is a software platform for implementing practice guidelines from dermatology. This application employs fuzzy set theory and logic to represent linguistic and propositional uncertainty in medical data, knowledge, and conclusions. The second application implements a reminder system based on clinically published standard operating procedures in obstetrics to prevent deviations from state-of-the-art care. A to-do list with necessary actions specifically tailored to the gestational week/labor/delivery is generated. Today, with the latest versions of Arden Syntax and the application of contemporary software development methods, Arden Syntax has become a powerful and versatile medical knowledge representation and processing language, well suited to implement a large range of CDSSs, including clinical-practice-guideline-based CDSSs. Moreover, such CDS is provided and can be shared as a service by different medical institutions, redefining the sharing of medical knowledge. Arden Syntax is also highly flexible and provides developers the freedom to use up-to-date software design and programming patterns for external patient data access. Copyright © 2016. Published by Elsevier B.V.
EquiX-A Search and Query Language for XML.
ERIC Educational Resources Information Center
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
APGEN Scheduling: 15 Years of Experience in Planning Automation
NASA Technical Reports Server (NTRS)
Maldague, Pierre F.; Wissler, Steve; Lenda, Matthew; Finnerty, Daniel
2014-01-01
In this paper, we discuss the scheduling capability of APGEN (Activity Plan Generator), a multi-mission planning application that is part of the NASA AMMOS (Advanced Multi-Mission Operations System), and how APGEN scheduling evolved over its applications to specific Space Missions. Our analysis identifies two major reasons for the successful application of APGEN scheduling to real problems: an expressive DSL (Domain-Specific Language) for formulating scheduling algorithms, and a well-defined process for enlisting the help of auxiliary modeling tools in providing high-fidelity, system-level simulations of the combined spacecraft and ground support system.
The Agent of extracting Internet Information with Lead Order
NASA Astrophysics Data System (ADS)
Mo, Zan; Huang, Chuliang; Liu, Aijun
In order to carry out e-commerce better, advanced technologies for accessing business information are urgently needed. An agent is described to deal with the problems of extracting internet information caused by the non-standard and irregular structure of Chinese websites. The designed agent includes three modules, each responsible for a separate step of the information extraction process. An HTTP-tree method and a Lead algorithm are proposed to generate a lead order, with which the required web pages can be retrieved easily. How to transform the extracted information, expressed in natural language, into a structured form is also discussed.
Machine learning to parse breast pathology reports in Chinese.
Tang, Rong; Ouyang, Lizhi; Li, Clara; He, Yue; Griffin, Molly; Taghian, Alphonse; Smith, Barbara; Yala, Adam; Barzilay, Regina; Hughes, Kevin
2018-06-01
Large structured databases of pathology findings are valuable in deriving new clinical insights. However, they are labor intensive to create and generally require manual annotation. There has been some work in the bioinformatics community to support automating this work via machine learning in English. Our contribution is to provide an automated approach to construct such structured databases in Chinese, and to set the stage for extraction from other languages. We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital. Physicians with native Chinese proficiency reviewed the reports and annotated a variety of binary and numerical pathologic entities. After excluding 78 cases with a bilateral lesion in the same report, 1216 cases were used as a training set for the algorithm, which was then refined by 405 development cases. The Natural language processing algorithm was tested by using the remaining 405 cases to evaluate the machine learning outcome. The model was used to extract 13 binary entities and 8 numerical entities. When compared to physicians with native Chinese proficiency, the model showed a per-entity accuracy from 91 to 100% for all common diagnoses on the test set. The overall accuracy of binary entities was 98% and of numerical entities was 95%. In a per-report evaluation for binary entities with more than 100 training cases, 85% of all the testing reports were completely correct and 11% had an error in 1 out of 22 entities. We have demonstrated that Chinese breast pathology reports can be automatically parsed into structured data using standard machine learning approaches. The results of our study demonstrate that techniques effective in parsing English reports can be scaled to other languages.
Automated identification of drug and food allergies entered using non-standard terminology.
Epstein, Richard H; St Jacques, Paul; Stockin, Michael; Rothman, Brian; Ehrenfeld, Jesse M; Denny, Joshua C
2013-01-01
An accurate computable representation of food and drug allergy is essential for safe healthcare. Our goal was to develop a high-performance, easily maintained algorithm to identify medication and food allergies and sensitivities from unstructured allergy entries in electronic health record (EHR) systems. An algorithm was developed in Transact-SQL to identify ingredients to which patients had allergies in a perioperative information management system. The algorithm used RxNorm and natural language processing techniques developed on a training set of 24 599 entries from 9445 records. Accuracy, specificity, precision, recall, and F-measure were determined for the training dataset and repeated for the testing dataset (24 857 entries from 9430 records). Accuracy, precision, recall, and F-measure for medication allergy matches were all above 98% in the training dataset and above 97% in the testing dataset for all allergy entries. Corresponding values for food allergy matches were above 97% and above 93%, respectively. Specificities of the algorithm were 90.3% and 85.0% for drug matches and 100% and 88.9% for food matches in the training and testing datasets, respectively. The algorithm had high performance for identification of medication and food allergies. Maintenance is practical, as updates are managed through upload of new RxNorm versions and additions to companion database tables. However, direct entry of codified allergy information by providers (through autocompleters or drop lists) is still preferred to post-hoc encoding of the data. Data tables used in the algorithm are available for download. A high performing, easily maintained algorithm can successfully identify medication and food allergies from free text entries in EHR systems.
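A toy version of the matching step is shown below: the free-text entry is normalized and its tokens are looked up in an ingredient/synonym table. The tiny dictionary stands in for the RxNorm-derived tables and the richer matching of the real algorithm.

```python
import re

# Tiny stand-in for the RxNorm-derived synonym tables used by the real algorithm.
INGREDIENT_SYNONYMS = {
    "penicillin": "penicillin", "pcn": "penicillin",
    "sulfa": "sulfonamides", "sulfonamide": "sulfonamides",
    "peanut": "peanut", "peanuts": "peanut",
    "asa": "aspirin", "aspirin": "aspirin",
}

def match_allergies(free_text):
    """Return the normalized ingredients mentioned in a free-text allergy entry."""
    tokens = re.findall(r"[a-z]+", free_text.lower())
    return sorted({INGREDIENT_SYNONYMS[t] for t in tokens if t in INGREDIENT_SYNONYMS})

print(match_allergies("PCN (rash), peanuts -> anaphylaxis"))
# ['peanut', 'penicillin']
```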
Automated detection of hospital outbreaks: A systematic review of methods
Buckeridge, David L.; Lepelletier, Didier
2017-01-01
Objectives Several automated algorithms for epidemiological surveillance in hospitals have been proposed. However, the usefulness of these methods to detect nosocomial outbreaks remains unclear. The goal of this review was to describe outbreak detection algorithms that have been tested within hospitals, consider how they were evaluated, and synthesize their results. Methods We developed a search query using keywords associated with hospital outbreak detection and searched the MEDLINE database. To ensure the highest sensitivity, no limitations were initially imposed on publication languages and dates, although we subsequently excluded studies published before 2000. Every study that described a method to detect outbreaks within hospitals was included, without any exclusion based on study design. Additional studies were identified through citations in retrieved studies. Results Twenty-nine studies were included. The detection algorithms were grouped into 5 categories: simple thresholds (n = 6), statistical process control (n = 12), scan statistics (n = 6), traditional statistical models (n = 6), and data mining methods (n = 4). The evaluation of the algorithms was often solely descriptive (n = 15), but more complex epidemiological criteria were also investigated (n = 10). The performance measures varied widely between studies: e.g., the sensitivity of an algorithm in a real world setting could vary between 17 and 100%. Conclusion Even if outbreak detection algorithms are useful complementary tools for traditional surveillance, the heterogeneity in results among published studies does not support quantitative synthesis of their performance. A standardized framework should be followed when evaluating outbreak detection methods to allow comparison of algorithms across studies and synthesis of results. PMID:28441422
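As a small example of the statistical-process-control family reviewed here, the sketch below flags days whose exponentially weighted moving average (EWMA) of case counts exceeds a baseline-derived control limit. The counts are simulated and the smoothing and limit parameters are arbitrary.

```python
import numpy as np

def ewma_alarms(counts, lam=0.3, k=3.0):
    """Flag time points where the EWMA of counts exceeds a control limit (toy SPC detector)."""
    counts = np.asarray(counts, dtype=float)
    mu, sd = counts[:28].mean(), counts[:28].std(ddof=1)    # baseline period
    limit = mu + k * sd * np.sqrt(lam / (2 - lam))           # EWMA control limit
    ewma, alarms = mu, []
    for t, c in enumerate(counts):
        ewma = lam * c + (1 - lam) * ewma
        if ewma > limit:
            alarms.append(t)
    return alarms

rng = np.random.default_rng(3)
baseline = rng.poisson(2, size=60)            # ~2 nosocomial cases/day
outbreak = baseline.copy()
outbreak[45:52] += rng.poisson(6, size=7)     # simulated outbreak in week 7
print(ewma_alarms(outbreak))                  # expect alarms around days 45-52
```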
CUTEX: CUrvature Thresholding EXtractor
NASA Astrophysics Data System (ADS)
Molinari, S.; Schisano, E.; Faustini, F.; Pestalozzi, M.; di Giorgio, A. M.; Liu, S.
2017-08-01
CuTEx analyzes images in the infrared bands and extracts sources from complex backgrounds, particularly star-forming regions that offer the challenges of crowding, a highly spatially variable background, and non-PSF profiles such as protostars in their accreting phase. The code is composed of two main algorithms: the first for source detection, and the second for flux extraction. The code was originally written in the IDL language and has been exported to the license-free GDL language. CuTEx could be used in other bands or in scientific cases different from the native case. This software is also available as an on-line tool from the Multi-Mission Interactive Archive web pages dedicated to the Herschel Observatory.
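The curvature-thresholding detection idea can be sketched in a few lines: smooth the map, take second derivatives, threshold the negative curvature, and label the connected peaks. This uses scipy in place of the original IDL/GDL code, and the threshold and synthetic map are illustrative only.

```python
import numpy as np
from scipy import ndimage

def detect_by_curvature(image, thresh):
    """Detect compact sources as pixels of strongly negative curvature (Laplacian)."""
    curvature = ndimage.laplace(ndimage.gaussian_filter(image, 1.0))
    mask = curvature < -thresh                  # peaks curve downwards
    labels, n = ndimage.label(mask)
    return ndimage.center_of_mass(image, labels, range(1, n + 1))

# Synthetic map: two Gaussian sources on a slowly varying background plus noise.
y, x = np.mgrid[0:128, 0:128]
background = 0.02 * x + 0.01 * y
sources = 5 * np.exp(-((x - 40)**2 + (y - 60)**2) / 8.0) \
        + 3 * np.exp(-((x - 90)**2 + (y - 30)**2) / 8.0)
image = background + sources + np.random.default_rng(4).normal(0, 0.1, (128, 128))
print(detect_by_curvature(image, thresh=0.5))   # centroids near (60, 40) and (30, 90)
```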
Kaiser, Tim; Laireiter, Anton Rupert
2017-07-20
In recent years, the assessment of mental disorders has become more and more personalized. Modern advancements such as Internet-enabled mobile phones and increased computing capacity make it possible to tap sources of information that have long been unavailable to mental health practitioners. Software packages that combine algorithm-based treatment planning, process monitoring, and outcome monitoring are scarce. The objective of this study was to assess whether the DynAMo Web application can fill this gap by providing a software solution that can be used by both researchers to conduct state-of-the-art psychotherapy process research and clinicians to plan treatments and monitor psychotherapeutic processes. In this paper, we report on the current state of a Web application that can be used for assessing the temporal structure of mental disorders using information on their temporal and synchronous associations. A treatment planning algorithm automatically interprets the data and delivers priority scores of symptoms to practitioners. The application is also capable of monitoring psychotherapeutic processes during therapy and of monitoring treatment outcomes. This application was developed using the R programming language (R Core Team, Vienna) and the Shiny Web application framework (RStudio, Inc, Boston). It is made entirely from open-source software packages and thus is easily extensible. The capabilities of the proposed application are demonstrated. Case illustrations are provided to exemplify its usefulness in clinical practice. With the broad availability of Internet-enabled mobile phones and similar devices, collecting data on psychopathology and psychotherapeutic processes has become easier than ever. The proposed application is a valuable tool for capturing, processing, and visualizing these data. The combination of dynamic assessment and process- and outcome monitoring has the potential to improve the efficacy and effectiveness of psychotherapy. ©Tim Kaiser, Anton Rupert Laireiter. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 20.07.2017.
A Semantic Approach for Geospatial Information Extraction from Unstructured Documents
NASA Astrophysics Data System (ADS)
Sallaberry, Christian; Gaio, Mauro; Lesbegueries, Julien; Loustau, Pierre
Local cultural heritage document collections are characterized by their content, which is strongly attached to a territory and its land history (i.e., geographical references). Our contribution aims at making the content retrieval process more efficient whenever a query includes geographic criteria. We propose a core model for a formal representation of geographic information. It takes into account characteristics of different modes of expression, such as written language, captures of drawings, maps, photographs, etc. We have developed a prototype that fully implements geographic information extraction (IE) and geographic information retrieval (IR) processes. All PIV prototype processing resources are designed as Web Services. We propose a geographic IE process based on semantic treatment as a supplement to classical IE approaches. We implement geographic IR by using intersection computing algorithms that seek out any intersection between formal geocoded representations of geographic information in a user query and similar representations in document collection indexes.
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
2018-04-01
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
NASA Technical Reports Server (NTRS)
Trevino, Luis; Berg, Peter; England, Dwight; Johnson, Stephen B.
2016-01-01
Analysis methods and testing processes are essential activities in the engineering development and verification of the National Aeronautics and Space Administration's (NASA) new Space Launch System (SLS). Central to mission success is reliable verification of the Mission and Fault Management (M&FM) algorithms for the SLS launch vehicle (LV) flight software. This is particularly difficult because M&FM algorithms integrate and operate LV subsystems, which consist of diverse forms of hardware and software themselves, with equally diverse integration from the engineering disciplines of LV subsystems. M&FM operation of SLS requires a changing mix of LV automation. During pre-launch the LV is primarily operated by the Kennedy Space Center (KSC) Ground Systems Development and Operations (GSDO) organization with some LV automation of time-critical functions, and much more autonomous LV operations during ascent that have crucial interactions with the Orion crew capsule, its astronauts, and with mission controllers at the Johnson Space Center. M&FM algorithms must perform all nominal mission commanding via the flight computer to control LV states from pre-launch through disposal and also address failure conditions by initiating autonomous or commanded aborts (crew capsule escape from the failing LV), redundancy management of failing subsystems and components, and safing actions to reduce or prevent threats to ground systems and crew. To address the criticality of the verification testing of these algorithms, the NASA M&FM team has utilized the State Flow environment (SFE) with its existing Vehicle Management End-to-End Testbed (VMET) platform which also hosts vendor-supplied physics-based LV subsystem models. The human-derived M&FM algorithms are designed and vetted in Integrated Development Teams composed of design and development disciplines such as Systems Engineering, Flight Software (FSW), Safety and Mission Assurance (S&MA) and major subsystems and vehicle elements such as Main Propulsion Systems (MPS), boosters, avionics, Guidance, Navigation, and Control (GN&C), Thrust Vector Control (TVC), liquid engines, and the astronaut crew office. Since the algorithms are realized using model-based engineering (MBE) methods from a hybrid of the Unified Modeling Language (UML) and Systems Modeling Language (SysML), SFE methods are a natural fit to provide an in-depth analysis of the interactive behavior of these algorithms with the SLS LV subsystem models. For this, the M&FM algorithms and the SLS LV subsystem models are modeled using constructs provided by Matlab, which also enables modeling of the accompanying interfaces, providing greater flexibility for integrated testing and analysis, which helps forecast expected behavior in forward VMET integrated testing activities. In VMET, the M&FM algorithms are prototyped and implemented using the same C++ programming language and similar state machine architectural concepts used by the FSW group. Due to the interactive complexity of the algorithms, VMET testing thus far has verified all the individual M&FM subsystem algorithms with select subsystem vendor models but is steadily progressing to assessing the interactive behavior of these algorithms with LV subsystems, as represented by subsystem models. The novel SFE application has proven to be useful for quick-look analysis into early integrated system behavior and assessment of the M&FM algorithms with the modeled LV subsystems.
This early MBE analysis generates vital insight into the integrated system behaviors, algorithm sensitivities, and design issues, and has aided in the debugging of the M&FM algorithms well before full testing can begin in more expensive, higher-fidelity but more arduous environments such as VMET, FSW testing, and the Systems Integration Lab (SIL). SFE has exhibited both expected and unexpected behaviors in nominal and off-nominal test cases prior to full VMET testing. In many findings, these behavioral characteristics were used to correct the M&FM algorithms, enable better test coverage, and develop more effective test cases for each of the LV subsystems. This has improved the fidelity of testing and planning for the next generation of M&FM algorithms as the SLS program evolves from non-crewed to crewed flight, impacting subsystem configurations and the M&FM algorithms that control them. SFE analysis has improved robustness and reliability of the M&FM algorithms by revealing implementation errors and documentation inconsistencies. It is also improving planning efficiency for future VMET testing of the M&FM algorithms hosted in the LV flight computers, further reducing risk for the SLS launch infrastructure, the SLS LV, and most importantly the crew.
Beyond the "c" and the "x": Learning with Algorithms in Massive Open Online Courses (MOOCs)
ERIC Educational Resources Information Center
Knox, Jeremy
2018-01-01
This article examines how algorithms are shaping student learning in massive open online courses (MOOCs). Following the dramatic rise of MOOC platform organisations in 2012, over 4,500 MOOCs have been offered to date, in increasingly diverse languages, and with a growing requirement for fees. However, discussions of "learning" in MOOCs…
Calculus: A Computer Oriented Presentation, Part 1 [and] Part 2.
ERIC Educational Resources Information Center
Stenberg, Warren; Walker, Robert J.
Parts one and two of a one-year computer-oriented calculus course (without analytic geometry) are presented. The ideas of calculus are introduced and motivated through computer (i.e., algorithmic) concepts. An introduction to computing via algorithms and a simple flow chart language allows the book to be self-contained, except that material on…
NASA Astrophysics Data System (ADS)
Liu, Iching; Sun, Ying
1992-10-01
A system for reconstructing 3-D vascular structure from two orthogonally projected images is presented. The formidable problem of matching segments between two views is solved using knowledge of the epipolar constraint and the similarity of segment geometry and connectivity. The knowledge is represented in a rule-based system, which also controls the operation of several computational algorithms for tracking segments in each image, representing 2-D segments with directed graphs, and reconstructing 3-D segments from matching 2-D segment pairs. Uncertain reasoning governs the interaction between segmentation and matching; it also provides a framework for resolving the matching ambiguities in an iterative way. The system was implemented in the C language and the C Language Integrated Production System (CLIPS) expert system shell. Using video images of a tree model, the standard deviation of reconstructed centerlines was estimated to be 0.8 mm (1.7 mm) when the view direction was parallel (perpendicular) to the epipolar plane. Feasibility of clinical use was shown using x-ray angiograms of a human chest phantom. The correspondence of vessel segments between two views was accurate. Computational time for the entire reconstruction process was under 30 s on a workstation. A fully automated system for two-view reconstruction that does not require the a priori knowledge of vascular anatomy is demonstrated.
Lowekamp, Bradley C.; Chen, David T.; Ibáñez, Luis; Blezek, Daniel
2013-01-01
SimpleITK is a new interface to the Insight Segmentation and Registration Toolkit (ITK) designed to facilitate rapid prototyping, education and scientific activities via high level programming languages. ITK is a templated C++ library of image processing algorithms and frameworks for biomedical and other applications, and it was designed to be generic, flexible and extensible. Initially, ITK provided a direct wrapping interface to languages such as Python and Tcl through the WrapITK system. Unlike WrapITK, which exposed ITK's complex templated interface, SimpleITK was designed to provide an easy to use and simplified interface to ITK's algorithms. It includes procedural methods, hides ITK's demand driven pipeline, and provides a template-less layer. Also SimpleITK provides practical conveniences such as binary distribution packages and overloaded operators. Our user-friendly design goals dictated a departure from the direct interface wrapping approach of WrapITK, toward a new facade class structure that only exposes the required functionality, hiding ITK's extensive template use. Internally SimpleITK utilizes a manual description of each filter with code-generation and advanced C++ meta-programming to provide the higher-level interface, bringing the capabilities of ITK to a wider audience. SimpleITK is licensed as open source software library under the Apache License Version 2.0 and more information about downloading it can be found at http://www.simpleitk.org. PMID:24416015
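A minimal example of the procedural, template-free interface described above might look as follows; the file names are placeholders.

```python
import SimpleITK as sitk

image = sitk.ReadImage("input_volume.nii.gz")            # pixel type deduced from the file
smoothed = sitk.SmoothingRecursiveGaussian(image, 2.0)   # single procedural call, no pipeline setup
mask = sitk.OtsuThreshold(smoothed, 0, 1)                # background 0, foreground 1
masked = sitk.Cast(mask, image.GetPixelID()) * image     # overloaded operators work on images
sitk.WriteImage(masked, "masked_volume.nii.gz")
```

The one-line filter calls and the overloaded arithmetic operators are examples of the practical conveniences the abstract mentions.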
A biometric authentication model using hand gesture images
2013-01-01
A novel hand biometric authentication method based on measurements of the user's stationary hand gestures of hand sign language is proposed. The measurements of hand gestures can be acquired sequentially by a low-cost video camera. There could possibly be another level of contextual information associated with these hand signs to be used in biometric authentication. As an analogue, instead of typing a password ‘iloveu’ in text, which is relatively vulnerable over a communication network, a signer can encode a biometric password using a sequence of hand signs, ‘i’, ‘l’, ‘o’, ‘v’, ‘e’, and ‘u’. Subsequently, features, which are inherently fuzzy in nature, are extracted from the hand gesture images and recognized by a classification model that verifies whether the signer is who he claims to be, by examining his hand shape and the postures used in making those signs. It is believed that everybody has certain slight but unique behavioral characteristics in sign language, as well as distinctive hand shape compositions. Simple and efficient image processing algorithms are used in hand sign recognition, including intensity profiling, color histograms and dimensionality analysis, coupled with several popular machine learning algorithms. Computer simulation is conducted to investigate the efficacy of this novel biometric authentication model, which shows up to 93.75% recognition accuracy. PMID:24172288
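The classification stage can be sketched as colour-histogram features fed to a k-nearest-neighbour classifier. The random arrays below merely stand in for video frames, and the feature and model choices are illustrative rather than those tuned in the study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def color_histogram(image, bins=8):
    """Concatenated per-channel histograms as a simple appearance feature."""
    return np.concatenate([np.histogram(image[..., c], bins=bins,
                                        range=(0, 256), density=True)[0]
                           for c in range(3)])

# Random arrays stand in for frames of two different signers making a hand sign.
rng = np.random.default_rng(5)
frames_a = rng.integers(0, 120, size=(20, 64, 64, 3))    # darker hand tones
frames_b = rng.integers(100, 256, size=(20, 64, 64, 3))  # lighter hand tones
X = np.array([color_histogram(f) for f in np.concatenate([frames_a, frames_b])])
y = np.array([0] * 20 + [1] * 20)

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
probe = color_histogram(rng.integers(0, 120, size=(64, 64, 3)))
print(clf.predict([probe]))      # expected: signer 0
```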
Algorithms in the historical emergence of word senses.
Ramiro, Christian; Srinivasan, Mahesh; Malt, Barbara C; Xu, Yang
2018-03-06
Human language relies on a finite lexicon to express a potentially infinite set of ideas. A key result of this tension is that words acquire novel senses over time. However, the cognitive processes that underlie the historical emergence of new word senses are poorly understood. Here, we present a computational framework that formalizes competing views of how new senses of a word might emerge by attaching to existing senses of the word. We test the ability of the models to predict the temporal order in which the senses of individual words have emerged, using an historical lexicon of English spanning the past millennium. Our findings suggest that word senses emerge in predictable ways, following an historical path that reflects cognitive efficiency, predominantly through a process of nearest-neighbor chaining. Our work contributes a formal account of the generative processes that underlie lexical evolution.
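As a rough illustration of the nearest-neighbor chaining idea, the sketch below scores an emergence order by how close each new sense lies to its nearest already-emerged sense. The sense vectors and distance metric are toy placeholders; the authors' historical lexicon and models are not reproduced.

```python
# Hedged sketch of nearest-neighbor chaining over sense vectors (placeholder data).
import numpy as np

def chaining_order_cost(sense_vectors, emergence_order):
    """Sum, over each newly emerging sense, its distance to the nearest
    already-emerged sense; a lower cost means the order is more chain-like."""
    emerged = [emergence_order[0]]
    total = 0.0
    for s in emergence_order[1:]:
        dists = [np.linalg.norm(sense_vectors[s] - sense_vectors[e]) for e in emerged]
        total += min(dists)        # attach to the nearest existing sense
        emerged.append(s)
    return total

# Toy example: four senses of one word in a 2-D semantic space.
vecs = {"s1": np.array([0.0, 0.0]), "s2": np.array([0.1, 0.0]),
        "s3": np.array([0.2, 0.1]), "s4": np.array([1.0, 1.0])}
print(chaining_order_cost(vecs, ["s1", "s2", "s3", "s4"]))
```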
PuReD-MCL: a graph-based PubMed document clustering methodology.
Theodosiou, T; Darzentas, N; Angelis, L; Ouzounis, C A
2008-09-01
Biomedical literature is the principal repository of biomedical knowledge, with PubMed being the most complete database collecting, organizing and analyzing such textual knowledge. There are numerous efforts that attempt to exploit this information by using text mining and machine learning techniques. We developed a novel approach, called PuReD-MCL (Pubmed Related Documents-MCL), which is based on the graph clustering algorithm MCL and relevant resources from PubMed. PuReD-MCL avoids using natural language processing (NLP) techniques directly; instead, it takes advantage of existing resources, available from PubMed. PuReD-MCL then clusters documents efficiently using the MCL graph clustering algorithm, which is based on graph flow simulation. This process allows users to analyse the results by highlighting important clues, and finally to visualize the clusters and all relevant information using an interactive graph layout algorithm, for instance BioLayout Express 3D. The methodology was applied to two different datasets, previously used for the validation of the document clustering tool TextQuest. The first dataset involves the organisms Escherichia coli and yeast, whereas the second is related to Drosophila development. PuReD-MCL successfully reproduces the annotated results obtained from TextQuest, while at the same time providing additional insights into the clusters and the corresponding documents. Source code in Perl and R is available from http://tartara.csd.auth.gr/~theodos/
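For readers unfamiliar with MCL, its core alternates expansion (matrix powers that simulate flow) with inflation (element-wise powers followed by column normalization). The sketch below is a generic, simplified MCL on a toy document-similarity graph, not the PuReD-MCL code.

```python
# Generic, simplified Markov Cluster (MCL) iteration on a toy similarity matrix.
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    M = adjacency + np.eye(len(adjacency))          # add self-loops
    M = M / M.sum(axis=0)                           # make columns stochastic
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)    # expansion: simulate flow
        M = M ** inflation                          # inflation: boost strong flows
        M = M / M.sum(axis=0)                       # re-normalize columns
    # Rows that retain mass act as cluster attractors; collect their supports.
    clusters = {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.max() > 1e-6}
    return [list(c) for c in clusters]

# Toy graph: two loosely connected groups of documents.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(mcl(A))
```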
A common type system for clinical natural language processing
2013-01-01
Background One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types. PMID:23286462
A common type system for clinical natural language processing.
Wu, Stephen T; Kaggal, Vinod C; Dligach, Dmitriy; Masanz, James J; Chen, Pei; Becker, Lee; Chapman, Wendy W; Savova, Guergana K; Liu, Hongfang; Chute, Christopher G
2013-01-03
One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
chemf: A purely functional chemistry toolkit.
Höck, Stefan; Riedl, Rainer
2012-12-20
Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties, before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits, where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type safety achieved by Scala greatly increased the reliability of our code as well as the productivity of the programmers involved in this project.
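chemf itself is written in Scala, but the underlying idea of an immutable molecular graph manipulated by pure functions can be sketched in any language with immutable values. The Python analogue below is purely illustrative of the style and is not a port of the toolkit.

```python
# Python analogue of an immutable molecular graph in a functional style (illustrative only).
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    symbol: str           # e.g. "C", "O", "H"
    charge: int = 0

@dataclass(frozen=True)
class Molecule:
    atoms: Tuple[Atom, ...]
    bonds: Tuple[Tuple[int, int, int], ...]   # (atom index, atom index, bond order)

def add_atom(mol: Molecule, atom: Atom) -> Molecule:
    """Pure function: returns a new molecule, leaving the input untouched."""
    return Molecule(mol.atoms + (atom,), mol.bonds)

def heavy_atom_count(mol: Molecule) -> int:
    return sum(1 for a in mol.atoms if a.symbol != "H")

water = Molecule((Atom("O"), Atom("H"), Atom("H")), ((0, 1, 1), (0, 2, 1)))
with_carbon = add_atom(water, Atom("C"))
print(heavy_atom_count(water), heavy_atom_count(with_carbon))
```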
chemf: A purely functional chemistry toolkit
2012-01-01
Background Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, these languages offer a lot of support when writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on the above programming paradigms in a modern, versatile programming language. Results We present our initial results with functional programming in chemistry by first describing an immutable data structure for molecular graphs together with a couple of simple algorithms to calculate basic molecular properties, before writing a complete SMILES parser in accordance with the OpenSMILES specification. Along the way we show how to deal with input validation, error handling, bulk operations, and parallelization in a purely functional way. At the end we also analyze and improve our algorithms and data structures in terms of performance and compare them to existing toolkits, both object-oriented and purely functional. All code was written in Scala, a modern multi-paradigm programming language with strong support for functional programming and a highly sophisticated type system. Conclusions We have successfully made the first important steps towards a purely functional chemistry toolkit. The data structures and algorithms presented in this article perform well while at the same time they can be safely used in parallelized applications, such as computer-aided drug design experiments, without further adjustments. This stands in contrast to existing object-oriented toolkits, where thread safety of data structures and algorithms is a deliberate design decision that can be hard to implement. Finally, the level of type safety achieved by Scala greatly increased the reliability of our code as well as the productivity of the programmers involved in this project. PMID:23253942
Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R.; Ellis, Heidi JC; Hinman, M. Lee; Vyas, Jay; Gryk, Michael R.
2012-01-01
Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which focuses generally on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org). PMID:25328913
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wagner, Robert; Rivers, Wilmer
Any single computer program for seismic data analysis will not have all the capabilities needed to study reference events, since these detailed studies will be highly specialized. It may be necessary to develop and test new algorithms, and then these special codes must be integrated with existing software to use their conventional data-processing routines. We have investigated two means of establishing communications between the legacy and new codes: CORBA and XML/SOAP Web services. We have investigated making new Java code communicate with a legacy C-language program, geotool, running under Linux. Both methods were successful, but both were difficult to implement. C programs on UNIX/Linux are poorly supported for Web services, compared with the Java and .NET languages and platforms. Easier-to-use middleware will be required for scientists to construct distributed applications as easily as stand-alone ones. Considerable difficulty was encountered in modifying geotool, and this problem shows the need to use component-based user interfaces instead of large C-language codes where changes to one part of the program may introduce side effects into other parts. We have nevertheless made bug fixes and enhancements to that legacy program, but it remains difficult to expand it through communications with external software.
Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R; Ellis, Heidi Jc; Hinman, M Lee; Vyas, Jay; Gryk, Michael R
2012-01-01
Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which focuses generally on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org).
Thai Automatic Speech Recognition
2005-01-01
...used in an external DARPA evaluation involving medical scenarios between an American Doctor and a naïve monolingual Thai patient. 2. Thai Language... dictionary generation more challenging, and (3) the lack of word segmentation, which calls for automatic segmentation approaches to make n-gram language... requires a dictionary and provides various segmentation algorithms to automatically select suitable segmentations. Here we used a maximal matching...
Bruhn, Peter; Geyer-Schulz, Andreas
2002-01-01
In this paper, we introduce genetic programming over context-free languages with linear constraints for combinatorial optimization, apply this method to several variants of the multidimensional knapsack problem, and discuss its performance relative to Michalewicz's genetic algorithm with penalty functions. With respect to Michalewicz's approach, we demonstrate that genetic programming over context-free languages with linear constraints improves convergence. A final result is that genetic programming over context-free languages with linear constraints is ideally suited to modeling complementarities between items in a knapsack problem: The more complementarities in the problem, the stronger the performance in comparison to its competitors.
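As background for the comparison above, a bare-bones penalty-based genetic algorithm for a toy 0/1 knapsack instance is sketched below; the instance data, penalty weight, and GA settings are arbitrary and are not taken from either of the compared methods.

```python
# Bare-bones penalty-based GA for a toy 0/1 knapsack (illustrative; not the paper's method).
import random

values  = [10, 5, 15, 7, 6, 18, 3]
weights = [ 2, 3,  5, 7, 1,  4, 1]
capacity, penalty = 10, 50
random.seed(0)

def fitness(bits):
    v = sum(b * x for b, x in zip(bits, values))
    w = sum(b * x for b, x in zip(bits, weights))
    return v - penalty * max(0, w - capacity)   # penalize overweight solutions

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.1):
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in values] for _ in range(30)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                 # simple truncation selection
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(20)]

best = max(pop, key=fitness)
print("best selection:", best, "fitness:", fitness(best))
```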
NASA Astrophysics Data System (ADS)
Areces, Carlos; Hoffmann, Guillaume; Denis, Alexandre
We present a modal language that includes explicit operators to count the number of elements that a model might include in the extension of a formula, and we discuss how this logic has been previously investigated under different guises. We show that the language is related to graded modalities and to hybrid logics. We illustrate a possible application of the language to the treatment of plural objects and queries in natural language. We investigate the expressive power of this logic via bisimulations, discuss the complexity of its satisfiability problem, define a new reasoning task that retrieves the cardinality bound of the extension of a given input formula, and provide an algorithm to solve it.
Direct volumetric rendering based on point primitives in OpenGL.
da Rosa, André Luiz Miranda; de Almeida Souza, Ilana; Yuuji Hira, Adilson; Zuffo, Marcelo Knörich
2006-01-01
The aim of this project is to present a software-based rendering algorithm for acquired volumetric data. The algorithm was implemented in the Java language using the LWJGL graphical library, allowing volume rendering in software and thus avoiding the need for dedicated graphics boards for 3D reconstruction. The proposed algorithm creates an OpenGL model from point primitives, where each voxel becomes a point whose color values are taken from the corresponding pixel position in the source images.
Patel, Tejal A; Puppala, Mamta; Ogunti, Richard O; Ensor, Joe E; He, Tiancheng; Shewale, Jitesh B; Ankerst, Donna P; Kaklamani, Virginia G; Rodriguez, Angel A; Wong, Stephen T C; Chang, Jenny C
2017-01-01
A key challenge to mining electronic health records for mammography research is the preponderance of unstructured narrative text, which strikingly limits usable output. The imaging characteristics of breast cancer subtypes have been described previously, but without standardization of parameters for data mining. The authors searched the enterprise-wide data warehouse at the Houston Methodist Hospital, the Methodist Environment for Translational Enhancement and Outcomes Research (METEOR), for patients with Breast Imaging Reporting and Data System (BI-RADS) category 5 mammogram readings performed between January 2006 and May 2015 and an available pathology report. The authors developed natural language processing (NLP) software algorithms to automatically extract mammographic and pathologic findings from free text mammogram and pathology reports. The correlation between mammographic imaging features and breast cancer subtype was analyzed using one-way analysis of variance and the Fisher exact test. The NLP algorithm was able to obtain key characteristics for 543 patients who met the inclusion criteria. Patients with estrogen receptor-positive tumors were more likely to have spiculated margins (P = .0008), and those with tumors that overexpressed human epidermal growth factor receptor 2 (HER2) were more likely to have heterogeneous and pleomorphic calcifications (P = .0078 and P = .0002, respectively). Mammographic imaging characteristics, obtained from an automated text search and the extraction of mammogram reports using NLP techniques, correlated with pathologic breast cancer subtype. The results of the current study validate previously reported trends assessed by manual data collection. Furthermore, NLP provides an automated means with which to scale up data extraction and analysis for clinical decision support. Cancer 2017;114-121. © 2016 American Cancer Society.
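The paper's NLP algorithm is not reproduced in the abstract; a hedged sketch of the general approach, pattern matching over free-text mammogram reports to pull out BI-RADS category, margin, and calcification descriptors, might look like the following. The patterns and report text are invented for illustration.

```python
# Hedged sketch: rule-based extraction of a few mammographic descriptors (invented patterns).
import re

report = ("FINDINGS: Irregular mass with spiculated margins in the upper outer quadrant. "
          "Pleomorphic calcifications are present. IMPRESSION: BI-RADS category 5.")

patterns = {
    "birads":         re.compile(r"BI-?RADS\s*(?:category)?\s*(\d)", re.I),
    "margins":        re.compile(r"(spiculated|circumscribed|indistinct)\s+margins", re.I),
    "calcifications": re.compile(r"(pleomorphic|heterogeneous|punctate)\s+calcifications", re.I),
}

# Keep the first capture group of each pattern, or None if the descriptor is absent.
extracted = {name: (m.group(1).lower() if (m := rx.search(report)) else None)
             for name, rx in patterns.items()}
print(extracted)   # {'birads': '5', 'margins': 'spiculated', 'calcifications': 'pleomorphic'}
```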
NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.
Tseytlin, Eugene; Mitchell, Kevin; Legowski, Elizabeth; Corrigan, Julia; Chavan, Girish; Jacobson, Rebecca S
2016-01-14
Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
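The abstract describes a greedy term-to-concept matching algorithm; a generic sketch of greedy longest-match dictionary lookup is shown below. The vocabulary, concept identifiers, and sentence are toy placeholders, and NOBLE Coder's actual matching options are not reproduced.

```python
# Generic greedy longest-match dictionary lookup (toy vocabulary; not NOBLE Coder itself).
vocabulary = {
    ("urinary", "tract", "infection"): "CONCEPT_A",   # placeholder concept IDs
    ("urinary", "catheter"): "CONCEPT_B",
    ("infection",): "CONCEPT_C",
}
max_len = max(len(term) for term in vocabulary)

def greedy_match(tokens):
    i, matches = 0, []
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):   # try longest match first
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in vocabulary:
                matches.append((" ".join(candidate), vocabulary[candidate]))
                i += n
                break
        else:
            i += 1          # no concept starts here; advance one token
    return matches

print(greedy_match("Patient with urinary tract infection and urinary catheter".split()))
```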
Grabar, Natalia; Krivine, Sonia; Jaulent, Marie-Christine
2007-10-11
Making the distinction between expert and non-expert health documents can help users to select the information which is more suitable for them, according to whether they are familiar or not with medical terminology. This issue is particularly important for the information retrieval area. In our work, we address this purpose through stylistic corpus analysis and the application of machine learning algorithms. Our hypothesis is that this distinction can be performed on the basis of a small number of features and that such features can be language and domain independent. The features used were acquired from a source corpus (Russian language, diabetes topic) and then tested on the target (French language, pneumology topic) and source corpora. These cross-language features show 90% precision and 93% recall with non-expert documents in the source language, and 85% precision and 74% recall with expert documents in the target language.
First stage identification of syntactic elements in an extra-terrestrial signal
NASA Astrophysics Data System (ADS)
Elliott, John
2011-02-01
By investigating the generic attributes of a representative set of terrestrial languages at varying levels of abstraction, we endeavour to isolate elements of the signal universe that are computationally tractable for detection and structural decipherment. Ultimately, our aim is to contribute in some way to the understanding of what 'languageness' actually is. This paper describes algorithms and software developed to characterise and detect generic intelligent language-like features in an input signal, using natural language learning techniques: looking for characteristic statistical "language-signatures" in test corpora. As a first step towards such species-independent language detection, we present a suite of programs to analyse digital representations of a range of data, and use the results to extrapolate whether or not there are language-like structures that distinguish these data from other sources, such as music, images, and white noise.
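One simple statistical "language-signature" of the kind alluded to above is the symbol-level entropy and rank-frequency profile of a stream. The sketch below computes both for an arbitrary character sequence; it illustrates the general approach rather than the authors' detection suite.

```python
# Illustrative statistical signature: symbol entropy plus a crude rank-frequency profile.
import math
from collections import Counter

def signature(symbols):
    counts = Counter(symbols)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)        # bits per symbol
    top = counts.most_common(5)                            # crude Zipf-like profile
    return entropy, top

text  = "the quick brown fox jumps over the lazy dog " * 20
# Deterministic but scrambled character stream used as a rough non-language contrast.
noise = "".join(chr(33 + (i * 7919) % 90) for i in range(len(text)))

print("text :", signature(text))
print("noise:", signature(noise))
```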
Behavioral and computational aspects of language and its acquisition
NASA Astrophysics Data System (ADS)
Edelman, Shimon; Waterfall, Heidi
2007-12-01
One of the greatest challenges facing the cognitive sciences is to explain what it means to know a language, and how the knowledge of language is acquired. The dominant approach to this challenge within linguistics has been to seek an efficient characterization of the wealth of documented structural properties of language in terms of a compact generative grammar-ideally, the minimal necessary set of innate, universal, exception-less, highly abstract rules that jointly generate all and only the observed phenomena and are common to all human languages. We review developmental, behavioral, and computational evidence that seems to favor an alternative view of language, according to which linguistic structures are generated by a large, open set of constructions of varying degrees of abstraction and complexity, which embody both form and meaning and are acquired through socially situated experience in a given language community, by probabilistic learning algorithms that resemble those at work in other cognitive modalities.
Research on intelligent recommendation algorithm of e-commerce based on association rules
NASA Astrophysics Data System (ADS)
Shen, Jiajie; Cheng, Xianyi
2017-09-01
As e-commerce offers an ever richer range of commodities, more and more consumers are willing to shop online; faced with this abundance of commodity information, customers often experience aesthetic fatigue. A recommendation algorithm is therefore needed that uses customers' recent behavior, including browsing and purchasing, to predict and intelligently recommend the goods they need, thus improving customer satisfaction and increasing e-commerce profits. This paper first discusses recommendation algorithms, then improves Apriori, and finally implements a commodity recommendation algorithm in the R language. The results show that this algorithm provides useful decision support for customers buying commodities.
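As a reminder of what Apriori-style recommendation reduces to, the sketch below counts itemset support and rule confidence over toy transactions. It is written in Python for consistency with the other sketches here, whereas the paper's implementation is in R, and the transaction data are invented.

```python
# Toy support/confidence computation behind Apriori-style recommendation (invented data).
from itertools import combinations

transactions = [
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "case"},
    {"laptop", "mouse", "bag"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Frequent pairs and the confidence of rules A -> B derived from them.
items = {i for t in transactions for i in t}
for a, b in combinations(sorted(items), 2):
    pair = {a, b}
    if support(pair) >= 0.4:                       # minimum support threshold
        conf = support(pair) / support({a})
        print(f"{{{a}}} -> {{{b}}}  support={support(pair):.2f}  confidence={conf:.2f}")
```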
De-identifying an EHR database - anonymity, correctness and readability of the medical record.
Pantazos, Kostas; Lauesen, Soren; Lippert, Soren
2011-01-01
Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version with real medical records, but related to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the de-identified database has 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
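The three steps described above (collect identifiers, define replacements, substitute in structured data and free text) can be sketched generically as follows. The identifier lists, replacement scheme, and sample note are invented and far simpler than what safe de-identification of a real EHR requires.

```python
# Hedged sketch of list-based identifier replacement (invented data; not production-safe).
import re

# Step 1: identifiers collected from the database and external name lists (placeholders).
identifiers = {"Jens Hansen": "PERSON_017", "Odense": "CITY_003", "120455-1234": "ID_091"}

# Step 2: the replacement is fixed per identifier so each record stays internally consistent.
def replace_identifiers(text):
    # Step 3: substitute in free text, longest identifiers first to avoid partial overlaps.
    for ident in sorted(identifiers, key=len, reverse=True):
        text = re.sub(re.escape(ident), identifiers[ident], text)
    return text

note = "Jens Hansen (ID 120455-1234) admitted in Odense with chest pain."
print(replace_identifiers(note))
```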
Fernandes, Andrea C; Dutta, Rina; Velupillai, Sumithra; Sanyal, Jyoti; Stewart, Robert; Chandran, David
2018-05-09
Research into suicide prevention has been hampered by methodological limitations such as low sample size and recall bias. Recently, Natural Language Processing (NLP) strategies have been used with Electronic Health Records to increase information extraction from free text notes as well as structured fields concerning suicidality, and this allows access to much larger cohorts than previously possible. This paper presents two novel NLP approaches - a rule-based approach to classify the presence of suicide ideation and a hybrid machine learning and rule-based approach to identify suicide attempts in a psychiatric clinical database. Good performance of the two classifiers in the evaluation study suggests they can be used to accurately detect mentions of suicide ideation and attempt within free-text documents in this psychiatric database. The novelty of the two approaches lies in the malleability of each classifier if a need to refine performance or meet alternate classification requirements arises. The algorithms can also be adapted to fit the infrastructures of other clinical datasets, given sufficient knowledge of clinical recording practice, without dependency on medical codes or additional data extraction of known risk factors to predict suicidal behaviour.
Keselman, Alla; Rosemblat, Graciela; Kilicoglu, Halil; Fiszman, Marcelo; Jin, Honglan; Shin, Dongwook; Rindflesch, Thomas C.
2013-01-01
Explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot-test in which two information specialists use the adapted application for a realistic information seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design. PMID:24311971
SensorWeb 3G: Extending On-Orbit Sensor Capabilities to Enable Near Realtime User Configurability
NASA Technical Reports Server (NTRS)
Mandl, Daniel; Cappelaere, Pat; Frye, Stuart; Sohlberg, Rob; Ly, Vuong; Chien, Steve; Tran, Daniel; Davies, Ashley; Sullivan, Don; Ames, Troy;
2010-01-01
This research effort prototypes an implementation of a standard interface, the Web Coverage Processing Service (WCPS), which is an Open Geospatial Consortium (OGC) standard, to enable users to define, test, upload and execute algorithms for on-orbit sensor systems. The user is able to customize on-orbit data products that result from raw data streaming from an instrument. This extends the SensorWeb 2.0 concept that was developed under a previous Advanced Information System Technology (AIST) effort, in which web services wrap sensors and a standardized Extensible Markup Language (XML) based scripting workflow language orchestrates processing steps across multiple domains. SensorWeb 3G extends the concept by giving the user control over the flight software modules associated with the on-orbit sensor, and thus provides a degree of flexibility which does not presently exist. The successful demonstrations to date will be presented, which include a realistic HyspIRI decadal mission testbed. Furthermore, benchmarks that were run will also be presented, along with future demonstration and benchmark tests planned. Finally, we conclude with implications for the future and how this concept dovetails into efforts to develop "cloud computing" methods and standards.
PS: A nonprocedural language with data types and modules
NASA Technical Reports Server (NTRS)
Gokhale, M. B.
1986-01-01
The Problem Specification (PS) nonprocedural language is a very high level language for algorithm specification. PS is suitable for nonprogrammers, who can specify a problem using mathematically-oriented equations; for expert programmers, who can prototype different versions of a software system for evaluation; and for those who wish to use specifications for portions (if not all) of a program. PS has data types and modules similar to Modula-2. The compiler generates C code. PS is first shown by example, and then efficiency issues in scheduling and code generation are discussed.
Wilson, J Adam; Shutter, Lori A; Hartings, Jed A
2013-01-01
Neuromonitoring in patients with severe brain trauma and stroke is often limited to intracranial pressure (ICP); advanced neuroscience intensive care units may also monitor brain oxygenation (partial pressure of brain tissue oxygen, P(bt)O(2)), electroencephalogram (EEG), cerebral blood flow (CBF), or neurochemistry. For example, cortical spreading depolarizations (CSDs) recorded by electrocorticography (ECoG) are associated with delayed cerebral ischemia after subarachnoid hemorrhage and are an attractive target for novel therapeutic approaches. However, to better understand pathophysiologic relations and realize the potential of multimodal monitoring, a common platform for data collection and integration is needed. We have developed a multimodal system that integrates clinical, research, and imaging data into a single research and development (R&D) platform. Our system is adapted from the widely used BCI2000, a brain-computer interface tool which is written in the C++ language and supports over 20 data acquisition systems. It is optimized for real-time analysis of multimodal data using advanced time and frequency domain analyses and is extensible for research development using a combination of C++, MATLAB, and Python languages. Continuous streams of raw and processed data, including BP (blood pressure), ICP, PtiO2, CBF, ECoG, EEG, and patient video are stored in an open binary data format. Selected events identified in raw (e.g., ICP) or processed (e.g., CSD) measures are displayed graphically, can trigger alarms, or can be sent to researchers or clinicians via text message. For instance, algorithms for automated detection of CSD have been incorporated, and processed ECoG signals are projected onto three-dimensional (3D) brain models based on patient magnetic resonance imaging (MRI) and computed tomographic (CT) scans, allowing real-time correlation of pathoanatomy and cortical function. This platform will provide clinicians and researchers with an advanced tool to investigate pathophysiologic relationships and novel measures of cerebral status, as well as implement treatment algorithms based on such multimodal measures.
1994-04-01
...a variation of Ziv-Lempel compression [ZL77]. We found that using a standard compression algorithm rather than semantic compression allowed simplified ...mentation. In Proceedings of the Conference on Programming Language Design and Implementation, 1993. [ZL77] J. Ziv and A. Lempel. A universal algorithm ...required by adaptable binaries. Our ABS stores adaptable binary information using the conventional binary symbol table and compresses this data using...
A Parallel Genetic Algorithm for Automated Electronic Circuit Design
NASA Technical Reports Server (NTRS)
Lohn, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris; Norvig, Peter (Technical Monitor)
2000-01-01
We describe a parallel genetic algorithm (GA) that automatically generates circuit designs using evolutionary search. A circuit-construction programming language is introduced and we show how evolution can generate practical analog circuit designs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. We present experimental results as applied to analog filter and amplifier design tasks.
Using parallel computing methods to improve log surface defect detection methods
R. Edward Thomas; Liya Thomas
2013-01-01
Determining the size and location of surface defects is crucial to evaluating the potential yield and value of hardwood logs. Recently a surface defect detection algorithm was developed using the Java language. This algorithm was developed around an earlier laser scanning system that had poor resolution along the length of the log (15 scan lines per foot). A newer...
A Rewriting-Based Approach to Trace Analysis
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)
2002-01-01
We present a rewriting-based algorithm for efficiently evaluating future time Linear Temporal Logic (LTL) formulae on finite execution traces online. While the standard models of LTL are infinite traces, finite traces appear naturally when testing and/or monitoring real applications that only run for limited time periods. The presented algorithm is implemented in the Maude executable specification language and essentially consists of a set of equations establishing an executable semantics of LTL using a simple formula transforming approach. The algorithm is further improved to build automata on-the-fly from formulae, using memoization. The result is a very efficient and small Maude program that can be used to monitor program executions. We furthermore present an alternative algorithm for synthesizing provably minimal observer finite state machines (or automata) from LTL formulae, which can be used to analyze execution traces without the need for a rewriting system, and can hence be used by observers written in conventional programming languages. The presented work is part of an ambitious runtime verification and monitoring project at NASA Ames, called PATHEXPLORER, and demonstrates that rewriting can be a tractable and attractive means for experimenting and implementing program monitoring logics.
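The "formula transforming" idea is essentially LTL progression: each observed state rewrites the formula into a residual obligation on the rest of the trace. The sketch below implements a tiny fragment (atoms, conjunction, disjunction, next, until) in Python rather than Maude, purely to illustrate the rewriting style; it is not the Maude program or the automaton synthesis described above.

```python
# Tiny LTL progression ("formula rewriting") over a finite trace, in Python for illustration.
TRUE, FALSE = ("true",), ("false",)

def simp(f):
    """Boolean simplification so resolved obligations collapse to TRUE/FALSE."""
    if f[0] in ("and", "or"):
        a, b = simp(f[1]), simp(f[2])
        if f[0] == "and":
            if FALSE in (a, b): return FALSE
            return b if a == TRUE else a if b == TRUE else ("and", a, b)
        if TRUE in (a, b): return TRUE
        return b if a == FALSE else a if b == FALSE else ("or", a, b)
    return f

def prog(f, state):
    """Rewrite formula f against one observed state (a set of true atoms)."""
    if f in (TRUE, FALSE): return f
    if f[0] == "atom":  return TRUE if f[1] in state else FALSE
    if f[0] == "and":   return simp(("and", prog(f[1], state), prog(f[2], state)))
    if f[0] == "or":    return simp(("or", prog(f[1], state), prog(f[2], state)))
    if f[0] == "next":  return f[1]
    if f[0] == "until": # phi U psi  ->  prog(psi) or (prog(phi) and phi U psi)
        return simp(("or", prog(f[2], state), simp(("and", prog(f[1], state), f))))
    raise ValueError(f)

def monitor(formula, trace):
    for state in trace:
        formula = prog(formula, state)
    return formula == TRUE          # finite-trace verdict at the end of the execution

# "request stays pending until grant": atom 'req' U atom 'grant'
req_until_grant = ("until", ("atom", "req"), ("atom", "grant"))
print(monitor(req_until_grant, [{"req"}, {"req"}, {"grant"}]))   # True
print(monitor(req_until_grant, [{"req"}, {"req"}]))              # False (never granted)
```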
Neurolinguistic approach to natural language processing with applications to medical text analysis.
Duch, Włodzisław; Matykiewicz, Paweł; Pestian, John
2008-12-01
Understanding written or spoken language presumably involves spreading neural activation in the brain. This process may be approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. The approximation of this process is of great practical and theoretical interest. Although activations of neural circuits involved in the representation of words rapidly change in time, snapshots of these activations spreading through associative networks may be captured in a vector model. Concepts of similar type activate larger clusters of neurons, priming areas in the left and right hemisphere. Analysis of recent brain imaging experiments shows the importance of the right hemisphere's non-verbal clusterization. Medical ontologies enable development of a large-scale practical algorithm to re-create pathways of spreading neural activations. First, concepts of a specific semantic type are identified in the text, and then all related concepts of the same type are added to the text, providing expanded representations. To avoid rapid growth of the extended feature space, after each step only the most useful features that increase document clusterization are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Expanded texts show significantly improved clustering and may be classified with much higher accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts, and may be used in large-scale applications.
The Use of a Context-Based Information Retrieval Technique
2009-07-01
...provided in context. Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies... (WAIS). 1.4.4 Latent Semantic Analysis: LSA, which is also known as latent semantic indexing (LSI), uses a statistical and... 1.4.6 Language Models: In contrast, natural language models apply algorithms that combine statistical information with semantic information. Semantic...
Distributed Visualization Project
NASA Technical Reports Server (NTRS)
Craig, Douglas; Conroy, Michael; Kickbusch, Tracey; Mazone, Rebecca
2016-01-01
Distributed Visualization allows anyone, anywhere to see any simulation at any time. Development focuses on algorithms, software, data formats, data systems and processes to enable sharing simulation-based information across temporal and spatial boundaries without requiring stakeholders to possess highly-specialized and very expensive display systems. It also introduces abstraction between the native and shared data, which allows teams to share results without giving away proprietary or sensitive data. The initial implementation of this capability is the Distributed Observer Network (DON) version 3.1. DON 3.1 is available for public release in the NASA Software Store (https://software.nasa.gov/software/KSC-13775) and works with version 3.0 of the Model Process Control specification (an XML Simulation Data Representation and Communication Language) to display complex graphical information and associated Meta-Data.
Opinion mining on book review using CNN-L2-SVM algorithm
NASA Astrophysics Data System (ADS)
Rozi, M. F.; Mukhlash, I.; Soetrisno; Kimura, M.
2018-03-01
The review of a product can reflect the quality of the product itself, and extracting information from such reviews reveals the sentiment of the opinions they contain. The process of extracting useful information from user reviews is called Opinion Mining. A review extraction approach that is advancing rapidly is the deep learning model, which many researchers have used to obtain excellent performance in Natural Language Processing. In this research, one deep learning model, the Convolutional Neural Network (CNN), is used for feature extraction, and an L2 Support Vector Machine (SVM) is used as the classifier. These methods are applied to determine the sentiment of book review data. The results show that the method achieves 83.23% accuracy in the training phase and 64.6% in the testing phase.
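A hedged sketch of the CNN-plus-L2-SVM combination is shown below: a convolutional feature extractor followed by a linear SVM with squared-hinge loss. The vocabulary size, network shape, and data are random stand-ins; in practice the CNN would be trained and real review text tokenized, so the printed accuracy here is meaningless.

```python
# Hedged sketch: CNN features + linear L2-SVM (squared hinge) on random stand-in data.
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

torch.manual_seed(0)
vocab, seq_len, emb_dim, n_filters = 500, 40, 32, 16

class CNNFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3)
    def forward(self, x):                      # x: (batch, seq_len) of token ids
        h = self.emb(x).transpose(1, 2)        # -> (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(h))
        return h.max(dim=2).values             # max-over-time pooling -> (batch, n_filters)

# Random stand-in "reviews" and sentiment labels (the paper used real book reviews).
X_ids = torch.randint(0, vocab, (200, seq_len))
y = torch.randint(0, 2, (200,)).numpy()

with torch.no_grad():
    feats = CNNFeatures()(X_ids).numpy()       # untrained extractor, illustration only

svm = LinearSVC(loss="squared_hinge", C=1.0).fit(feats[:150], y[:150])   # L2-SVM classifier
print("held-out accuracy:", svm.score(feats[150:], y[150:]))
```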
NASA Astrophysics Data System (ADS)
Berdychowski, Piotr P.; Zabolotny, Wojciech M.
2010-09-01
The main goal of the C to VHDL compiler project is to make the FPGA platform more accessible for scientists and software developers. The FPGA platform offers the unique ability to configure the hardware to implement virtually any dedicated architecture, and modern devices provide a sufficient number of hardware resources to implement parallel execution platforms with complex processing units. All this makes the FPGA platform very attractive for those looking for an efficient, heterogeneous computing environment. The current industry standard in the development of digital systems on the FPGA platform is based on HDLs. Although very effective and expressive in the hands of hardware development specialists, these languages require specific knowledge and experience that are out of reach for most scientists and software programmers. The C to VHDL compiler project attempts to remedy that by creating an application that derives an initial VHDL description of a digital system (for further compilation and synthesis) from a purely algorithmic description in the C programming language. This idea itself is not new, and the C to VHDL compiler combines the best approaches from existing solutions developed over many previous years with the introduction of some new, unique improvements.
Data-driven classification of patients with primary progressive aphasia.
Hoffman, Paul; Sajjadi, Seyed Ahmad; Patterson, Karalyn; Nestor, Peter J
2017-11-01
Current diagnostic criteria classify primary progressive aphasia into three variants-semantic (sv), nonfluent (nfv) and logopenic (lv) PPA-though the adequacy of this scheme is debated. This study took a data-driven approach, applying k-means clustering to data from 43 PPA patients. The algorithm grouped patients based on similarities in language, semantic and non-linguistic cognitive scores. The optimum solution consisted of three groups. One group, almost exclusively those diagnosed as svPPA, displayed a selective semantic impairment. A second cluster, with impairments to speech production, repetition and syntactic processing, contained a majority of patients with nfvPPA but also some lvPPA patients. The final group exhibited more severe deficits to speech, repetition and syntax as well as semantic and other cognitive deficits. These results suggest that, amongst cases of non-semantic PPA, differentiation mainly reflects overall degree of language/cognitive impairment. The observed patterns were scarcely affected by inclusion/exclusion of non-linguistic cognitive scores. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
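A hedged sketch of the data-driven grouping step, standardized test scores clustered with k-means and candidate cluster counts compared, is shown below. The score matrix is random placeholder data, not patient data, and the authors' exact pipeline is not reproduced.

```python
# Hedged sketch: k-means over standardized test scores (random placeholder data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
scores = rng.normal(size=(43, 8))          # 43 "patients" x 8 language/cognitive measures

X = StandardScaler().fit_transform(scores)
for k in (2, 3, 4, 5):                      # compare candidate cluster counts
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, "clusters, silhouette =", round(silhouette_score(X, labels), 3))
```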
Blank, Carrine E; Cui, Hong; Moore, Lisa R; Walls, Ramona L
2016-01-01
MicrO is an ontology of microbiological terms, including prokaryotic qualities and processes, material entities (such as cell components), chemical entities (such as microbiological culture media and medium ingredients), and assays. The ontology was built to support the ongoing development of a natural language processing algorithm, MicroPIE (or, Microbial Phenomics Information Extractor). During the MicroPIE design process, we realized there was a need for a prokaryotic ontology which would capture the evolutionary diversity of phenotypes and metabolic processes across the tree of life, capture the diversity of synonyms and information contained in the taxonomic literature, and relate microbiological entities and processes to terms in a large number of other ontologies, most particularly the Gene Ontology (GO), the Phenotypic Quality Ontology (PATO), and the Chemical Entities of Biological Interest (ChEBI). We thus constructed MicrO to be rich in logical axioms and synonyms gathered from the taxonomic literature. MicrO currently has ~14550 classes (~2550 of which are new, the remainder being microbiologically-relevant classes imported from other ontologies), connected by ~24,130 logical axioms (5,446 of which are new), and is available at (http://purl.obolibrary.org/obo/MicrO.owl) and on the project website at https://github.com/carrineblank/MicrO. MicrO has been integrated into the OBO Foundry Library (http://www.obofoundry.org/ontology/micro.html), so that other ontologies can borrow and re-use classes. Term requests and user feedback can be made using MicrO's Issue Tracker in GitHub. We designed MicrO such that it can support the ongoing and future development of algorithms that can leverage the controlled vocabulary and logical inference power provided by the ontology. By connecting microbial classes with large numbers of chemical entities, material entities, biological processes, molecular functions, and qualities using a dense array of logical axioms, we intend MicrO to be a powerful new tool to increase the computing power of bioinformatics tools such as the automated text mining of prokaryotic taxonomic descriptions using natural language processing. We also intend MicrO to support the development of new bioinformatics tools that aim to develop new connections between microbial phenotypes and genotypes (i.e., the gene content in genomes). Future ontology development will include incorporation of pathogenic phenotypes and prokaryotic habitats.
Graphical programming interface: A development environment for MRI methods.
Zwart, Nicholas R; Pipe, James G
2015-11-01
We introduce a multiplatform, Python-based development environment called the graphical programming interface for prototyping MRI techniques. The interface allows developers to interact with their scientific algorithm prototypes visually in an event-driven environment, making tasks such as parameterization, algorithm testing, data manipulation, and visualization an integrated part of the workflow. Algorithm developers extend the built-in functionality through simple code interfaces designed to facilitate rapid implementation. This article shows several examples of algorithms developed in the graphical programming interface, including the non-Cartesian MR reconstruction algorithms for PROPELLER and spiral, as well as spin simulation and trajectory visualization of a FLORET example. The graphical programming interface framework is shown to be a versatile prototyping environment for developing numeric algorithms used in the latest MR techniques. © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Xu, Quan-Li; Cao, Yu-Wei; Yang, Kun
2018-03-01
Ant Colony Optimization (ACO) is the most widely used artificial intelligence algorithm at present. This study introduces the principle and mathematical model of the ACO algorithm for solving the Vehicle Routing Problem (VRP) and designs a vehicle routing optimization model based on ACO. A vehicle routing optimization simulation system was then developed in the C++ programming language, and sensitivity analyses, estimation and improvement of the three key parameters of ACO were carried out. The results indicate that the ACO algorithm designed in this paper can efficiently solve VRP planning and optimization, that different values of the key parameters have a significant influence on the performance and optimization results of the algorithm, and that the improved algorithm resists premature local convergence and shows good robustness.
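To make the ACO mechanics concrete, the sketch below runs a minimal ant-colony loop on a toy routing instance: probabilistic next-stop selection weighted by pheromone and inverse distance, pheromone evaporation, and deposit along the best tour. The distance matrix and parameters are arbitrary; this is not the paper's C++ system.

```python
# Minimal ant-colony loop for a toy routing instance (arbitrary parameters; illustrative only).
import random
random.seed(0)

dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
n, alpha, beta, rho, n_ants, n_iter = 4, 1.0, 2.0, 0.5, 8, 50
tau = [[1.0] * n for _ in range(n)]                     # pheromone matrix

def tour_length(tour):
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def build_tour():
    tour, unvisited = [0], set(range(1, n))
    while unvisited:
        i = tour[-1]
        # Selection probability ~ pheromone^alpha * (1/distance)^beta (roulette wheel).
        weights = [(j, (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta)) for j in unvisited]
        r, acc = random.random() * sum(w for _, w in weights), 0.0
        for j, w in weights:
            acc += w
            if acc >= r:
                tour.append(j); unvisited.remove(j); break
    return tour

best = min((build_tour() for _ in range(n_ants)), key=tour_length)
for _ in range(n_iter):
    tours = [build_tour() for _ in range(n_ants)]
    best = min(tours + [best], key=tour_length)
    tau = [[(1 - rho) * tau[i][j] for j in range(n)] for i in range(n)]   # evaporation
    deposit = 1.0 / tour_length(best)
    for k in range(n):                                                     # reinforce best tour
        a, b = best[k], best[(k + 1) % n]
        tau[a][b] += deposit
        tau[b][a] += deposit
print("best tour:", best, "length:", tour_length(best))
```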
Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes.
Divita, G; Carter, M; Redd, A; Zeng, Q; Gupta, K; Trautner, B; Samore, M; Gundlapalli, A
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". This paper describes the scale-up efforts at the VA Salt Lake City Health Care System to address processing large corpora of clinical notes through a natural language processing (NLP) pipeline. The use case described is a current project focused on detecting the presence of an indwelling urinary catheter in hospitalized patients and subsequent catheter-associated urinary tract infections. An NLP algorithm using v3NLP was developed to detect the presence of an indwelling urinary catheter in hospitalized patients. The algorithm was tested on a small corpus of notes on patients for whom the presence or absence of a catheter was already known (reference standard). In planning for a scale-up, we estimated that the original algorithm would have taken 2.4 days to run on a larger corpus of notes for this project (550,000 notes), and 27 days for a corpus of 6 million records representative of a national sample of notes. We approached scaling up NLP pipelines through three techniques: pipeline replication via multi-threading, intra-annotator threading for tasks that can be further decomposed, and remote annotator services which enable annotator scale-out. The scale-up reduced the average time to process a record from 206 milliseconds to 17 milliseconds, a 12-fold increase in performance when applied to a corpus of 550,000 notes. Purposely simplistic in nature, these scale-up efforts are the straightforward evolution from small-scale NLP processing to larger-scale extraction without incurring the associated complexities that come with use of the underlying UIMA framework. These efforts represent generalizable and widely applicable techniques that will aid other computationally complex NLP pipelines that need to be scaled out for processing and analyzing big data.
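The project's scale-up used v3NLP and UIMA, but the basic pattern of pipeline replication, splitting a corpus across worker processes that each run a copy of the annotator, can be sketched in a few lines. The annotate function below is a keyword stand-in, not the catheter-detection algorithm.

```python
# Generic pipeline-replication sketch: one annotator copy per worker process (stand-in logic).
from multiprocessing import Pool

KEYWORDS = ("foley", "indwelling urinary catheter")   # illustrative trigger terms only

def annotate(note):
    """Stand-in for a full NLP pipeline: flag notes mentioning a trigger term."""
    text = note.lower()
    return any(k in text for k in KEYWORDS)

if __name__ == "__main__":
    corpus = ["Foley catheter placed on admission.",
              "No lines or drains.",
              "Indwelling urinary catheter removed on day 3."] * 1000
    with Pool(processes=4) as pool:                    # pipeline replication across 4 workers
        flags = pool.map(annotate, corpus, chunksize=100)
    print(sum(flags), "of", len(corpus), "notes flagged")
```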
Automatic two- and three-dimensional mesh generation based on fuzzy knowledge processing
NASA Astrophysics Data System (ADS)
Yagawa, G.; Yoshimura, S.; Soneda, N.; Nakao, K.
1992-09-01
This paper describes the development of a novel automatic FEM mesh generation algorithm based on the fuzzy knowledge processing technique. A number of local nodal patterns are stored in a nodal pattern database of the mesh generation system. These nodal patterns are determined a priori based on certain theories or past experience of experts of FEM analyses. For example, such human experts can determine certain nodal patterns suitable for stress concentration analyses of cracks, corners, holes and so on. Each nodal pattern possesses a membership function and a procedure of node placement according to this function. In the case of the nodal patterns for stress concentration regions, the membership function which is utilized in the fuzzy knowledge processing has two meanings, i.e. the “closeness” of nodal location to each stress concentration field as well as “nodal density”. This is attributed to the fact that a denser nodal pattern is required near a stress concentration field. All a user has to do in a practical mesh generation process is to choose several appropriate local nodal patterns and to designate the maximum nodal density of each pattern. After those simple operations by the user, the system places the chosen nodal patterns automatically in an analysis domain and on its boundary, and connects them smoothly by the fuzzy knowledge processing technique. Then triangular or tetrahedral elements are generated by means of the advancing front method. The key issue of the present algorithm is an easy control of complex two- or three-dimensional nodal density distribution by means of the fuzzy knowledge processing technique. To demonstrate the fundamental performance of the present algorithm, a prototype system was constructed with an object-oriented language, Smalltalk-80, on a 32-bit microcomputer, the Macintosh II. The mesh generation of several two- and three-dimensional domains with cracks, holes and junctions is presented as examples.
Using Agent Base Models to Optimize Large Scale Network for Large System Inventories
NASA Technical Reports Server (NTRS)
Shameldin, Ramez Ahmed; Bowling, Shannon R.
2010-01-01
The aim of this paper is to use Agent-Based Models (ABM) to optimize large-scale network handling capabilities for large system inventories and to implement strategies for the purpose of reducing capital expenses. The models used in this paper rely either on computational algorithms or on procedural implementations developed in MATLAB to simulate agent-based models in a principal programming language and mathematical theory; these models are run on clusters, which provide the high-performance computing needed to execute the program in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Bittig, Arne T; Uhrmacher, Adelinde M
2017-01-01
Spatio-temporal dynamics of cellular processes can be simulated at different levels of detail, from (deterministic) partial differential equations via the spatial Stochastic Simulation algorithm to tracking Brownian trajectories of individual particles. We present a spatial simulation approach for multi-level rule-based models, which includes dynamically hierarchically nested cellular compartments and entities. Our approach ML-Space combines discrete compartmental dynamics, stochastic spatial approaches in discrete space, and particles moving in continuous space. The rule-based specification language of ML-Space supports concise and compact descriptions of models and makes it easy to adapt the spatial resolution of models.
Development and evaluation of task-specific NLP framework in China.
Ge, Caixia; Zhang, Yinsheng; Huang, Zhenzhen; Jia, Zheng; Ju, Meizhi; Duan, Huilong; Li, Haomin
2015-01-01
Natural language processing (NLP) has been designed to convert narrative text into structured data. Although some general NLP architectures have been developed, a task-specific NLP framework to facilitate the effective use of data is still a challenge in lexical-resource-limited regions, such as China. The purpose of this study is to design and develop a task-specific NLP framework to extract targeted information from particular documents by adopting dedicated algorithms on the current limited lexical resources. In this framework, a shared and evolving ontology mechanism was designed. The results show that such a free-text-driven platform will accelerate the acceptance of NLP technology in China.
Copilot: Monitoring Embedded Systems
NASA Technical Reports Server (NTRS)
Pike, Lee; Wegmann, Nis; Niller, Sebastian; Goodloe, Alwyn
2012-01-01
Runtime verification (RV) is a natural fit for ultra-critical systems, where correctness is imperative. In ultra-critical systems, even if the software is fault-free, the inherent unreliability of commodity hardware and the adversity of operational environments mean that processing units (and their hosted software) are replicated, and fault-tolerant algorithms are used to compare the outputs. We investigate both software monitoring in distributed fault-tolerant systems and the implementation of fault-tolerance mechanisms using RV techniques. We describe the Copilot language and compiler, specifically designed for generating monitors for distributed, hard real-time systems. We also describe two case studies in which we generated Copilot monitors in avionics systems.
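Copilot itself is a stream language embedded in Haskell, so the following is not Copilot code; it is only a small Python sketch of the kind of comparison a monitor might perform on replicated outputs, with hypothetical values.

```python
from collections import Counter

def majority_vote(replica_outputs):
    """Return the value reported by a strict majority of replicas, or None
    if no majority exists (which the monitor would flag as a fault)."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    return value if count > len(replica_outputs) // 2 else None

# Hypothetical replicated sensor readings: one replica disagrees, then all disagree.
print(majority_vote([42, 42, 41]))   # -> 42
print(majority_vote([42, 41, 40]))   # -> None (no majority: raise an alert)
```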
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.
1989-01-01
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the Connection Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the Connection Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the Connection Machine is capable of outperforming the CRAY XMP/14.
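A serial NumPy sketch of the data movement implied by the exchange algorithm is given below: nodal quantities live with the elements, and force contributions at shared nodes are exchanged (summed) once per time step. The mesh and forces are invented; this is not the C* Connection Machine code.

```python
import numpy as np

def exchange_nodal_forces(element_forces, connectivity, n_nodes):
    """Scatter-add each element's nodal force contributions to the global nodes,
    then gather the summed forces back to every element's copy of its nodes."""
    global_forces = np.zeros(n_nodes)
    np.add.at(global_forces, connectivity, element_forces)   # the 'exchange': sum at shared nodes
    return global_forces[connectivity]                       # each element sees the totals

# Hypothetical 1D mesh: three two-node elements sharing interior nodes 1 and 2.
connectivity = np.array([[0, 1], [1, 2], [2, 3]])
element_forces = np.array([[1.0, -1.0], [0.5, -0.5], [2.0, -2.0]])
print(exchange_nodal_forces(element_forces, connectivity, n_nodes=4))
```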
Evaluation of Methods for Multidisciplinary Design Optimization (MDO). Part 2
NASA Technical Reports Server (NTRS)
Kodiyalam, Srinivas; Yuan, Charles; Sobieski, Jaroslaw (Technical Monitor)
2000-01-01
A new MDO method, BLISS, and two different variants of the method, BLISS/RS and BLISS/S, have been implemented using iSIGHT's scripting language and are evaluated in this report on multidisciplinary problems. All of these methods are based on decomposing a modular system optimization into several subtask optimizations, which may be executed concurrently, and a system-level optimization that coordinates the subtask optimizations. The BLISS method and its variants are well suited for exploiting the concurrent processing capabilities of a multiprocessor machine. Several steps, including local sensitivity analysis, local optimization, and response surface construction and updates, are ideally suited for concurrent processing. Algorithms that can effectively exploit the concurrent processing capabilities of compute servers will be a key requirement for solving large-scale industrial design problems, such as the automotive vehicle problem detailed in Section 3.4.
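The following toy sketch illustrates only the concurrency pattern described above, running independent subtask optimizations in parallel processes under a coordinating system-level call; the quadratic subsystem objective and all values are invented and bear no relation to BLISS or iSIGHT.

```python
from concurrent.futures import ProcessPoolExecutor

def subsystem_optimization(args):
    """Toy local optimization for one subsystem: a few gradient steps on
    (x - target)^2 while the shared variables (the target) stay fixed."""
    x, target = args
    for _ in range(100):
        x -= 0.1 * 2.0 * (x - target)
    return x

def system_iteration(shared_targets, local_starts):
    """System level: launch the independent subtask optimizations concurrently,
    then (in a real MDO loop) update the shared variables from the results."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(subsystem_optimization, zip(local_starts, shared_targets)))

if __name__ == "__main__":
    print(system_iteration([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```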
A novel method of language modeling for automatic captioning in TC video teleconferencing.
Zhang, Xiaojia; Zhao, Yunxin; Schopp, Laura
2007-05-01
We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech contains a large number of infrequently used medical terms in spontaneous styles. Due to the insufficiency of data, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used for class definition of medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, and the weight-estimation algorithm of FWA all proved effective on the TC-VTC task. As compared with using mixtures of word LMs with weights estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction in perplexity on test sets of five doctors, which translated into improvements in captioning accuracy.
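The sketch below illustrates linear interpolation of component LMs and a greedy, forward-style adjustment of the mixture weights on held-out probabilities; it is a simplified stand-in for, not a reproduction of, the paper's FWA procedure, and the probabilities are fabricated placeholders.

```python
import math

def mixture_perplexity(weights, heldout_probs):
    """heldout_probs[i][k]: probability of the i-th held-out word under the k-th component LM."""
    log_sum = 0.0
    for probs in heldout_probs:
        log_sum += math.log(sum(w * p for w, p in zip(weights, probs)))
    return math.exp(-log_sum / len(heldout_probs))

def forward_weight_adjustment(heldout_probs, n_components, step=0.05, max_iters=200):
    """Greedy, FWA-style search: start uniform and repeatedly shift a small amount
    of weight toward whichever component most reduces held-out perplexity."""
    weights = [1.0 / n_components] * n_components
    best = mixture_perplexity(weights, heldout_probs)
    for _ in range(max_iters):
        improved = False
        for k in range(n_components):
            trial = [(1.0 - step) * w for w in weights]
            trial[k] += step
            ppl = mixture_perplexity(trial, heldout_probs)
            if ppl < best:
                weights, best, improved = trial, ppl, True
        if not improved:
            break
    return weights

# Fabricated held-out probabilities for three words under two component LMs.
probs = [(0.010, 0.002), (0.0005, 0.020), (0.003, 0.004)]
print(forward_weight_adjustment(probs, n_components=2))
```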
Constraints in Genetic Programming
NASA Technical Reports Server (NTRS)
Janikow, Cezary Z.
1996-01-01
Genetic programming refers to a class of genetic algorithms utilizing generic representation in the form of program trees. For a particular application, one needs to provide the set of functions, whose compositions determine the space of program structures being evolved, and the set of terminals, which determine the space of specific instances of those programs. The algorithm searches the space for the best program for a given problem, applying evolutionary mechanisms borrowed from nature. Genetic algorithms have shown great capabilities in approximately solving optimization problems which could not be approximated or solved with other methods. Genetic programming extends their capabilities to deal with a broader variety of problems. However, it also extends the size of the search space, which often becomes too large to be effectively searched even by evolutionary methods. Therefore, our objective is to utilize problem constraints, if such can be identified, to restrict this space. In this publication, we propose a generic constraint specification language, powerful enough for a broad class of problem constraints. This language has two elements: one reduces only the number of program instances, the other reduces both the space of program structures and the space of their instances. With this language, we define the minimal set of complete constraints, and a set of operators guaranteeing offspring validity from valid parents. We also show that these operators are no less efficient than the standard genetic programming operators provided one preprocesses the constraints; the necessary mechanisms are identified.
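A toy illustration of how structural constraints can prune the search space is sketched below; the typed function table and validity check are hypothetical and are not the authors' constraint specification language.

```python
# Hypothetical typed constraint table: each function lists the types its
# arguments may take, which restricts the space of program structures.
FUNCTIONS = {"add": ("num", "num"), "if_pos": ("num", "num", "num")}
RETURN_TYPE = {"add": "num", "if_pos": "num", "x": "num", "1": "num"}

def valid(tree):
    """Check that a program tree satisfies the structural constraints, so that
    crossover and mutation operators can reject (or repair) invalid offspring."""
    if isinstance(tree, str):                        # terminal
        return tree in RETURN_TYPE
    op, *children = tree
    arg_types = FUNCTIONS.get(op)
    if arg_types is None or len(children) != len(arg_types):
        return False
    return all(valid(child) and
               RETURN_TYPE[child[0] if isinstance(child, tuple) else child] == t
               for child, t in zip(children, arg_types))

print(valid(("add", "x", ("if_pos", "x", "1", "1"))))   # True
print(valid(("add", "x")))                              # False: wrong arity
```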
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Day, John H. (Technical Monitor)
2000-01-01
Post-processing of data related to a Global Positioning System (GPS) simulation is an important activity in the qualification of a GPS receiver for space flight. Because a GPS simulator is a critical resource, it is desirable to move the pertinent simulation data off the simulator as soon as a test is completed. The simulator data files are usually moved to a Personal Computer (PC), where post-processing of the receiver-logged measurements and solutions data and of the simulated data is performed. Typically, post-processing is accomplished using PC-based commercial software languages and tools. Because of the generality of commercial software systems, their general-purpose functions are notoriously slow and more often than not are the bottleneck, even for short-duration experiments. For example, it may take 8 hours to post-process data from a 6-hour simulation. There is a need to do post-processing faster, especially in order to use the previous test results as feedback for the next simulation setup. This paper demonstrates that a fast software linear interpolation algorithm is applicable to a large class of engineering problems, like GPS simulation data post-processing, where computational time is a critical resource and is one of the most important considerations. An approach is developed that speeds up post-processing by an order of magnitude. It is based on improving the bottleneck interpolation algorithm of the post-processing using a priori information that is specific to the GPS simulation application. The presented post-processing scheme was used in support of a few successful space flight missions carrying GPS receivers. A future approach to solving the post-processing performance problem using Field Programmable Gate Array (FPGA) technology is described.
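A minimal sketch of the kind of speed-up described above is shown below, assuming the a priori information is a fixed simulator logging rate: with uniformly spaced timestamps the bracketing sample index can be computed directly instead of searched for. The timing parameters and data are invented.

```python
import numpy as np

def fast_interp(t_query, t0, dt, values):
    """Linear interpolation for uniformly sampled data: with a fixed logging
    rate the bracketing index is computed directly, so each query is O(1)
    instead of requiring a search through the time stamps."""
    idx = np.clip(((t_query - t0) / dt).astype(int), 0, len(values) - 2)
    t_left = t0 + idx * dt
    frac = (t_query - t_left) / dt
    return values[idx] * (1.0 - frac) + values[idx + 1] * frac

# Invented data: a signal logged once per second for a 6-hour simulation.
t0, dt = 0.0, 1.0
values = np.sin(np.arange(0.0, 21600.0, dt))
queries = np.array([10.25, 1000.5, 21598.9])
print(fast_interp(queries, t0, dt, values))
```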
ERIC Educational Resources Information Center
Massaro, Dominic W., Ed.
In an information-processing approach to language processing, language processing is viewed as a sequence of psychological stages that occur between the initial presentation of the language stimulus and the meaning in the mind of the language processor. This book defines each of the processes and structures involved, explains how each of them…
ERIC Educational Resources Information Center
Medwetsky, Larry
2011-01-01
Purpose: This article outlines the author's conceptualization of the key mechanisms that are engaged in the processing of spoken language, referred to as the spoken language processing model. The act of processing what is heard is very complex and involves the successful intertwining of auditory, cognitive, and language mechanisms. Spoken language…
Multiple Lookup Table-Based AES Encryption Algorithm Implementation
NASA Astrophysics Data System (ADS)
Gong, Jin; Liu, Wenyi; Zhang, Huixin
A new AES (Advanced Encryption Standard) encryption algorithm implementation is proposed in this paper. It is based on five lookup tables, which are generated from the S-box (the substitution table in AES). The obvious advantages are reducing the code size, improving the implementation efficiency, and helping new learners to understand the AES encryption algorithm and GF(2^8) multiplication, which are necessary to correctly implement AES [1]. This method can be applied on processors with a word length of 32 bits or more, on FPGAs, and on other platforms, and it can correspondingly be implemented in VHDL, Verilog, VB, and other languages.
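The GF(2^8) arithmetic mentioned above can be illustrated with the standard shift-and-reduce ("xtime") construction, which is what table-based implementations precompute; the sketch below is generic and not the paper's five-table scheme. The worked product {57}·{83} = {c1} is the example given in the AES specification.

```python
def xtime(a):
    """Multiply by x (i.e. by 0x02) in GF(2^8) with the AES reduction polynomial."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    """GF(2^8) multiplication by shift-and-add; lookup-table implementations
    precompute products like these instead of running this loop."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

print(hex(gf_mul(0x57, 0x83)))   # -> 0xc1, the worked example in the AES specification
```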
Leveraging Python Interoperability Tools to Improve Sapphire's Usability
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gezahegne, A; Love, N S
2007-12-10
The Sapphire project at the Center for Applied Scientific Computing (CASC) develops and applies an extensive set of data mining algorithms for the analysis of large data sets. Sapphire's algorithms are currently available as a set of C++ libraries. However many users prefer higher level scripting languages such as Python for their ease of use and flexibility. In this report, we evaluate four interoperability tools for the purpose of wrapping Sapphire's core functionality with Python. Exposing Sapphire's functionality through a Python interface would increase its usability and connect its algorithms to existing Python tools.
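The report's four tools are not named here; as a generic illustration of the mechanics involved, the snippet below calls a compiled C routine from Python via ctypes, using the C math library as a stand-in for a (hypothetical) C shim around Sapphire's C++ libraries.

```python
import ctypes
import ctypes.util

# Load a compiled C library and call one of its routines from Python.
# The C math library stands in for a hypothetical Sapphire entry point.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))   # -> 1.0
```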
The Bilingual Language Interaction Network for Comprehension of Speech*
Marian, Viorica
2013-01-01
During speech comprehension, bilinguals co-activate both of their languages, resulting in cross-linguistic interaction at various levels of processing. This interaction has important consequences for both the structure of the language system and the mechanisms by which the system processes spoken language. Using computational modeling, we can examine how cross-linguistic interaction affects language processing in a controlled, simulated environment. Here we present a connectionist model of bilingual language processing, the Bilingual Language Interaction Network for Comprehension of Speech (BLINCS), wherein interconnected levels of processing are created using dynamic, self-organizing maps. BLINCS can account for a variety of psycholinguistic phenomena, including cross-linguistic interaction at and across multiple levels of processing, cognate facilitation effects, and audio-visual integration during speech comprehension. The model also provides a way to separate two languages without requiring a global language-identification system. We conclude that BLINCS serves as a promising new model of bilingual spoken language comprehension. PMID:24363602
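Since BLINCS builds its levels from self-organizing maps, a generic SOM update step is sketched below; the map size, input dimensionality, and training data are invented, and this is not the BLINCS implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def som_update(weights, x, lr, radius):
    """One self-organizing-map step: find the best-matching unit for the input
    and pull it (and its map neighbours, weighted by distance) toward the input."""
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    for idx in np.ndindex(weights.shape[:2]):
        grid_dist = np.hypot(idx[0] - bmu[0], idx[1] - bmu[1])
        influence = np.exp(-grid_dist ** 2 / (2.0 * radius ** 2))
        weights[idx] += lr * influence * (x - weights[idx])
    return weights

# Invented example: a 10x10 map over 5-dimensional feature vectors.
weights = rng.random((10, 10, 5))
for x in rng.random((200, 5)):
    weights = som_update(weights, x, lr=0.1, radius=2.0)
```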
Cross-Language Opinion Lexicon Extraction Using Mutual-Reinforcement Label Propagation
Lin, Zheng; Tan, Songbo; Liu, Yue; Cheng, Xueqi; Xu, Xueke
2013-01-01
There is a growing interest in automatically building opinion lexicons from sources such as product reviews. Most of these methods depend on abundant external resources such as WordNet, which limits their applicability. Unsupervised or semi-supervised learning provides an optional solution to multilingual opinion lexicon extraction. However, the datasets are imbalanced across languages. For some languages, high-quality corpora are scarce or hard to obtain, which limits research progress. To solve the above problems, we explore a mutual-reinforcement label propagation framework. First, for each language, a label propagation algorithm is applied to a word relation graph, and then a bilingual dictionary is used as a bridge to transfer information between the two languages. A key advantage of this model is its ability to make the two languages learn from each other and boost each other. The experimental results show that the proposed approach significantly outperforms the baseline. PMID:24260190
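A rough sketch of the idea, not the paper's exact algorithm, is given below: standard label propagation runs on each language's word-relation graph, and a bilingual dictionary matrix transfers scores between the two graphs in alternating rounds. All matrices, seeds, and parameter values are placeholders.

```python
import numpy as np

def propagate(adjacency, seed_scores, seed_mask, alpha=0.85, iters=50):
    """Label propagation on a word-relation graph: seed words keep their polarity
    scores while the other words repeatedly average their neighbours' scores."""
    W = adjacency / np.maximum(adjacency.sum(axis=1, keepdims=True), 1e-12)
    scores = seed_scores.astype(float).copy()
    for _ in range(iters):
        scores = alpha * (W @ scores) + (1.0 - alpha) * seed_scores
        scores[seed_mask] = seed_scores[seed_mask]      # clamp the seeds
    return scores

def mutual_reinforcement(W_a, W_b, dict_ab, seeds_a, mask_a, rounds=5):
    """Alternate propagation in language A and language B, transferring scores
    through a bilingual dictionary matrix dict_ab (rows: A words, cols: B words)."""
    scores_a = propagate(W_a, seeds_a, mask_a)
    no_seeds_b = np.zeros(W_b.shape[0], dtype=bool)
    scores_b = np.zeros(W_b.shape[0])
    for _ in range(rounds):
        scores_b = propagate(W_b, dict_ab.T @ scores_a, no_seeds_b)
        transferred = dict_ab @ scores_b
        transferred[mask_a] = seeds_a[mask_a]           # the original seeds stay fixed in A
        scores_a = propagate(W_a, transferred, mask_a)
    return scores_a, scores_b
```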
Cardin, Velia; Orfanidou, Eleni; Kästner, Lena; Rönnberg, Jerker; Woll, Bencie; Capek, Cheryl M; Rudner, Mary
2016-01-01
The study of signed languages allows the dissociation of sensorimotor and cognitive neural components of the language signal. Here we investigated the neurocognitive processes underlying the monitoring of two phonological parameters of sign languages: handshape and location. Our goal was to determine if brain regions processing sensorimotor characteristics of different phonological parameters of sign languages were also involved in phonological processing, with their activity being modulated by the linguistic content of manual actions. We conducted an fMRI experiment using manual actions varying in phonological structure and semantics: (1) signs of a familiar sign language (British Sign Language), (2) signs of an unfamiliar sign language (Swedish Sign Language), and (3) invented nonsigns that violate the phonological rules of British Sign Language and Swedish Sign Language or consist of nonoccurring combinations of phonological parameters. Three groups of participants were tested: deaf native signers, deaf nonsigners, and hearing nonsigners. Results show that the linguistic processing of different phonological parameters of sign language is independent of the sensorimotor characteristics of the language signal. Handshape and location were processed by different perceptual and task-related brain networks but recruited the same language areas. The semantic content of the stimuli did not influence this process, but phonological structure did, with nonsigns being associated with longer RTs and stronger activations in an action observation network in all participants and in the supramarginal gyrus exclusively in deaf signers. These results suggest higher processing demands for stimuli that contravene the phonological rules of a signed language, independently of previous knowledge of signed languages. We suggest that the phonological characteristics of a language may arise as a consequence of more efficient neural processing for its perception and production.
Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi
2015-01-01
Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results: We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM. PMID:26301417
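The study's exact classifier is not reproduced here; the sketch below only illustrates the general idea of combining structured features (e.g. ICD9 code counts) with NLP-derived mention counts in a supervised model. The feature values and labels are fabricated toy numbers, not study data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature table, one row per patient: [count of CAD ICD9 codes,
# count of related procedure codes, count of NLP-extracted CAD mentions].
X = np.array([[12, 3, 8],
              [ 0, 0, 1],
              [ 5, 1, 0],
              [ 0, 0, 0]])
y = np.array([1, 0, 1, 0])   # toy chart-review labels

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[7, 2, 4]])[:, 1])   # predicted probability of CAD
```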
A Parallel Genetic Algorithm for Automated Electronic Circuit Design
NASA Technical Reports Server (NTRS)
Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris
2000-01-01
Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique that are guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and 0s, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly-generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency issues in the GA, it is possible to have idle processors. However, as long as the load at each processing node is similar, the processors are kept busy nearly all of the time. In applying GAs to circuit design, a suitable genetic representation is that of a circuit-construction program. We discuss one such circuit-construction programming language and show how evolution can generate useful analog circuit designs. This language has the desirable property that virtually all sets of combinations of primitives result in valid circuit graphs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. Using a parallel genetic algorithm and circuit simulation software, we present experimental results as applied to three analog filter and two amplifier design tasks. For example, a figure shows an 85 dB amplifier design evolved by our system, and another figure shows the performance of that circuit (gain and frequency response). In all tasks, our system is able to generate circuits that achieve the target specifications.
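A compact sketch of the master-slave pattern described above is given below: worker processes evaluate fitness only, while the master performs selection, crossover, and mutation. The bit-matching fitness function stands in for the expensive circuit simulation and is purely illustrative.

```python
import random
from concurrent.futures import ProcessPoolExecutor

def fitness(genome):
    """Stand-in for the expensive circuit simulation: score how closely the
    genome matches an arbitrary target bit pattern."""
    target = [1, 0] * (len(genome) // 2)
    return sum(g == t for g, t in zip(genome, target))

def evolve(pop_size=40, genome_len=20, generations=30):
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    with ProcessPoolExecutor() as pool:                 # "slaves": fitness evaluation only
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))
            # "master": selection, recombination, mutation
            ranked = [g for _, g in sorted(zip(scores, pop), key=lambda p: -p[0])]
            parents = ranked[:pop_size // 2]
            children = []
            while len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, genome_len)
                child = a[:cut] + b[cut:]
                if random.random() < 0.1:               # small mutation rate
                    child[random.randrange(genome_len)] ^= 1
                children.append(child)
            pop = children
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(evolve())
```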
2005-01-01
… Interface Compatibility); the tool is written in OCaml [10], and the symbolic algorithms for interface compatibility and refinement are built on top … automata for a fire detection and reporting system … be encoded in the input language of the tool TIC. The refinement of sociable interfaces is discussed … are closely related to the I/O Automata Language (IOA) of [11]. Interface models are games between Input and Output, and in the models, it is …
Simulation for Dynamic Situation Awareness and Prediction III
2010-03-01
… source Java™ library for capturing and sending network packets; 4) Groovy – an open-source, Java-based scripting language (version 1.6 or newer). Open … DMOTH Analyzer application. Groovy is an open-source dynamic scripting language for the Java Virtual Machine. It is consistent with Java syntax … between temperature, pressure, wind and relative humidity, and 3) a precipitation editing algorithm. The Editor can be used to prepare scripted changes …
Investigation, Development, and Evaluation of Performance Proving for Fault-tolerant Computers
NASA Technical Reports Server (NTRS)
Levitt, K. N.; Schwartz, R.; Hare, D.; Moore, J. S.; Melliar-Smith, P. M.; Shostak, R. E.; Boyer, R. S.; Green, M. W.; Elliott, W. D.
1983-01-01
A number of methodologies for verifying systems, and computer-based tools that assist users in verifying their systems, were developed. These tools were applied to partially verify the SIFT ultrareliable aircraft computer. Topics covered included: the STP theorem prover; design verification of SIFT; high-level language code verification; assembly-language-level verification; numerical algorithm verification; verification of flight control programs; and verification of hardware logic.