poker parallel programming: Topics by Science.gov

Sample records for poker parallel programming

78 FR 40196 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-03

...; Sounding Rockets Program; Poker Flat Research Range AGENCY: National Aeronautics and Space Administration... Sounding Rockets Program (SRP) at Poker Flat Research Range (PFRR), Alaska. SUMMARY: Pursuant to the... government agencies, and educational institutions have conducted suborbital rocket launches from the PFRR...
Poker Flats Mine - Div. of Mining, Land, and Water

Science.gov Websites

Lands Coal Regulatory Program Large Mine Permits Mineral Property and Rights Mining Index Land Fishery Water Resources Factsheets Forms banner image of landscape Poker Flats Mine Home Mining Coal Regulatory Program Poker Flats Mine Mining Coal Regulatory Program Info Chickaloon Chuit Watershed Chuitna
77 FR 61642 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

Federal Register 2010, 2011, 2012, 2013, 2014

2012-10-10

...; Sounding Rockets Program; Poker Flat Research Range AGENCY: National Aeronautics and Space Administration... Sounding Rockets Program (SRP) at Poker Flat Research Range (PFRR), Alaska. SUMMARY: Pursuant to the... educational institutions have conducted suborbital rocket launches from the PFRR. While the PFRR is owned and...
Arctic Haze: Natural or Pollution?

DTIC Science & Technology

1980-08-01

any of the four GMCC sites (the others are at Mauna Loa , American Samoa, and the South Pole), and has a program of research second in importance only...to Mauna Loa . Similarly, cooperation with Dr. Neal Brown of Poker Flat Research Range will continue. We expect our program at Poker Flat to last many...energetics and mass balance of the 1976 Augustine Volcano eruptions, J. Volcan . Geother. Res., Vol. 6, pp. 139-164, 1979. Shaw, G. E., Aerosols at Mauna
Rocket measurement of auroral partial parallel distribution functions

NASA Astrophysics Data System (ADS)

Lin, C.-A.

1980-01-01

The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectra will be changed by superposing another energy spectra on it.
"Don't worry, it's just poker!"--experience, self-rumination and self-reflection as determinants of decision-making in on-line poker.

PubMed

Palomäki, Jussi; Laakasuo, Michael; Salmela, Mikko

2013-09-01

On-line poker is a game of chance and skill. The construct of poker playing skill has both a technical (game strategy-related) and an emotional (emotion regulation-related) aspect. A correlational on-line study (N = 354) was conducted to assess differences in technical skills and emotional characteristics related to poker playing style between experienced and inexperienced poker players. Results suggest that, with respect to emotional characteristics, experienced poker players engage in less self-rumination and more self-reflection, as compared to inexperienced players. Experienced poker players are also able to make better decisions, by mathematical standards, in a poker decision-making environment, as assessed by two fictitious on-line poker decision-making scenarios. Furthermore, this study provides supportive evidence that experienced poker players conceptualize the construct of "luck" differently from inexperienced players. A new poker playing experience scale (PES) for accurately measuring poker playing experience is presented in this paper.
The online poker sub-culture: dialogues, interactions and networks.

PubMed

O'Leary, Killian; Carroll, Conor

2013-12-01

This paper examines the distinct world of online poker. It outlines the online poker eco-system, and the players which inhabit it, their distinctive attitudes and behaviors towards the game and gambling. These unique patterns of behavior have been created and sustained by the interaction of players within online poker forums. Therefore online poker forums were identified as primary mechanism within which a poker sub-culture may exist. The study conducted an extensive netnography of online poker forums. The study found that within the online poker eco-system there are forums which are elevated to sacred status amongst this online poker sub-culture. The members of these forums enact sub-cultural characteristics such as ethos of collaboration/cooperation, and a competitive hierarchy of status. In particular this paper identifies the importance of identity generation, and communities within the online poker eco-system.
What Differentiates Professional Poker Players from Recreational Poker Players? A Qualitative Interview Study

ERIC Educational Resources Information Center

McCormack, Abby; Griffiths, Mark D.

2012-01-01

The popularity of poker (and in particular online poker) has increasingly grown worldwide in recent years. This increase in the popularity of poker has led to the increased incidence of the "professional poker player". However, very little empirical research has been carried out into this relatively new group of gamblers. The aim was to determine…
Are poker players all the same? Latent class analysis.

PubMed

Dufour, Magali; Brunelle, Natacha; Roy, Élise

2015-06-01

Poker is the gambling game that is currently gaining the most in popularity. However, there is little information on poker players' characteristics and risk factors. Furthermore, the first studies described poker players, often recruited in universities, as an homogeneous group who played in only one of the modes (land based or on the Internet). This study aims to identify, through latent class analyses, poker player subgroups. A convenience sample of 258 adult poker players was recruited across Quebec during special events or through advertising in various media. Participants filled out a series of questionnaires (Canadian Problem Gambling Index, Beck Depression, Beck Anxiety, erroneous belief and alcohol/drug consumption). The latent class analysis suggests that there are three classes of poker players. Class I (recreational poker players) includes those who have the lowest probability of engaging intensively in different game modes. Participants in class II (Internet poker players) all play poker on the Internet. This class includes the highest proportion of players who consider themselves experts or professionals. They make a living in part or in whole from poker. Class III (multiform players) includes participants with the broadest variety of poker patterns. This group is complex: these players are positioned halfway between professional and recreational players. Results indicate that poker players are not an homogeneous group identified simply on the basis of the form of poker played. The specific characteristics associated with each subgroup points to vulnerabilities that could potentially be targeted for preventive interventions.
Internet poker websites and pathological gambling prevention policy.

PubMed

Khazaal, Yasser; Chatton, Anne; Bouvard, Audrey; Khiari, Hiba; Achab, Sophia; Zullino, Daniele

2013-03-01

Despite the widespread increase in online poker playing and the risk related to excessive poker playing, research on online poker websites is still lacking with regard to pathological gambling prevention strategies offered by the websites. The aim of the present study was to assess the pathological gambling-related prevention strategies of online poker websites. Two keywords ("poker" and "poker help") were entered into two popular World Wide Web search engines. The first 20 links related to French and English online poker websites were assessed. Seventy-four websites were assessed with a standardized tool designed to rate sites on the basis of accountability, interactivity, prevention strategies, marketing, and messages related to poker strategies. Prevention strategies appeared to be lacking. Whereas a substantial proportion of the websites offered incitation to gambling such as betting "tips," few sites offered strategies to prevent or address problem gambling. Furthermore, strategies related to poker, such as probability estimation, were mostly reported without acknowledging their limitations. Results of this study suggest that more adequate prevention strategies for risky gambling should be developed for online poker.
Poker as a skill game: rational versus irrational behaviors

NASA Astrophysics Data System (ADS)

Javarone, Marco Alberto

2015-03-01

In many countries poker is one of the most popular card games. Although each variant of poker has its own rules, all involve the use of money to make the challenge meaningful. Nowadays, in the collective consciousness, some variants of poker are referred to as games of skill, others as gambling. A poker table can be viewed as a psychology lab, where human behavior can be observed and quantified. This work provides a preliminary analysis of the role of rationality in poker games, using a stylized version of Texas Hold'em. In particular, we compare the performance of two different kinds of players, i.e. rational versus irrational players, during a poker tournament. Results show that these behaviors (i.e. rationality and irrationality) affect both the outcomes of challenges and the way poker should be classified.
Psychopathology of Online Poker Players: Review of Literature

PubMed Central

Moreau, Axelle; Chabrol, Henri; Chauchard, Emeline

2016-01-01

Background and aims Online Texas Hold’em poker has become a spectacular form of entertainment in our society, and the number of people who use this form of gambling is increasing. It seems that online poker activity challenges existing theoretical concepts about problem gambling behaviors. The purpose of this literature review is to provide a current overview about the population of online poker players. Methods To be selected, articles had to focus on psychopathology in a sample of online poker players, be written in English or French, and be published before November 2015. A total of 17 relevant studies were identified. Results In this population, the proportion of problematic gamblers was higher than in other forms of gambling. Several factors predicting excessive gambling were identified such as stress, internal attribution, dissociation, boredom, negative emotions, irrational beliefs, anxiety, and impulsivity. The population of online poker players is largely heterogeneous, with experimental players forming a specific group. Finally, the validity of the tools used to measure excessive or problematic gambling and irrational beliefs are not suitable for assessing online poker activity. Discussion and conclusions Future studies need to confirm previous findings in the literature of online poker games. Given that skills are important in poker playing, skill development in the frames of excessive use of online poker should be explored more in depth, particularly regarding poker experience and loss chasing. Future research should focus on skills, self-regulation, and psychopathology of online poker players. PMID:27348559
Psychopathology of Online Poker Players: Review of Literature.

PubMed

Moreau, Axelle; Chabrol, Henri; Chauchard, Emeline

2016-06-01

Background and aims Online Texas Hold'em poker has become a spectacular form of entertainment in our society, and the number of people who use this form of gambling is increasing. It seems that online poker activity challenges existing theoretical concepts about problem gambling behaviors. The purpose of this literature review is to provide a current overview about the population of online poker players. Methods To be selected, articles had to focus on psychopathology in a sample of online poker players, be written in English or French, and be published before November 2015. A total of 17 relevant studies were identified. Results In this population, the proportion of problematic gamblers was higher than in other forms of gambling. Several factors predicting excessive gambling were identified such as stress, internal attribution, dissociation, boredom, negative emotions, irrational beliefs, anxiety, and impulsivity. The population of online poker players is largely heterogeneous, with experimental players forming a specific group. Finally, the validity of the tools used to measure excessive or problematic gambling and irrational beliefs are not suitable for assessing online poker activity. Discussion and conclusions Future studies need to confirm previous findings in the literature of online poker games. Given that skills are important in poker playing, skill development in the frames of excessive use of online poker should be explored more in depth, particularly regarding poker experience and loss chasing. Future research should focus on skills, self-regulation, and psychopathology of online poker players.
Are online poker problem gamblers sensation seekers?

PubMed

Bonnaire, Céline; Barrault, Servane

2018-06-01

The purpose of this research was to examine the relationship between sensation seeking and online poker gambling in a community sample of adult online poker players, when controlling for age, gender, anxiety and depression. In total, 288 online poker gamblers were recruited. Sociodemographic data, gambling behavior (CPGI), sensation seeking (SSS), depression and anxiety (HADS) were evaluated. Problem online poker gamblers have higher sensation seeking scores (total, thrill and adventure, disinhibition and boredom susceptibility subscores) and depression scores than non-problem online poker gamblers. Being male, with total sensation seeking, disinhibition and depression scores are factors associated with online poker problem gambling. These findings are interesting in terms of harm reduction. For example, because disinhibition could lead to increased time and money spent, protective behavioral strategies like setting time and monetary limits should be encouraged in poker online gamblers. Copyright © 2018 Elsevier B.V. All rights reserved.
The development of alignment turning system for precision len cells

NASA Astrophysics Data System (ADS)

Huang, Chien-Yao; Ho, Cheng-Fang; Wang, Jung-Hsing; Chung, Chien-Kai; Chen, Jun-Cheng; Chang, Keng-Shou; Kuo, Ching-Hsiang; Hsu, Wei-Yao; Chen, Fong-Zhi

2017-08-01

In general, the drop-in and cell-mounted assembly are used for standard and high performance optical system respectively. The optical performance is limited by the residual centration error and position accuracy of the conventional assembly. Recently, the poker chip assembly with high precision lens barrels that can overcome the limitation of conventional assembly is widely applied to ultra-high performance optical system. ITRC also develops the poker chip assembly solution for high numerical aperture objective lenses and lithography projection lenses. In order to achieve high precision lens cell for poker chip assembly, an alignment turning system (ATS) is developed. The ATS includes measurement, alignment and turning modules. The measurement module including a non-contact displacement sensor and an autocollimator can measure centration errors of the top and the bottom surface of a lens respectively. The alignment module comprising tilt and translation stages can align the optical axis of the lens to the rotating axis of the vertical lathe. The key specifications of the ATS are maximum lens diameter, 400mm, and radial and axial runout of the rotary table < 2 μm. The cutting performances of the ATS are surface roughness Ra < 1 μm, flatness < 2 μm, and parallelism < 5 μm. After measurement, alignment and turning processes on our ATS, the centration error of a lens cell with 200mm in diameter can be controlled in 10 arcsec. This paper also presents the thermal expansion of the hydrostatic rotating table. A poker chip assembly lens cell with three sub-cells is accomplished with average transmission centration error in 12.45 arcsec by fresh technicians. The results show that ATS can achieve high assembly efficiency for precision optical systems.
Profiling Online Poker Players: Are Executive Functions Correlated with Poker Ability and Problem Gambling?

PubMed

Schiavella, Mauro; Pelagatti, Matteo; Westin, Jerker; Lepore, Gabriele; Cherubini, Paolo

2018-01-12

Poker playing and responsible gambling both entail the use of the executive functions (EF), which are higher-level cognitive abilities. This study investigated if online poker players of different ability showed different performances in their EF and if so, which functions were the most discriminating for their playing ability. Furthermore, it assessed if the EF performance was correlated to the quality of gambling, according to self-reported questionnaires (PGSI, SOGS, GRCS). Three poker experts evaluated anonymized poker hand history files and, then, a trained professional administered an extensive neuropsychological test battery. Data analysis determined which variables of the tests correlated with poker ability and gambling quality scores. The highest correlations between EF test results and poker ability and between EF test results and gambling quality assessment showed that mostly different clusters of executive functions characterize the profile of the strong(er) poker player and those ones of the problem gamblers (PGSI and SOGS) and the one of the cognitions related to gambling (GRCS). Taking into consideration only the variables overlapping between PGSI and SOGS, we found some key predictive factors for a more risky and harmful online poker playing: a lower performance in the emotional intelligence competences (Emotional Quotient inventory Short) and, in particular, those grouped in the Intrapersonal scale (emotional self-awareness, assertiveness, self-regard, independence and self-actualization).
Decision-Making and Thought Processes among Poker Players

ERIC Educational Resources Information Center

St. Germain, Joseph; Tenenbaum, Gershon

2011-01-01

This study was aimed at delineating decision-making and thought processing among poker players who vary in skill-level. Forty-five participants, 15 in each group, comprised expert, intermediate, and novice poker players. They completed the Computer Poker Simulation Task (CPST) comprised of 60 hands of No-Limit Texas Hold 'Em. During the CPST, they…
Purification and characterization of hemagglutinating proteins from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea).

PubMed

Cheng, Chin-Fu; Hung, Shao-Wen; Chang, Yung-Chung; Chen, Ming-Hui; Chang, Chen-Hsuan; Tsou, Li-Tse; Tu, Ching-Yu; Lin, Yu-Hsing; Liu, Pan-Chen; Lin, Shiun-Long; Wang, Way-Shyan

2012-01-01

Hemagglutinating proteins (HAPs) were purified from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea) using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecular weight was 32 kDa. The two Poker-chip Venus HAPs possessed hemagglutinating ability (HAA) for erythrocytes of some vertebrate animal species, especially tilapia. Moreover, HAA of the HAP purified from Poker-chip Venus was higher than that of the HAP of Corbicula clam. Furthermore, Poker-chip Venus HAPs possessed better HAA at a pH higher than 7.0. When the temperature was at 4°C-10°C or the salinity was less than 0.5‰, the two Poker-chip Venus HAPs possessed better HAA compared with that of Corbicula clam.
Online Poker Gambling in University Students: Further Findings from an Online Survey

ERIC Educational Resources Information Center

Griffiths, Mark; Parke, Jonathan; Wood, Richard; Rigbye, Jane

2010-01-01

Online poker is one of the fastest growing forms of online gambling yet there has been relatively little research to date. This study comprised 422 online poker players (362 males and 60 females) and investigated some of the predicting factors of online poker success and problem gambling using an online questionnaire. Results showed that length of…
Purification and Characterization of Hemagglutinating Proteins from Poker-Chip Venus (Meretrix lusoria) and Corbicula Clam (Corbicula fluminea)

PubMed Central

Cheng, Chin-Fu; Hung, Shao-Wen; Chang, Yung-Chung; Chen, Ming-Hui; Chang, Chen-Hsuan; Tsou, Li-Tse; Tu, Ching-Yu; Lin, Yu-Hsing; Liu, Pan-Chen; Lin, Shiun-Long; Wang, Way-Shyan

2012-01-01

Hemagglutinating proteins (HAPs) were purified from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea) using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecular weight was 32 kDa. The two Poker-chip Venus HAPs possessed hemagglutinating ability (HAA) for erythrocytes of some vertebrate animal species, especially tilapia. Moreover, HAA of the HAP purified from Poker-chip Venus was higher than that of the HAP of Corbicula clam. Furthermore, Poker-chip Venus HAPs possessed better HAA at a pH higher than 7.0. When the temperature was at 4°C–10°C or the salinity was less than 0.5‰, the two Poker-chip Venus HAPs possessed better HAA compared with that of Corbicula clam. PMID:22666167

Predictive factors of excessive online poker playing.

PubMed

Hopley, Anthony A B; Nicki, Richard M

2010-08-01

Despite the widespread rise of online poker playing, there is a paucity of research examining potential predictors for excessive poker playing. The aim of this study was to build on recent research examining motives for Texas Hold'em play in students by determining whether predictors of other kinds of excessive gambling apply to Texas Hold'em. Impulsivity, negative mood states, dissociation, and boredom proneness have been linked to general problem gambling and may play a role in online poker. Participants of this study were self-selected online poker players (N = 179) who completed an online survey. Results revealed that participants played an average of 20 hours of online poker a week and approximately 9% of the sample was classified as a problem gambler according to the Canadian Problem Gambling Index. Problem gambling, in this sample, was uniquely predicted by time played, dissociation, boredom proneness, impulsivity, and negative affective states, namely depression, anxiety, and stress.
77 FR 59611 - Environmental Impacts Statements; Notice of Availability

Federal Register 2010, 2011, 2012, 2013, 2014

2012-09-28

...: Sandy Hurlocker 505-753-7331. EIS No. 20120308, Draft EIS (Tiering), NASA, AK, Sounding Rocket Program (SRP) at Poker Flat Research Range (PFRR), Continuing Sounding Rocket Launches, Alaska, Comment Period...
Cognitive distortions, anxiety, and depression among regular and pathological gambling online poker players.

PubMed

Barrault, Servane; Varescon, Isabelle

2013-03-01

The aims were to assess cognitive distortions and psychological distress (anxiety and depression) among online poker players of different levels of gambling intensity (non-pathological gamblers [NPG], problem gamblers [PbG], and pathological gamblers [PG]), and to examine the relationship between these variables and gambling pathology. Overall, 245 regular online poker players recruited on an Internet forum completed online self-report scales assessing pathological gambling (South Oaks Gambling Screen [SOGS]), psychological distress (Hospital Anxiety and Depression Scale [HADS]) and cognitive distortions (Gambling-Related Cognition Scale). Based on their SOGS scores, poker players were ranked into three groups: NPG (n=146), PbG (n=55), and PG (n=44). All poker players appeared to be more anxious than depressive. PG exhibited higher levels of depression and anxiety than did PbG and NPG. Cognitive distortions also significantly discriminated PG from PbG and NPG. A regression model showed that the perceived inability to stop gambling, the illusion of control, depression (HADS D), and anxiety were good predictors for pathological gambling among poker players. Our results suggest that cognitive distortions play an important role in the development and maintenance of gambling pathology. This study also underlines the role of anxiety and depression in pathological gambling among poker players. It seems relevant to take these elements into account in the research, prevention, and treatment of pathological gambling poker players.
One-Year Prospective Study on Passion and Gambling Problems in Poker Players.

PubMed

Morvannou, Adèle; Dufour, Magali; Brunelle, Natacha; Berbiche, Djamal; Roy, Élise

2018-06-01

The concept of passion is relevant to understanding gambling behaviours and gambling problems. Longitudinal studies are useful to better understand the absence and development of gambling problems; however, only one study has specifically considered poker players. Using a longitudinal design, this study aims to determine the influence, 1 year later, of two forms of passion-harmonious and obsessive-on gambling problems in poker players. A total of 116 poker players was recruited from across Quebec, Canada. The outcome variable of interest was participants' category on the Canadian Pathological Gambling Index, and the predictive variable was the Gambling Passion Scale. Multiple logistic regression analyses were conducted to identify independent risk factors of at-risk poker players 1 year later. Obsessive passion at baseline doubled the risk of gambling problems 1 year later (p < 0.01); for harmonious passion, there was no association. Number of gambling activities, drug problems, and impulsivity were also associated with at-risk gambling. This study highlights the links between obsessive passion and at-risk behaviours among poker players. It is therefore important to prevent the development of obsessive passion among poker players.
Is poker a skill game? New insights from statistical physics

NASA Astrophysics Data System (ADS)

Javarone, Marco Alberto

2015-06-01

During last years poker has gained a lot of prestige in several countries and, besides being one of the most famous card games, it represents a modern challenge for scientists belonging to different communities, spanning from artificial intelligence to physics and from psychology to mathematics. Unlike games like chess, the task of classifying the nature of poker (i.e., as “skill game” or gambling) seems really hard and it also constitutes a current problem, whose solution has several implications. In general, gambling offers equal winning probabilities both to rational players (i.e., those that use a strategy) and to irrational ones (i.e., those without a strategy). Therefore, in order to uncover the nature of poker, a viable way is comparing performances of rational vs. irrational players during a series of challenges. Recently, a work on this topic revealed that rationality is a fundamental ingredient to succeed in poker tournaments. In this study we analyze a simple model of poker challenges by a statistical physics approach, with the aim to uncover the nature of this game. As main result we found that, under particular conditions, few irrational players can turn poker into gambling. Therefore, although rationality is a key ingredient to succeed in poker, also the format of challenges has an important role in these dynamics, as it can strongly influence the underlying nature of the game. The importance of our results lies on the related implications, as for instance in identifying the limits within which poker can be considered as a “skill game” and, as a consequence, which kind of format must be chosen to devise algorithms able to face humans.
Stripped-Down Poker: A Classroom Game with Signaling and Bluffing

ERIC Educational Resources Information Center

Reiley, David H.; Urbancic, Michael B.; Walker, Mark

2008-01-01

The authors present a simplified, "stripped-down" version of poker as an instructional classroom game. Although Stripped-Down Poker is extremely simple, it nevertheless provides an excellent illustration of a number of topics: signaling, bluffing, mixed strategies, the value of information, and Bayes's Rule. The authors begin with a description of…
Opponent Classification in Poker

NASA Astrophysics Data System (ADS)

Ahmad, Muhammad Aurangzeb; Elidrisi, Mohamed

Modeling games has a long history in the Artificial Intelligence community. Most of the games that have been considered solved in AI are perfect information games. Imperfect information games like Poker and Bridge represent a domain where there is a great deal of uncertainty involved and additional challenges with respect to modeling the behavior of the opponent etc. Techniques developed for playing imperfect games also have many real world applications like repeated online auctions, human computer interaction, opponent modeling for military applications etc. In this paper we explore different techniques for playing poker, the core of these techniques is opponent modeling via classifying the behavior of opponent according to classes provided by domain experts. We utilize windows of full observation in the game to classify the opponent. In Poker, the behavior of an opponent is classified into four standard poker-playing styles based on a subjective function.
Texas Hold 'em Online Poker: A Further Examination

ERIC Educational Resources Information Center

Hopley, Anthony A. B.; Dempsey, Kevin; Nicki, Richard

2012-01-01

Playing Texas Hold 'em Online Poker (THOP) is on the rise. However, there is relatively little research examining factors that contribute to problem gambling in poker players. The aim of this study was to extend the research findings of Hopley and Nicki (2010). The negative mood states of depression, anxiety and stress were found to be linked to…
Strategic Style in Pared-Down Poker

NASA Astrophysics Data System (ADS)

Burns, Kevin

This chapter deals with the manner of making diagnoses and decisions, called strategic style, in a gambling game called Pared-down Poker. The approach treats style as a mental mode in which choices are constrained by expected utilities. The focus is on two classes of utility, i.e., money and effort, and how cognitive styles compare to normative strategies in optimizing these utilities. The insights are applied to real-world concerns like managing the war against terror networks and assessing the risks of system failures. After "Introducing the Interactions" involved in playing poker, the contents are arranged in four sections, as follows. "Underpinnings of Utility" outlines four classes of utility and highlights the differences between them: economic utility (money), ergonomic utility (effort), informatic utility (knowledge), and aesthetic utility (pleasure). "Inference and Investment" dissects the cognitive challenges of playing poker and relates them to real-world situations of business and war, where the key tasks are inference (of cards in poker, or strength in war) and investment (of chips in poker, or force in war) to maximize expected utility. "Strategies and Styles" presents normative (optimal) approaches to inference and investment, and compares them to cognitive heuristics by which people play poker--focusing on Bayesian methods and how they differ from human styles. The normative strategy is then pitted against cognitive styles in head-to-head tournaments, and tournaments are also held between different styles. The results show that style is ergonomically efficient and economically effective, i.e., style is smart. "Applying the Analysis" explores how style spaces, of the sort used to model individual behavior in Pared-down Poker, might also be applied to real-world problems where organizations evolve in terror networks and accidents arise from system failures.
Acquisition, development, and maintenance of online poker playing in a student sample.

PubMed

Wood, Richard T A; Griffiths, Mark D; Parke, Jonathan

2007-06-01

To date there has been very little empirical research into Internet gambling and none relating to the recent rise in popularity of online poker. Given that recent reports have claimed that students may be a vulnerable group, the aim of the current study was to establish basic information regarding Internet poker playing behavior among the student population, including various motivators for participation and predictors of problematic play. The study examined a self-selected sample of student online poker players using an online survey (n=422). Results showed that online poker playing was undertaken at least twice per week by a third of the participants. Almost one in five of the sample (18%) was defined as a problem gambler using the DSM-IV criteria. Findings demonstrated that problem gambling in this population was best predicted by negative mood states after playing, gender swapping whilst playing, and playing to escape from problems.
Clausewitz: On Poker, How Today’s Leaders Can Use Poker to Better Prepare Tomorrow’s Warriors

DTIC Science & Technology

2007-01-01

thing applies to combat leaders, and that is why the military could benefit by teaching poker. Rocks, Maniacs, Calling Stations and TA’s Clausewitz...much like churches around the world have done with bingo . Instead of playing for money, perhaps contestants could compete for a 3-day pass, or free
Effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behavior using a video poker task.

PubMed

Corbin, William R; Cronce, Jessica M

2017-06-01

Drinking and gambling frequently co-occur, and concurrent gambling and drinking may lead to greater negative consequences than either behavior alone. Building on prior research on the effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behaviors using a chance-based (nonstrategic) slot-machine task, the current study explored the impact of these factors on a skill-based (strategic) video poker task. We anticipated larger average bets and greater gambling persistence under alcohol relative to placebo, and expected alcohol effects to be moderated by initial gambling outcomes, impulsivity, and gambling cognitions. Participants (N = 162; 25.9% female) were randomly assigned to alcohol (target BrAC = .08g%) or placebo and were given $10 to wager on a simulated video poker task, which was programmed to produce 1 of 3 initial outcomes (win, breakeven, or lose) before beginning a progressive loss schedule. Despite evidence for validity of the video poker task and alcohol administration paradigm, primary hypotheses were not supported. Individuals who received alcohol placed smaller wagers than participants in the placebo condition, though this effect was not statistically significant, and the direction of effects was reversed in at-risk gamblers (n = 41). These findings contradict prior research and suggest that alcohol effects on gambling behavior may differ by gambling type (nonstrategic vs. strategic games). Interventions that suggest alcohol is universally disinhibiting may be at odds with young adults' lived experience and thus be less effective than those that recognize the greater complexity of alcohol effects. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Effects of alcohol, initial gambling outcomes, impulsivity and gambling cognitions on gambling behavior using a video poker task

PubMed Central

Corbin, William R.; Cronce, Jessica M.

2017-01-01

Drinking and gambling frequently co-occur, and concurrent gambling and drinking may lead to greater negative consequences than either behavior alone. Building on prior research on the effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behaviors using a chance-based (non-strategic) slot-machine task, the current study explored the impact of these factors on a skill-based (strategic) video poker task. We anticipated larger average bets and greater gambling persistence under alcohol relative to placebo, and expected alcohol effects to be moderated by initial gambling outcomes, impulsivity, and gambling cognitions. Participants (N=162; 25.9% female) were randomly assigned to alcohol (target BrAC = .08 g%) or placebo and were given $10 to wager on a simulated video poker task, which was programmed to produce 1 of 3 initial outcomes (win, breakeven, or lose) before beginning a progressive loss schedule. Despite evidence for validity of the video poker task and alcohol administration paradigm, primary hypotheses were not supported. Individuals who received alcohol placed smaller wagers than participants in the placebo condition, though this effect was not statistically significant, and the direction of effects was reversed in at-risk gamblers (n=41). These findings contradict prior research and suggest that alcohol effects on gambling behavior may differ by gambling type (non-strategic vs. strategic games). Interventions that suggest alcohol is universally disinhibiting may be at odds with young adults’ lived experience and thus be less effective than those that recognize the greater complexity of alcohol effects. PMID:28493742
Poker, Sports Betting, and Less Popular Alternatives: Status, Friendship Networks, and Male Adolescent Gambling

ERIC Educational Resources Information Center

DiCicco-Bloom, Benjamin; Romer, Daniel

2012-01-01

The authors argue that the recent increase in poker play among adolescent males in the United States was primarily attributable to high-status male youth who are more able to organize "informal" gambling games (e.g., poker and sports betting) than are low-status male youth who are left to gamble on "formal" games (e.g., lotteries and slot…
Clausewitz: On Poker (Clausewitz was a TA). How Today’s Leaders Can Use Poker to Better Prepare Tomorrow’s Warriors

DTIC Science & Technology

2007-01-01

thing applies to combat leaders, and that is why the military could benefit by teaching poker. Rocks, Maniacs, Calling Stations and TA’s Clausewitz...much like churches around the world have done with bingo . Instead of playing for money, perhaps contestants could compete for a 3-day pass, or free
"To Bluff like a Man or Fold like a Girl?" - Gender Biased Deceptive Behavior in Online Poker.

PubMed

Palomäki, Jussi; Yan, Jeff; Modic, David; Laakasuo, Michael

2016-01-01

Evolutionary psychology suggests that men are more likely than women to deceive to bolster their status and influence. Also gender perception influences deceptive behavior, which is linked to pervasive gender stereotypes: women are typically viewed as weaker and more gullible than men. We assessed bluffing in an online experiment (N = 502), where participants made decisions to bluff or not in simulated poker tasks against opponents represented by avatars. Participants bluffed on average 6% more frequently at poker tables with female-only avatars than at tables with male-only or gender mixed avatars-a highly significant effect in games involving repeated decisions. Nonetheless, participants did not believe the avatar genders affected their decisions. Males bluffed 13% more frequently than females. Unlike most economic games employed exclusively in research contexts, online poker is played for money by tens of millions of people worldwide. Thus, gender effects in bluffing have significant monetary consequences for poker players.
AmeriFlux US-Prr Poker Flat Research Range Black Spruce Forest

DOE Data Explorer

Suzuki, Rikie [Japan Agency for Marine-Earth Science and Technology

2016-01-01

This is the AmeriFlux version of the carbon flux data for the site US-Prr Poker Flat Research Range Black Spruce Forest. Site Description - This site is located in a blackspruce forest within the property of the Poker Flat Research Range, University of Alaska, Fairbanks. Time-lapse image of the canopy is measured at the same time to relate flux data to satellite images.
Increased ventral-striatal activity during monetary decision making is a marker of problem poker gambling severity.

PubMed

Brevers, Damien; Noël, Xavier; He, Qinghua; Melrose, James A; Bechara, Antoine

2016-05-01

The aim of this study was to examine the impact of different neural systems on monetary decision making in frequent poker gamblers, who vary in their degree of problem gambling. Fifteen frequent poker players, ranging from non-problem to high-problem gambling, and 15 non-gambler controls were scanned using functional magnetic resonance imaging (fMRI) while performing the Iowa Gambling Task (IGT). During IGT deck selection, between-group fMRI analyses showed that frequent poker gamblers exhibited higher ventral-striatal but lower dorsolateral prefrontal and orbitofrontal activations as compared with controls. Moreover, using functional connectivity analyses, we observed higher ventral-striatal connectivity in poker players, and in regions involved in attentional/motor control (posterior cingulate), visual (occipital gyrus) and auditory (temporal gyrus) processing. In poker gamblers, scores of problem gambling severity were positively associated with ventral-striatal activations and with the connectivity between the ventral-striatum seed and the occipital fusiform gyrus and the middle temporal gyrus. Present results are consistent with findings from recent brain imaging studies showing that gambling disorder is associated with heightened motivational-reward processes during monetary decision making, which may hamper one's ability to moderate his level of monetary risk taking. © 2015 Society for the Study of Addiction.
Emotional and Social Factors influence Poker Decision Making Accuracy.

PubMed

Laakasuo, Michael; Palomäki, Jussi; Salmela, Mikko

2015-09-01

Poker is a social game, where success depends on both game strategic knowledge and emotion regulation abilities. Thus, poker provides a productive environment for studying the effects of emotional and social factors on micro-economic decision making. Previous research indicates that experiencing negative emotions, such as moral anger, reduces mathematical accuracy in poker decision making. Furthermore, various social aspects of the game—such as losing against "bad players" due to "bad luck"—seem to fuel these emotional states. We designed an Internet-based experiment, where participants' (N = 459) mathematical accuracy in five different poker decision making tasks were assessed. In addition, we manipulated the emotional and social conditions under which the tasks were presented, in a 2 × 2 experimental setup: (1) Anger versus neutral emotional state—participants were primed either with an anger-inducing, or emotionally neutral story, and (2) Social cue versus non-social cue—during the tasks, either an image of a pair of human eyes was "following" the mouse cursor, or an image of a black moving box was presented. The results showed that anger reduced mathematical accuracy of decision making only when participants were "being watched" by a pair of moving eyes. Experienced poker players made mathematically more accurate decisions than inexperienced ones. The results contribute to current understanding on how emotional and social factors influence decision making accuracy in economic games.
Trait Mindfulness, Problem-Gambling Severity, Altered State of Awareness and Urge to Gamble in Poker-Machine Gamblers.

PubMed

McKeith, Charles F A; Rock, Adam J; Clark, Gavin I

2017-06-01

In Australia, poker-machine gamblers represent a disproportionate number of problem gamblers. To cultivate a greater understanding of the psychological mechanisms involved in poker-machine gambling, a repeated measures cue-reactivity protocol was administered. A community sample of 38 poker-machine gamblers was assessed for problem-gambling severity and trait mindfulness. Participants were also assessed regarding altered state of awareness (ASA) and urge to gamble at baseline, following a neutral cue, and following a gambling cue. Results indicated that: (a) urge to gamble significantly increased from neutral cue to gambling cue, while controlling for baseline urge; (b) cue-reactive ASA did not significantly mediate the relationship between problem-gambling severity and cue-reactive urge (from neutral cue to gambling cue); (c) trait mindfulness was significantly negatively associated with both problem-gambling severity and cue-reactive urge (i.e., from neutral cue to gambling cue, while controlling for baseline urge); and (d) trait mindfulness did not significantly moderate the effect of problem-gambling severity on cue-reactive urge (from neutral cue to gambling cue). This is the first study to demonstrate a negative association between trait mindfulness and cue-reactive urge to gamble in a population of poker-machine gamblers. Thus, this association merits further evaluation both in relation to poker-machine gambling and other gambling modalities.

Anxiety, Depression and Emotion Regulation Among Regular Online Poker Players.

PubMed

Barrault, Servane; Bonnaire, Céline; Herrmann, Florian

2017-12-01

Poker is a type of gambling that has specific features, including the need to regulate one's emotion to be successful. The aim of the present study is to assess emotion regulation, anxiety and depression in a sample of regular poker players, and to compare the results of problem and non-problem gamblers. 416 regular online poker players completed online questionnaires including sociodemographic data, measures of problem gambling (CPGI), anxiety and depression (HAD scale), and emotion regulation (ERQ). The CPGI was used to divide participants into four groups according to the intensity of their gambling practice (non-problem, low risk, moderate risk and problem gamblers). Anxiety and depression were significantly higher among severe-problem gamblers than among the other groups. Both significantly predicted problem gambling. On the other hand, there was no difference between groups in emotion regulation (cognitive reappraisal and expressive suppression), which was linked neither to problem gambling nor to anxiety and depression (except for cognitive reappraisal, which was significantly correlated to anxiety). Our results underline the links between anxiety, depression and problem gambling among poker players. If emotion regulation is involved in problem gambling among poker players, as strongly suggested by data from the literature, the emotion regulation strategies we assessed (cognitive reappraisal and expressive suppression) may not be those involved. Further studies are thus needed to investigate the involvement of other emotion regulation strategies.
Is poker a game of skill or chance? A quasi-experimental study.

PubMed

Meyer, Gerhard; von Meduna, Marc; Brosowski, Tim; Hayer, Tobias

2013-09-01

Due to intensive marketing and the rapid growth of online gambling, poker currently enjoys great popularity among large sections of the population. Although poker is legally a game of chance in most countries, some (particularly operators of private poker web sites) argue that it should be regarded as a game of skill or sport because the outcome of the game primarily depends on individual aptitude and skill. The available findings indicate that skill plays a meaningful role; however, serious methodological weaknesses and the absence of reliable information regarding the relative importance of chance and skill considerably limit the validity of extant research. Adopting a quasi-experimental approach, the present study examined the extent to which the influence of poker playing skill was more important than card distribution. Three average players and three experts sat down at a six-player table and played 60 computer-based hands of the poker variant "Texas Hold'em" for money. In each hand, one of the average players and one expert received (a) better-than-average cards (winner's box), (b) average cards (neutral box) and (c) worse-than-average cards (loser's box). The standardized manipulation of the card distribution controlled the factor of chance to determine differences in performance between the average and expert groups. Overall, 150 individuals participated in a "fixed-limit" game variant, and 150 individuals participated in a "no-limit" game variant. ANOVA results showed that experts did not outperform average players in terms of final cash balance. Rather, card distribution was the decisive factor for successful poker playing. However, expert players were better able to minimize losses when confronted with disadvantageous conditions (i.e., worse-than-average cards). No significant differences were observed between the game variants. Furthermore, supplementary analyses confirm differential game-related actions dependent on the card distribution, player status, and game variant. In conclusion, the study findings indicate that poker should be regarded as a game of chance, at least under certain basic conditions, and suggest new directions for further research.
An examination of participation in online gambling activities and the relationship with problem gambling.

PubMed

McCormack, Abby; Shorter, Gillian W; Griffiths, Mark D

2013-03-01

Background and aims Online gambling participation is increasing rapidly, with relatively little research about the possible effects of different gambling activities on problem gambling behaviour. The aim of this exploratory study was to examine the participation in online gambling activities and the relationship with problem gambling among an international sample of online gamblers. Methods An online gambling survey was posted on 32 international gambling websites and resulted in 1,119 respondents over a four-month period. Results Poker was the most popular gambling activity online. A number of online activities were associated with problem gambling, including: roulette, poker, horse race betting, sports betting, spread betting and fruit (slot) machines. Not surprisingly, those that gambled on these activities regularly (except poker) were more likely to be a problem gambler, however, what is interesting is that the reverse is true for poker players; those that gambled regularly on poker were less likely to be a problem gambler compared to the non-regular poker players. The majority of the players also gambled offline, but there was no relationship between problem gambling and whether or not a person also gambled offline. Discussion Problem gambling is associated more with certain online gambling activities than others, and those gambling on two or more activities online were more likely to be a problem gambler. Conclusion This paper can help explain the impact different online gambling activities may have on gambling behaviour. Consideration needs to be given to the gambling activity when developing and implementing treatment programmes.
76 FR 20715 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-13

...), and NASA's NEPA policy and procedures (14 CFR part 1216, subpart 1216.3), NASA intends to prepare an... to apprise interested agencies, organizations, tribal governments, and individuals of NASA's intent... significant environmental issues to be evaluated in the EIS. In cooperation with BLM, UAF, and USFWS, NASA...
A computer model for high-latitude phase scintillation based on wideband satellite data from Poker Flat

NASA Astrophysics Data System (ADS)

Fremouw, E. J.; Lansinger, J. M.

1981-02-01

A mathematical model has been developed for describing plasma-density irregularities responsible for radiowave scintillation produced in the auroral ionosphere, and the model has been committed to an applications-oriented computer code, WBMOD. The model characterizes the three-dimensional configuration, gradient sharpness, and height-integrated strength of irregularities represented by a power-law spatial spectrum as functions of geomagnetic latitude, time of day, sunspot number, and planetary geomagnetic activity index. Program WBMOD permits calculation of the power-law index and spectral strength (at a fluctuation frequency of 1 Hz) of phase scintillation, together with scintillation indices (variances) for phase and intensity, using a phase-screen scattering theory. The model has been calibrated and iteratively tested against phase-scintillation data from the DNA Wideband Satellite Experiment, collected at Poker Flat, Alaska. It does not account for seasonal variations in high-latitude scintillation observed in other longitude sectors. The program contains a model for middle-latitude and equatorial irregularities as well as for auroral latitudes, but only the latter has been tested extensively against high-quality scintillation data.
DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.

PubMed

Moravčík, Matej; Schmid, Martin; Burch, Neil; Lisý, Viliam; Morrill, Dustin; Bard, Nolan; Davis, Trevor; Waugh, Kevin; Johanson, Michael; Bowling, Michael

2017-05-05

Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker, the quintessential game of imperfect information, is a long-standing challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated, with statistical significance, professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches. Copyright © 2017, American Association for the Advancement of Science.
Beyond Chance? The Persistence of Performance in Online Poker

PubMed Central

Potter van Loon, Rogier J. D.; van den Assem, Martijn J.; van Dolder, Dennie

2015-01-01

A major issue in the widespread controversy about the legality of poker and the appropriate taxation of winnings is whether poker should be considered a game of skill or a game of chance. To inform this debate we present an analysis into the role of skill in the performance of online poker players, using a large database with hundreds of millions of player-hand observations from real money ring games at three different stakes levels. We find that players whose earlier profitability was in the top (bottom) deciles perform better (worse) and are substantially more likely to end up in the top (bottom) performance deciles of the following time period. Regression analyses of performance on historical performance and other skill-related proxies provide further evidence for persistence and predictability. Simulations point out that skill dominates chance when performance is measured over 1,500 or more hands of play. PMID:25729862
Gambling-Related Cognition Scale (GRCS): Are skills-based games at a disadvantage?

PubMed

Lévesque, David; Sévigny, Serge; Giroux, Isabelle; Jacques, Christian

2017-09-01

The Gambling-Related Cognition Scale (GRCS; Raylu & Oei, 2004) was developed to evaluate gambling-related cognitive distortions for all types of gamblers, regardless of their gambling activities (poker, slot machine, etc.). It is therefore imperative to ascertain the validity of its interpretation across different types of gamblers; however, some skills-related items endorsed by players could be interpreted as a cognitive distortion despite the fact that they play skills-related games. Using an intergroup (168 poker players and 73 video lottery terminal [VLT] players) differential item functioning (DIF) analysis, this study examined the possible manifestation of item biases associated with the GRCS. DIF was analyzed with ordinal logistic regressions (OLRs) and Ramsay's (1991) nonparametric kernel smoothing approach with TestGraf. Results show that half of the items display at least moderate DIF between groups and, depending on the type of analysis used, 3 to 7 items displayed large DIF. The 5 items with the most DIF were more significantly endorsed by poker players (uniform DIF) and were all related to skills, knowledge, learning, or probabilities. Poker players' interpretations of some skills-related items may lead to an overestimation of their cognitive distortions due to their total score increased by measurement artifact. Findings indicate that the current structure of the GRCS contains potential biases to be considered when poker players are surveyed. The present study conveys new and important information on bias issues to ponder carefully before using and interpreting the GRCS and other similar wide-range instruments with poker players. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Determinants of internet poker adoption.

PubMed

Philander, Kahlil S; Abarbanel, B Lillian

2014-09-01

In nearly all jurisdictions, adoption of a new form of gambling has been a controversial and contentious subject. Online gambling has been no different, though there are many aspects that affect online gambling that do not appear in the brick and mortar environment. This study seeks to identify whether demographic, economic, political, technological, and/or sociological determinants contribute to online poker gambling adoption. A theoretical discussion of these categories' importance to online poker is provided and exploratory empirical analysis is used to examine their potential validity. The analysis revealed support for all of the proposed categories of variables thought to be predictive of online gambling legality.
Pinochle Poker: An Activity for Counting and Probability

ERIC Educational Resources Information Center

Wroughton, Jacqueline; Nolan, Joseph

2012-01-01

Understanding counting rules is challenging for students; in particular, they struggle with determining when and how to implement combinations, permutations, and the multiplication rule as tools for counting large sets and computing probability. We present an activity--using ideas from the games of poker and pinochle--designed to help students…
Exploring Chemical Equilibrium with Poker Chips: A General Chemistry Laboratory Exercise

ERIC Educational Resources Information Center

Bindel, Thomas H.

2012-01-01

A hands-on laboratory exercise at the general chemistry level introduces students to chemical equilibrium through a simulation that uses poker chips and rate equations. More specifically, the exercise allows students to explore reaction tables, dynamic chemical equilibrium, equilibrium constant expressions, and the equilibrium constant based on…
Strategic Alliance Poker: Demonstrating the Importance of Complementary Resources and Trust in Strategic Alliance Management

ERIC Educational Resources Information Center

Reutzel, Christopher R.; Worthington, William J.; Collins, Jamie D.

2012-01-01

Strategic Alliance Poker (SAP) provides instructors with an opportunity to integrate the resource based view with their discussion of strategic alliances in undergraduate Strategic Management courses. Specifically, SAP provides Strategic Management instructors with an experiential exercise that can be used to illustrate the value creation…
Build a Winning Hand

ERIC Educational Resources Information Center

Glover, Sandy

2005-01-01

What do poker and bridge have in common? Both are card games that originated in Europe (although poker's modern form developed in the frontier towns of the American West, while bridge still reflects its British heritage). Both use a regular 52-card playing deck, both involve bidding, and both have experienced renewed popularity in recent years.…
Selling Internet Gambling: Advertising, New Media and the Content of Poker Promotion

ERIC Educational Resources Information Center

McMullan, John L.; Kervin, Melissa

2012-01-01

This study examines the web design and engineering, advertising and marketing, and pedagogical features present at a random sample of 71 international poker sites obtained from the Casino City directory in the summer of 2009. We coded for 22 variables related to access, appeal, player protection, customer services, on-site security, use of images,…
Bonanza Creek Experimental Forest & Caribou-Poker Creeks Research Watershed.

Treesearch

Valerie Rapp

2003-01-01

Bonanza Creek Experimental Forest and Caribou-Poker Creeks Research Watershed are located in the boreal forest of interior Alaska. Research focuses on basic ecological processes, hydrology, disturbance regimes, and climate change in the boreal forest region. Interior Alaska lies between the Alaska Range to the south and the Brooks Range to the north and covers an area...
78 FR 40391 - Special Local Regulations; Dinghy Poker Run, Middle River; Baltimore County, Essex, MD

Federal Register 2010, 2011, 2012, 2013, 2014

2013-07-05

... the rule is to ensure safety of life on navigable waters of the United States during the Dinghy Poker...: Coast Guard, DHS. ACTION: Temporary final rule. SUMMARY: The Coast Guard proposes to establish special... River. These special local regulations are necessary to provide for the safety of life on navigable...
The Effect of Near-Miss Rate and Card Control when American Indians and Non-Indians Gamble in a Laboratory Situation: The Influence of Alcohol

ERIC Educational Resources Information Center

Whitton, Melissa; Weatherly, Jeffrey N.

2009-01-01

Twelve American Indian (AI) and 12 non-AI participants gambled on a slot-machine simulation and on video poker. Prior to the gambling sessions, half of the participants consumed alcohol while the other half consumed a placebo beverage. They then played the slot-machine simulation three times, with the percentage of programmed "near misses" varying…
Effects of Testosterone Administration on Strategic Gambling in Poker Play.

PubMed

van Honk, Jack; Will, Geert-Jan; Terburg, David; Raub, Werner; Eisenegger, Christoph; Buskens, Vincent

2016-01-04

Testosterone has been associated with economically egoistic and materialistic behaviors, but -defensibly driven by reputable status seeking- also with economically fair, generous and cooperative behaviors. Problematically, social status and economic resources are inextricably intertwined in humans, thus testosterone's primal motives are concealed. We critically addressed this issue by performing a placebo-controlled single-dose testosterone administration in young women, who played a game of bluff poker wherein concerns for status and resources collide. The profit-maximizing strategy in this game is to mislead the other players by bluffing randomly (independent of strength of the hand), thus also when holding very poor cards (cold bluffing). The profit-maximizing strategy also dictates the players in this poker game to never call the other players' bluffs. For reputable-status seeking these materialistic strategies are disadvantageous; firstly, being caught cold bluffing damages one's reputation by revealing deceptive intent, and secondly, not calling the other players' bluffs signals submission in blindly tolerating deception. Here we show that testosterone administration in this game of bluff poker significantly reduces random bluffing, as well as cold bluffing, while significantly increasing calling. Our data suggest that testosterone in humans primarily motivates for reputable-status seeking, even when this elicits behaviors that are economically disadvantageous.
Synergistic Information Processing Encrypts Strategic Reasoning in Poker.

PubMed

Frey, Seth; Albino, Dominic K; Williams, Paul L

2018-06-14

There is a tendency in decision-making research to treat uncertainty only as a problem to be overcome. But it is also a feature that can be leveraged, particularly in social interaction. Comparing the behavior of profitable and unprofitable poker players, we reveal a strategic use of information processing that keeps decision makers unpredictable. To win at poker, a player must exploit public signals from others. But using public inputs makes it easier for an observer to reconstruct that player's strategy and predict his or her behavior. How should players trade off between exploiting profitable opportunities and remaining unexploitable themselves? Using a recent multivariate approach to information theoretic data analysis and 1.75 million hands of online two-player No-Limit Texas Hold'em, we find that the important difference between winning and losing players is not in the amount of information they process, but how they process it. In particular, winning players are better at integrative information processing-creating new information from the interaction between their cards and their opponents' signals. We argue that integrative information processing does not just produce better decisions, it makes decision-making harder for others to reverse engineer, as an expert poker player's cards act like the private key in public-key cryptography. Poker players encrypt their reasoning with the way they process information. The encryption function of integrative information processing makes it possible for players to exploit others while remaining unexploitable. By recognizing the act of information processing as a strategic behavior in its own right, we offer a detailed account of how experts use endemic uncertainty to conceal their intentions in high-stakes competitive environments, and we highlight new opportunities between cognitive science, information theory, and game theory. Copyright © 2018 Cognitive Science Society, Inc.
Cue-Reactive Altered State of Consciousness Mediates the Relationship Between Problem-Gambling Severity and Cue-Reactive Urge in Poker-Machine Gamblers.

PubMed

Tricker, Christopher; Rock, Adam J; Clark, Gavin I

2016-06-01

In order to enhance our understanding of the nature of poker-machine problem-gambling, a community sample of 37 poker-machine gamblers (M age = 32 years, M PGSI = 5; PGSI = Problem Gambling Severity Index) were assessed for urge to gamble (responses on a visual analogue scale) and altered state of consciousness (assessed by the Altered State of Awareness dimension of the Phenomenology of Consciousness Inventory) at baseline, after a neutral cue, and after a gambling cue. It was found that (a) problem-gambling severity (PGSI score) predicted increase in urge (from neutral cue to gambling cue, controlling for baseline; sr (2) = .19, p = .006) and increase in altered state of consciousness (from neutral cue to gambling cue, controlling for baseline; sr (2) = .57, p < .001), and (b) increase in altered state of consciousness (from neutral cue to gambling cue) mediated the relationship between problem-gambling severity and increase in urge (from neutral cue to gambling cue; κ(2) = .40, 99 % CI [.08, .71]). These findings suggest that cue-reactive altered state of consciousness is an important component of cue-reactive urge in poker-machine problem-gamblers.

Precision lens assembly with alignment turning system

NASA Astrophysics Data System (ADS)

Ho, Cheng-Fang; Huang, Chien-Yao; Lin, Yi-Hao; Kuo, Hui-Jean; Kuo, Ching-Hsiang; Hsu, Wei-Yao; Chen, Fong-Zhi

2017-10-01

The poker chip assembly with high precision lens barrels is widely applied to ultra-high performance optical system. ITRC applies the poker chip assembly technology to the high numerical aperture objective lenses and lithography projection lenses because of its high efficiency assembly process. In order to achieve high precision lens cell for poker chip assembly, an alignment turning system (ATS) is developed. The ATS includes measurement, alignment and turning modules. The measurement module is equipped with a non-contact displacement sensor (NCDS) and an autocollimator (ACM). The NCDS and ACM are used to measure centration errors of the top and the bottom surface of a lens respectively; then the amount of adjustment of displacement and tilt with respect to the rotational axis of the turning machine for the alignment module can be determined. After measurement, alignment and turning processes on the ATS, the centration error of a lens cell with 200 mm in diameter can be controlled within 10 arcsec. Furthermore, a poker chip assembly lens cell with three sub-cells is demonstrated, each sub-cells are measured and accomplished with alignment and turning processes. The lens assembly test for five times by each three technicians; the average transmission centration error of assembly lens is 12.45 arcsec. The results show that ATS can achieve high assembly efficiency for precision optical systems.
Effects of Testosterone Administration on Strategic Gambling in Poker Play

PubMed Central

van Honk, Jack; Will, Geert-Jan; Terburg, David; Raub, Werner; Eisenegger, Christoph; Buskens, Vincent

2016-01-01

Testosterone has been associated with economically egoistic and materialistic behaviors, but -defensibly driven by reputable status seeking- also with economically fair, generous and cooperative behaviors. Problematically, social status and economic resources are inextricably intertwined in humans, thus testosterone’s primal motives are concealed. We critically addressed this issue by performing a placebo-controlled single-dose testosterone administration in young women, who played a game of bluff poker wherein concerns for status and resources collide. The profit-maximizing strategy in this game is to mislead the other players by bluffing randomly (independent of strength of the hand), thus also when holding very poor cards (cold bluffing). The profit-maximizing strategy also dictates the players in this poker game to never call the other players’ bluffs. For reputable-status seeking these materialistic strategies are disadvantageous; firstly, being caught cold bluffing damages one’s reputation by revealing deceptive intent, and secondly, not calling the other players’ bluffs signals submission in blindly tolerating deception. Here we show that testosterone administration in this game of bluff poker significantly reduces random bluffing, as well as cold bluffing, while significantly increasing calling. Our data suggest that testosterone in humans primarily motivates for reputable-status seeking, even when this elicits behaviors that are economically disadvantageous. PMID:26727636
SAM II aerosol profile measurements, Poker Flat, Alaska; July 16-19, 1979

NASA Technical Reports Server (NTRS)

Mccormick, M. P.; Chu, W. P.; Mcmaster, L. R.; Grams, G. W.; Herman, B. M.; Pepin, T. J.; Russell, P. B.; Swissler, T. J.

1981-01-01

SAM II satellite measurements during the July 1979 Poker Flat mission, yielded an aerosol extinction coefficient of 0.0004/km at 1.0 micron wavelength, in the region of the stratospheric aerosol mixing ratio peak (12-16 km). The stratospheric aerosol optical depth for these data, calculated from the tropopause through 30 km, is approximately 0.001. These results are consistent with the average 1979 summertime values found throughout the Arctic.
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.

PubMed

Brown, Noam; Sandholm, Tuomas

2018-01-26

No-limit Texas hold'em is the most popular form of poker. Despite artificial intelligence (AI) successes in perfect-information games, the private information and massive game tree have made no-limit poker difficult to tackle. We present Libratus, an AI that, in a 120,000-hand competition, defeated four top human specialist professionals in heads-up no-limit Texas hold'em, the leading benchmark and long-standing challenge problem in imperfect-information game solving. Our game-theoretic approach features application-independent techniques: an algorithm for computing a blueprint for the overall strategy, an algorithm that fleshes out the details of the strategy for subgames that are reached during play, and a self-improver algorithm that fixes potential weaknesses that opponents have identified in the blueprint strategy. Copyright © 2018, The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Factor Structure of the Internet Addiction Test in Online Gamers and Poker Players.

PubMed

Khazaal, Yasser; Achab, Sophia; Billieux, Joel; Thorens, Gabriel; Zullino, Daniele; Dufour, Magali; Rothen, Stéphane

2015-01-01

The Internet Addiction Test (IAT) is the most widely used questionnaire to screen for problematic Internet use. Nevertheless, its factorial structure is still debated, which complicates comparisons among existing studies. Most previous studies were performed with students or community samples despite the probability of there being more problematic Internet use among users of specific applications, such as online gaming or gambling. To assess the factorial structure of a modified version of the IAT that addresses specific applications, such as video games and online poker. Two adult samples-one sample of Internet gamers (n=920) and one sample of online poker players (n=214)-were recruited and completed an online version of the modified IAT. Both samples were split into two subsamples. Two principal component analyses (PCAs) followed by two confirmatory factor analyses (CFAs) were run separately. The results of principal component analysis indicated that a one-factor model fit the data well across both samples. In consideration of the weakness of some IAT items, a 17-item modified version of the IAT was proposed. This study assessed, for the first time, the factorial structure of a modified version of an Internet-administered IAT on a sample of Internet gamers and a sample of online poker players. The scale seems appropriate for the assessment of such online behaviors. Further studies on the modified 17-item IAT version are needed.
Factor Structure of the Internet Addiction Test in Online Gamers and Poker Players

PubMed Central

Achab, Sophia; Billieux, Joel; Thorens, Gabriel; Zullino, Daniele; Dufour, Magali; Rothen, Stéphane

2015-01-01

Background The Internet Addiction Test (IAT) is the most widely used questionnaire to screen for problematic Internet use. Nevertheless, its factorial structure is still debated, which complicates comparisons among existing studies. Most previous studies were performed with students or community samples despite the probability of there being more problematic Internet use among users of specific applications, such as online gaming or gambling. Objective To assess the factorial structure of a modified version of the IAT that addresses specific applications, such as video games and online poker. Methods Two adult samples—one sample of Internet gamers (n=920) and one sample of online poker players (n=214)—were recruited and completed an online version of the modified IAT. Both samples were split into two subsamples. Two principal component analyses (PCAs) followed by two confirmatory factor analyses (CFAs) were run separately. Results The results of principal component analysis indicated that a one-factor model fit the data well across both samples. In consideration of the weakness of some IAT items, a 17-item modified version of the IAT was proposed. Conclusions This study assessed, for the first time, the factorial structure of a modified version of an Internet-administered IAT on a sample of Internet gamers and a sample of online poker players. The scale seems appropriate for the assessment of such online behaviors. Further studies on the modified 17-item IAT version are needed. PMID:26543917
Social and psychological challenges of poker.

PubMed

Siler, Kyle

2010-09-01

Poker is a competitive, social game of skill and luck, which presents players with numerous challenging strategic and interpersonal decisions. The adaptation of poker into a game played over the internet provides the unprecedented opportunity to quantitatively analyze extremely large numbers of hands and players. This paper analyzes roughly twenty-seven million hands played online in small-stakes, medium-stakes and high-stakes games. Using PokerTracker software, statistics are generated to (a) gauge the types of strategies utilized by players (i.e. the 'strategic demography') at each level and (b) examine the various payoffs associated with different strategies at varying levels of play. The results show that competitive edges attenuate as one moves up levels, and tight-aggressive strategies--which tend to be the most remunerative--become more prevalent. Further, payoffs for different combinations of cards, varies between levels, showing how strategic payoffs are derived from competitive interactions. Smaller-stakes players also have more difficulty appropriately weighting incentive structures with frequent small gains and occasional large losses. Consequently, the relationship between winning a large proportion of hands and profitability is negative, and is strongest in small-stakes games. These variations reveal a meta-game of rationality and psychology which underlies the card game. Adopting risk-neutrality to maximize expected value, aggression and appropriate mental accounting, are cognitive burdens on players, and underpin the rationality work--reconfiguring of personal preferences and goals--players engage into be competitive, and maximize their winning and profit chances.
Preliminary Results of the VLFE Quadrupole Instrumentation From The PARX Sounding Rocket

NASA Astrophysics Data System (ADS)

Reinleitner, L. A.; Holzworth, R. H.; Meadows, A. L.

2003-12-01

The NASA Pulsating Auroral Rocket eXperiment (PARX - March '97 from Poker Flat, AK) was equipped with 4 electric field probes oriented (X and Y) perpendicular to the ambient magnetic field, and one probe (along the Z axis) to obtain the parallel electric field. The rocket also included a three-axis VLF search coil magnetometer. The VLF measurements for both instruments were from 100 Hz - 8 KHz. Additionally, the electric field information was used onboard the rocket to obtain the "quadrupole" electric field, defined to be {(V1+V2) - (V3+V4)}/2d, which shows significant response only to short wavelength waves. This instrumentation clearly shows the long wavelength nature of features tentatively described as auroral hiss, and the shorter wavelength nature of the electrostatic and/or quasi-electrostatic waves.
CEDAR/TIMED: Thermospheric Vertical Wind Observations from Three Sites in the Northern Auroral Zone

NASA Technical Reports Server (NTRS)

Lummerzheim, D.

2005-01-01

The objective of this project was to operate ground based Fabry-Perot Interferometers at several points under the auroral zone to analyze and quantify the vertical wind in the thermosphere. These measurements were made in conjunction with TIMED, especially GUVI data, to relate the observed wind to the resulting mixing and compositional changes in the thermosphere. The ground based wind measurements were obtained from a scanning Doppler imager (SDI) in Poker Flat, and a vertically aligned Fabry Perot Imager (FPI) in Inuvik. A third FPI at Eagle, Alaska, was operated for a brief overlapping period as well. The SDI at Poker Flat had been in operation for several years, and was continued to run with little support from this grant. The much more expensive operation, maintenance, and data acquisition of the remote Inuvik FPI was made possible with funds from this project. During the 2003/2004 and 2004/2005 seasons, we operated the Inuvik FPI from September to April during hours of darkness. Two trips to service the instrument were required per year, and a local caretaker was funded to help keep the instrument going during the winter seasons. The data were transfered via modem and phone line to Poker Flat and were then analyzed to obtain wind and temperature at the altitude of the auroral green line OI(557.7 nm). The final data product was archived and transferred to the GEDDS system at Poker Flat were it is available on the web: http://gedds.pfrr.alaska.edu/. The data set is also available from the CEDAR data base: http://cedarweb.hao.ucar.edu/.
Acceleration of barium ions near 8000 km above an aurora

NASA Technical Reports Server (NTRS)

Stenbaek-Nielsen, H. C.; Hallinan, T. J.; Wescott, E. M.; Foeppl, H.

1984-01-01

A barium shaped charge, named Limerick, was released from a rocket launched from Poker Flat Research Range, Alaska, on March 30, 1982, at 1033 UT. The release took place in a small auroral breakup. The jet of ionized barium reached an altitude of 8100 km 14.5 min after release, indicating that there were no parallel electric fields below this altitude. At 8100 km the jet appeared to stop. Analysis shows that the barium at this altitude was effectively removed from the tip. It is concluded that the barium was actually accelerated upward, resulting in a large decrease in the line-of-sight density and hence the optical intensity. The parallel electric potential in the acceleration region must have been greater than 1 kV over an altitude interval of less than 200 km. The acceleration region, although presumably auroral in origin, did not seem to be related to individual auroral structures, but appeared to be a large-scale horizontal structure. The perpendicular electric field below, as deduced from the drift of the barium, was temporally and spatially very uniform and showed no variation related to individual auroral structures passing through.
Alaskan Auroral All-Sky Images on the World Wide Web

NASA Technical Reports Server (NTRS)

Stenbaek-Nielsen, H. C.

1997-01-01

In response to a 1995 NASA SPDS announcement of support for preservation and distribution of important data sets online, the Geophysical Institute, University of Alaska Fairbanks, Alaska, proposed to provide World Wide Web access to the Poker Flat Auroral All-sky Camera images in real time. The Poker auroral all-sky camera is located in the Davis Science Operation Center at Poker Flat Rocket Range about 30 miles north-east of Fairbanks, Alaska, and is connected, through a microwave link, with the Geophysical Institute where we maintain the data base linked to the Web. To protect the low light-level all-sky TV camera from damage due to excessive light, we only operate during the winter season when the moon is down. The camera and data acquisition is now fully computer controlled. Digital images are transmitted each minute to the Web linked data base where the data are available in a number of different presentations: (1) Individual JPEG compressed images (1 minute resolution); (2) Time lapse MPEG movie of the stored images; and (3) A meridional plot of the entire night activity.
Magnetosphere-Ionosphere Coupling During a Geomagnetic Substorm on March 1, 2017

NASA Astrophysics Data System (ADS)

Coster, A. J.; Hampton, D. L.; Sazykin, S. Y.; Wolf, R.; Huba, J.; Varney, R. H.; Reimer, A.; Lynch, K. A.; Samara, M.; Michell, R.

2017-12-01

On March 1, 2017, at approximately 10 UT, magnetometers at Ft Yukon and Poker Flat in Alaska measured the classic signature of an auroral substorm: a rapid decrease in the northward component of the magnetic field. Nearby, a camera at Venetie Alaska captured intensive visual brightening of multiple auroral arcs at approximately the same time. Our data and model analysis focuses on this time period. We are taking advantage of the extensive instrumentation that was in place in Northern Alaska on this date due to the ISINGLASS rocket campaign. Although no rockets were flown on March 1, 2017, this substorm was monitored at Poker by the three-filter all-sky survey and at Venetie by three all-sky cameras running simultaneously with each filtered for a different wavelength. Our analysis includes co-incidental high precision GNSS receiver data providing total electron content (TEC) measurements during the overhead auroral arcs. The receiver at Venetie also monitored L-band scintillation. In addition, the Poker Flat Incoherent Scatter radar captured the rapid ionization enhancement in the 100-200 km region across multiple beams looking to the north of Poker. The timing of these events between the multiple sites is closely monitored, and inferences of the propagation of this event are described. The available SuperDARN data from this time period indicates this substorm happened at about the same time within the Harang discontinuity. This event presented an unprecedented opportunity to observe occurrence and development of a substorm with a combination of ground-based remote sensing instruments. To support our interpretation of the data, we present first simulations of the magnetosphere-ionosphere coupled system during a substorm with the self-consistently coupled SAMI/RCM code.
Cue-Reactive Rationality, Visual Imagery and Volitional Control Predict Cue-Reactive Urge to Gamble in Poker-Machine Gamblers.

PubMed

Clark, Gavin I; Rock, Adam J; McKeith, Charles F A; Coventry, William L

2017-09-01

Poker-machine gamblers have been demonstrated to report increases in the urge to gamble following exposure to salient gambling cues. However, the processes which contribute to this urge to gamble remain to be understood. The present study aimed to investigate whether changes in the conscious experience of visual imagery, rationality and volitional control (over one's thoughts, images and attention) predicted changes in the urge to gamble following exposure to a gambling cue. Thirty-one regular poker-machine gamblers who reported at least low levels of problem gambling on the Problem Gambling Severity Index (PGSI), were recruited to complete an online cue-reactivity experiment. Participants completed the PGSI, the visual imagery, rationality and volitional control subscales of the Phenomenology of Consciousness Inventory (PCI), and a visual analogue scale (VAS) assessing urge to gamble. Participants completed the PCI subscales and VAS at baseline, following a neutral video cue and following a gambling video cue. Urge to gamble was found to significantly increase from neutral cue to gambling cue (while controlling for baseline urge) and this increase was predicted by PGSI score. After accounting for the effects of problem-gambling severity, cue-reactive visual imagery, rationality and volitional control significantly improved the prediction of cue-reactive urge to gamble. The small sample size and limited participant characteristic data restricts the generalizability of the findings. Nevertheless, this is the first study to demonstrate that changes in the subjective experience of visual imagery, volitional control and rationality predict changes in the urge to gamble from neutral to gambling cue. The results suggest that visual imagery, rationality and volitional control may play an important role in the experience of the urge to gamble in poker-machine gamblers.
Soft Particle Spectrometer, Langmuir Probe, and Data Analysis for Aerospace Magnetospheric/Thermospheric Coupling Rocket Program

NASA Technical Reports Server (NTRS)

Sharber, J. R.; Frahm, R. A.; Scherrer, J. R.

1997-01-01

Under this grant two instruments, a soft particle spectrometer and a Langmuir probe, were refurbished and calibrated, and flown on three instrumented rocket payloads as part of the Magnetosphere/Thermosphere Coupling program. The flights took place at the Poker Flat Research Range on February 12, 1994 (T(sub o) = 1316:00 UT), February 2, 1995 (T(sub o) = 1527:20 UT), and November 27, 1995 (T(sub o) = 0807:24 UT). In this report the observations of the particle instrumentation flown on all three of the flights are described, and brief descriptions of relevant geophysical activity for each flight are provided. Calibrations of the particle instrumentation for all ARIA flights are also provided.
MSL's Widgets: Adding Rebustness to Martian Sample Acquisition, Handling, and Processing

NASA Technical Reports Server (NTRS)

Roumeliotis, Chris; Kennedy, Brett; Lin, Justin; DeGrosse, Patrick; Cady, Ian; Onufer, Nicholas; Sigel, Deborah; Jandura, Louise; Anderson, Robert; Katz, Ira;

2013-01-01

Mars Science Laboratory's (MSL) Sample Acquisition Sample Processing and Handling (SA-SPaH) system is one of the most ambitious terrain interaction and manipulation systems ever built and successfully used outside of planet earth. Mars has a ruthless environment that has surprised many who have tried to explore there. The robustness widget program was implemented by the MSL project to help ensure the SA-SPaH system would be robust enough to the surprises of this ruthless Martian environment. The robustness widget program was an effort of extreme schedule pressure and responsibility, but was accomplished with resounding success. This paper will focus on a behind the scenes look at MSL's robustness widgets: the particle fun zone, the wind guards, and the portioner pokers.

Mesospheric momentum fluxes observed by the MST radar at Poker Flat, Alaska

NASA Technical Reports Server (NTRS)

Wang, Ding-Yi; Fritts, David C.

1990-01-01

An analysis of the wave motions observed with the Poker Flat MST radar during the winter, summer, and fall of 1986 is presented. Monthly and daily mean winds, momentum fluxes, and velocity variances are investigated in detail. While several features are in agreement with previous measurements, some significant differences also are found to exist in the observations. Monthly mean horizontal winds between 82 and 89 km have amplitudes of 20-40 m/s westward and 10-25 m/s southward in July and August. In fall and winter, the horizontal winds between 58 and 75 km are weaker and essentially eastward.
Sounding Rocket Launches Successfully from Alaska

NASA Image and Video Library

2015-01-28

A NASA Oriole IV sounding rocket with the Aural Spatial Structures Probe leaves the launch pad on Jan. 28, 2015, from the Poker Flat Research Range in Alaska. Credit: NASA/Lee Wingfield More info: On count day number 15, the Aural Spatial Structures Probe, or ASSP, was successfully launched on a NASA Oriole IV sounding rocket at 5:41 a.m. EST on Jan. 28, 2015, from the Poker Flat Research Range in Alaska. Preliminary data show that all aspects of the payload worked as designed and the principal investigator Charles Swenson at Utah State University described the mission as a “raging success.” “This is likely the most complicated mission the sounding rocket program has ever undertaken and it was not easy by any stretch," said John Hickman, operations manager of the NASA sounding rocket program office at the Wallops Flight Facility, Virginia. "It was technically challenging every step of the way.” “The payload deployed all six sub-payloads in formation as planned and all appeared to function as planned. Quite an amazing feat to maneuver and align the main payload, maintain the proper attitude while deploying all six 7.3-pound sub payloads at about 40 meters per second," said Hickman. Read more: www.nasa.gov/content/assp-sounding-rocket-launches-succes... NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
Evidence of gravity wave-tidal interaction observed near the summer mesopause at Poker Flat, Alaska

NASA Technical Reports Server (NTRS)

Wang, Ding-Yi; Fritts, David C.

1991-01-01

An analysis of gravity wave-tidal interaction observed near the mesopause by the MST radar at Poker Flat in July of 1986 is presented. The observations revealed daily mean wind maxima of about 60 m/sec westward and 20 m/sec southward with daily mean momentum fluxes, contributed by gravity waves with periods less than 1 hour of 4-5 sq m/sec sq eastward and 1-2 sq m/sec sq northward. Considerable hourly height variability was found to exist for both winds and momentum fluxes. A significant modulation of the fluxes by tidal winds was observed, characterized by out-of-phase correlations over a number of heights.
Investigating the illusion of control in mildly depressed and nondepressed individuals during video-poker play.

PubMed

Dannewitz, Holly; Weatherly, Jeffrey N

2007-05-01

Cognitive fallacies, such as the illusion of control, and psychological disorders, such as depression, may perpetuate gambling and thus contribute to problem gambling (e.g., R. Ladouceur, C. Sylvan, C. Boutin, & C. Doucet, 2002). Gender differences may exist across these variables (e.g., N. M. Petry, 2005). The authors investigated these possibilities by recruiting mildly depressed and nondepressed individuals to play jacks or better, 5-card draw, video poker. Across three poker sessions, participants were given (a) no choice of which cards to play, (b) information on the best cards to play but control over which cards were played, or (c) no information and complete control over which cards were played. The total amount of money gambled increased as control over the game decreased, but this result correlated with an increase in the rate of play. Depressed and nondepressed participants did not differ in how they gambled, but men gambled significantly more and sometimes made more mistakes during play than did women. These results question the role of the illusion of control and depression in perpetuating gambling. They also suggest that providing players information about which cards to play may indirectly promote gambling and provide insight as to why men are more prone to suffer from gambling problems than are women.
Chasing losses in online poker and casino games: characteristics and game play of Internet gamblers at risk of disordered gambling.

PubMed

Gainsbury, Sally M; Suhonen, Niko; Saastamoinen, Jani

2014-07-30

Disordered Internet gambling is a psychological disorder that represents an important public health issue due to the increase in highly available and conveniently accessible Internet gambling sites. Chasing losses is one of the few observable markers of at-risk and problem gambling that may be used to detect early signs of disordered Internet gambling. This study examined loss chasing behaviour in a sample of Internet casino and poker players and the socio-demographic variables, irrational beliefs, and gambling behaviours associated with chasing losses. An online survey was completed by 10,838 Internet gamblers (58% male) from 96 countries. The results showed that Internet casino players had a greater tendency to report chasing losses than poker players and gamblers who reported chasing losses were more likely to hold irrational beliefs about gambling and spend more time and money gambling than those who reported that they were unaffected by previous losses. Gamblers who played for excitement and to win money were more likely to report chasing losses. This study is one of the largest ever studies of Internet gamblers and the results are highly significant as they provide insight into the characteristics and behaviours of gamblers using this mode of access. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Komhyr, W.D.; Quincy, D.M.; Grass, R.D.

This report describes work to improve the quality of total ozone and Umkehr data obtained in the past at the NOAA Climate Monitoring and Diagnostics Laboratory and the Dobson spectrophotometer ozone observatories. The authors present results of total ozone data re-evaluations for ten stations: Byrd, Antarctica; Fairbanks, Alaska; Hallett, Antarctica; Huancayo, Peru; Haute Provence, France; Lauder, New Zealand; Perth, Australia; Poker Flat, Alaska; Puerto Montt, Chile; and South Pole, Antarctica. The improved data will be submitted in early 1996 to the World Meteorological Organization (WMO) World Ozone Data Center (WODC), and the Atmospheric Environment Service for archiving. Considerable work hasmore » been accomplished, also, in reevaluating Umkehr data from seven of the stations, viz., Huancayo, Haute Provence, Lauder, Perth, Poker Flat, Boulder, Colorado; and Mauna Loa, Hawaii.« less
Ionospheric ion temperature forecasting in multiples of 27 days

NASA Astrophysics Data System (ADS)

Sojka, Jan J.; Schunk, Robert W.; Nicolls, Michael J.

2014-03-01

The ionospheric variability found at auroral locations is usually assumed to be unpredictable. The magnetosphere, which drives this ionospheric variability via storms and substorms, is at best only qualitatively describable. In this study we demonstrate that over a 3 year period, ionospheric variability observed from Poker Flat, Alaska, has, in fact, a high degree of long-term predictability. The observations used in this study are (a) the solar wind high speed stream velocity measured by the NASA Advanced Composition Explorer satellite, used to define the corotating interaction region (CIR), and (b) the ion temperature at 300 km altitude measured by the National Science Foundation Poker Flat Incoherent Scatter Radar over Poker Flat, Alaska. After determining a seasonal and diurnal climatology for the ion temperature, we show that the residual ion temperature heating events occur synchronously with CIR-geospace interactions. Furthermore, we demonstrate examples of ion temperature forecasting at 27, 54, and 81 days. A rudimentary operational forecasting scenario is described for forecasting recurrence 27 days ahead for the CIR-generated geomagnetic storms. These forecasts apply specifically to satellite tracking operations (thermospheric drag) and emergency HF-radio communications (ionospheric modifications) in the polar regions. The forecast is based on present-day solar and solar wind observations that can be used to uniquely identify the coronal hole and its CIR. From this CIR epoch, a 27 day forecast is then made.
Sounding rocket study of auroral electron precipitation

DOE Office of Scientific and Technical Information (OSTI.GOV)

McFadden, J.P.

1985-01-01

Measurement of energetic electrons in the auroral zone have proved to be one of the most useful tools in investigating the phenomena of auroral arc formation. This dissertation presents a detailed analysis of the electron data from two sounding rocket campaigns and interprets the measurements in terms of existing auroral models. The Polar Cusp campaign consisted of a single rocket launched from Cape Parry, Canada into the afternoon auroral zone at 1:31:13 UT on January 21, 1982. The results include the measurement of a narrow, magnetic field aligned electron flux at the edge of an arc. This electron precipitation wasmore » found to have a remarkably constant 1.2 eV temperature perpendicular to the magnetic field over a 200 to 900 eV energy range. The payload also made simultaneous measurements of both energetic electrons and 3-MHz plasma waves in an auroral arc. Analysis has shown that the waves are propagating in the upper hybrid band and should be generated by a positive slope in the parallel electron distribution. A correlation was found between the 3-MHz waves and small positive slopes in the parallel electron distribution but experimental uncertainties in the electron measurement were large enough to influence the analysis. The BIDARCA campaign consisted of two sounding rockets launched from Poker Flat and Fort Yukon, Alaska at 9:09:00 UT and 9:10:40 UT on February 7, 1984.« less
Calculating ellipse area by the Monte Carlo method and analysing dice poker with Excel at high school

NASA Astrophysics Data System (ADS)

Benacka, Jan

2016-08-01

This paper reports on lessons in which 18-19 years old high school students modelled random processes with Excel. In the first lesson, 26 students formulated a hypothesis on the area of ellipse by using the analogy between the areas of circle, square and rectangle. They verified the hypothesis by the Monte Carlo method with a spreadsheet model developed in the lesson. In the second lesson, 27 students analysed the dice poker game. First, they calculated the probability of the hands by combinatorial formulae. Then, they verified the result with a spreadsheet model developed in the lesson. The students were given a questionnaire to find out if they found the lesson interesting and contributing to their mathematical and technological knowledge.
NASA Launches Rocket Into Active Auroras

NASA Image and Video Library

2017-12-08

A test rocket is launched the night of Feb. 17 from the Poker Flat Research Range in Alaska. Test rockets are launched as part of the countdown to test out the radar tracking systems. NASA is launching five sounding rockets from the Poker Range into active auroras to explore the Earth's magnetic environment and its impact on Earth’s upper atmosphere and ionosphere. The launch window for the four remaining rockets runs through March 3. Credit: NASA/Terry Zaperach NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
Pathological and nonpathological gamblers: a survey in gambling settings.

PubMed

Oliveira, M P; Silva, M T

2000-09-01

In this first study on gambling in Brazil, pathological and non-pathological gamblers were surveyed at three bingo clubs, one video poker club, and one horse-racing club in São Paulo. The South Oaks Gambling Screen and a questionnaire were administered to 171 subjects. When compared to nonpathological gamblers, a significantly higher proportion of pathological gamblers played cards, horse races, video poker, and dice in their lifetime. The two groups were similar with respect to socially acceptable games such as lotteries, bingo, sports, and the stock market. No significant differences were observed in drug consumption except for a higher lifetime consumption of tobacco among pathological gamblers. Only 4.9% of the gamblers sought help for gambling-related problems, suggesting that gambling is not generally perceived as a mental health problem by these subjects.
Laser Dentistry

MedlinePlus

... geta poker friv Home InfoBites Find an AGD Dentist Your Family's Oral Health About the AGD Dental ... of using dental lasers? There are several advantages. Dentists may not need to use a drill or ...
Taking Responsibility: A Chairperson at Mid-Career.

ERIC Educational Resources Information Center

McGowan, Martha

1985-01-01

Reflects on being selected as the first female head of the English department. Discusses the selection process (done at a poker game), women's academic advancement, power, and job related responsibilities. (EL)
Minimally Invasive Dentistry

MedlinePlus

... geta poker friv Home InfoBites Find an AGD Dentist Your Family's Oral Health About the AGD Dental ... structure. It focuses on prevention, remineralization, and minimal dentist intervention. Using scientific advances, minimally invasive dentistry allows ...
77 FR 76514 - Indian Gaming

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-28

... the number of Video Gaming Machines from 76 to 816, and authorizes the operation of additional types of games including live poker and simulcast racing. The term of the Compact runs for 10 years from...
Simple Models for Tough Concepts

ERIC Educational Resources Information Center

Cavagnoi, Richard M.; Barnett, Thomas

1976-01-01

Describes the construction of teaching models made from a variety of materials such as poker chips and cardboard that illustrate many chemical phenomena, including subatomic particles, molecular structure, solvation and dissociation, and enzyme-substrate interactions. (MLH)
Semi-supervised Discriminative Structured Prediction

DTIC Science & Technology

2013-10-30

16 5-CV Nursery 12960 5 8 5-CV Poker525k 525010/500000 10 11 test error Segmentation 210/2100 7 19 test error Waveform 5000 3 21 5-CV Yeast 1484 10 8 5...1.5) 74.03(2.0) Car 10.96(2.5) 4.75(1.0) 3.60(1.1) 2.78(0.8) Krkopt - 64.33(0.9) 26.55(0.4) 22.76(0.7) Letter - 24.94(0.9) 5.40(1.3) 4.89(0.3) Nursery ...0.3) 11.04(0.2) Letter 3.48(0.3) 3.1(0.1) 4.88(0.3) 3.37(0.2) 3.1(0.2) Nursery 0.12(0.1) 0.03(0.0) 0.16(0.1) 0.0(0.0) 0.0(0.0) Poker525k 30.19 2.01
Compulsive Gambling

MedlinePlus

Many people enjoy gambling, whether it's betting on a horse or playing poker on the Internet. Most people who gamble don't have a problem, but some lose control of their gambling. Signs of problem gambling include Always thinking about ...
Biology Notes.

ERIC Educational Resources Information Center

School Science Review, 1982

1982-01-01

Describes laboratory procedures, demonstrations, and classroom materials, including "diet poker" (nutrition game); an experiment on enzyme characteristics; demonstrations of yeast anaerobic respiration and color preference in Calliphora larvae; method to extract eugenol from clove oil to show antibiotic properties; and Benedict's test.…
A study of gravity-wave spectra in the troposphere and stratosphere at 5-min to 5-day periods with the Poker Flat MST radar

NASA Technical Reports Server (NTRS)

Bemra, R. S.; Rastogi, P. K.; Balsley, B. B.

1986-01-01

An analysis of frequency spectra at periods of about 5 days to 5 min from two 20-day sets of velocity measurements in the stratosphere and troposphere region obtained with the Poker Flat mesosphere-stratosphere-troposphere (MST) radar during January and June, 1984 is presented. A technique based on median filtering and averaged order statistics for automatic editing, smoothing and spectral analysis of velocity time series contaminated with spurious data points or outliers is outlined. The validity of this technique and its effects on the inferred spectral index was tested through simulation. Spectra obtained with this technique are discussed. The measured spectral indices show variability with season and height, especially across the tropopause. The discussion briefly outlines the need for obtaining better climatologies of velocity spectra and for the refinements of the existing theories to explain their behavior.
Collaborative Sounding Rocket launch in Alaska and Development of Hybrid Rockets

NASA Astrophysics Data System (ADS)

Ono, Tomohisa; Tsutsumi, Akimasa; Ito, Toshiyuki; Kan, Yuji; Tohyama, Fumio; Nakashino, Kyouichi; Hawkins, Joseph

Tokai University student rocket project (TSRP) was established in 1995 for a purpose of the space science and engineering hands-on education, consisting of two space programs; the one is sounding rocket experiment collaboration with University of Alaska Fairbanks and the other is development and launch of small hybrid rockets. In January of 2000 and March 2002, two collaborative sounding rockets were successfully launched at Poker Flat Research Range in Alaska. In 2001, the first Tokai hybrid rocket was successfully launched at Alaska. After that, 11 hybrid rockets were launched to the level of 180-1,000 m high at Hokkaido and Akita in Japan. Currently, Tokai students design and build all parts of the rockets. In addition, they are running the organization and development of the project under the tight budget control. This program has proven to be very effective in providing students with practical, real-engineering design experience and this program also allows students to participate in all phases of a sounding rocket mission. Also students learn scientific, engineering subjects, public affairs and system management through experiences of cooperative teamwork. In this report, we summarize the TSRP's hybrid rocket program and discuss the effectiveness of the program in terms of educational aspects.
Gambling Motives: Do They Explain Cognitive Distortions in Male Poker Gamblers?

PubMed

Mathieu, Sasha; Barrault, Servane; Brunault, Paul; Varescon, Isabelle

2018-03-01

Gambling behavior is partly the result of varied motivations leading individuals to participate in gambling activities. Specific motivational profiles are found in gamblers, and gambling motives are closely linked to the development of cognitive distortions. This cross-sectional study aimed to predict cognitive distortions from gambling motives in poker players. The population was recruited in online gambling forums. Participants reported gambling at least once a week. Data included sociodemographic characteristics, the South Oaks Gambling Screen, the Gambling Motives Questionnaire-Financial and the Gambling-Related Cognition Scale. This study was conducted on 259 male poker gamblers (aged 18-69 years, 14.3% probable pathological gamblers). Univariate analyses showed that cognitive distortions were independently predicted by overall gambling motives (34.8%) and problem gambling (22.4%) (p < .05). The multivariate model, including these two variables, explained 39.7% of cognitive distortions (p < .05). The results associated with the literature data highlight that cognitive distortions are a good discriminating factor of gambling problems, showing a close inter-relationship between gambling motives, cognitive distortions and the severity of gambling. These data are consistent with the following theoretical process model: gambling motives lead individuals to practice and repeat the gambling experience, which may lead them to develop cognitive distortions, which in turn favor problem gambling. This study opens up new research perspectives to understand better the mechanisms underlying gambling practice and has clinical implications in terms of prevention and treatment. For example, a coupled motivational and cognitive intervention focused on gambling motives/cognitive distortions could be beneficial for individuals with gambling problems.
Sensor emplacement testing at Poker Flat, Alaska

NASA Astrophysics Data System (ADS)

Reusch, A.; Beaudoin, B. C.; Anderson, K. E.; Azevedo, S.; Carothers, L.; Love, M.; Miller, P. E.; Parker, T.; Pfeifer, M.; Slad, G.; Thomas, D.; Aderhold, K.

2013-12-01

PASSCAL provides equipment and support for temporary seismic projects. Speed and efficiency of deployments are essential. A revised emplacement technique of putting broadband sensors directly into soil (aka direct burial) is being tested. The first phase (fall 2011 to spring 2013) comparing data quality and sensor stability between the direct burial and the traditional 1 m deep temporary PASSCAL-style vault in a wet and noisy site near San Antonio, NM is complete. Results suggest there is little or no difference in sensor performance in the relatively high-noise environment of this initial test. The second phase was started in November 2012 with the goal of making the same comparison, but at Poker Flat, Alaska, in a low-noise, high-signal, cold and wet environment, alongside a Transportable Array (TA) deployment to be used as a performance control. This location is in an accessible and secure area with very low site noise. In addition to benefiting future worldwide PASSCAL deployments, the Poker Flat experiment serves a secondary purpose of testing modifications necessary to successfully deploy and recover broadband stations in a cold environment with the limited logistics anticipated for remote Flexible Array (FA) and PASSCAL Program deployments in Alaska. Developing emplacement techniques that maintain high data quality and data return while minimizing logistics is critical to enable principle investigators to effectively and efficiently co-locate within the future TA Alaska footprint. Three Nanometrics sensors were installed in November 2012 in power-augered holes 76 cm in depth: a Trillium Compact Posthole (PH) and two Trillium 120PH units (one standard PH and one enhanced PHQ). The installations took less than 8 hours in -30°C conditions with 4 hours of usable daylight. The Compact PH and the 120PHQ are delivering data in realtime, while the 120PH is testing standalone power and data collection systems. Preliminary results compare favorably to each other as well as the nearby Trillium 240 in a traditional TA surface vault and a 120PH in a 5 m machine-drilled borehole. This summer, two Trillium 120PA sensors were installed at a depth of 54 cm in traditional PASSCAL-style vaults, adjacent to the Trillium Compact PH, Trillium 120PH and 120PHQ emplacements. Analysis of the data collected from these five sensors will include the use of probability density functions of power spectral density to examine temporal trends in noise, signal-to-noise ratios for local, regional, and teleseismic earthquakes, and coherence of both noise and earthquake signal recordings to compare the data quality of direct burial versus temporary PASSCAL-style vaults sensor emplacements.
Deceit and facial expression in children: the enabling role of the “poker face” child and the dependent personality of the detector

PubMed Central

Gadea, Marien; Aliño, Marta; Espert, Raúl; Salvador, Alicia

2015-01-01

This study presents the relation between the facial expression of a group of children when they told a lie and the accuracy in detecting the lie by a sample of adults. To evaluate the intensity and type of emotional content of the children’s faces, we applied an automated method capable of analyzing the facial information from the video recordings (FaceReader 5.0 software). The program classified videos as showing a neutral facial expression or an emotional one. There was a significant higher mean of hits for the emotional than for the neutral videos, and a significant negative correlation between the intensity of the neutral expression and the number of hits from the detectors. The lies expressed with emotional facial expression were more easily recognized by adults than the lies expressed with a “poker face”; thus, the less expressive the child the harder it was to guess. The accuracy of the lie detectors was then correlated with their subclinical traits of personality disorders, to find that participants scoring higher in the dependent personality were significantly better lie detectors. A non-significant tendency for women to discriminate better was also found, whereas men tended to be more suspicious than women when judging the children’s veracity. This study is the first to automatically decode the facial information of the lying child and relate these results with personality characteristics of the lie detectors in the context of deceptive behavior research. Implications for forensic psychology were suggested: to explore whether the induction of an emotion in a child during an interview could be useful to evaluate the testimony during legal trials. PMID:26284012
LWIR signature from EXCEDE SPECTRAL

NASA Astrophysics Data System (ADS)

Bien, F.

1984-03-01

EXCEDE/SPECTRAL was launched from Poker Flat Research Range, Alaska, on 19 October 1979. This report presents selected LWIR data obtained both during electron gun operation and non-operation. It presents a simplified outgassing model and discusses CO2(v2) emissions measured.

Rocket measurements of electrons in a system of multiple auroral arcs

NASA Technical Reports Server (NTRS)

Boyd, J. S.; Davis, T. N.

1977-01-01

A Nike-Tomahawk rocket was launched into a system of auroral arcs northward of Poker Flat Research Range, Fairbanks, Alaska. The pitch-angle distribution of electrons was measured at 2.5, 5, and 10 keV and also at 10 keV on a separating forward section of the payload. The auroral activity appeared to be the extension of substorm activity centered to the east. The rocket crossed a westward-propagating fold in the brightest band. The electron spectrum was relatively hard through most of the flight, showing a peak in the range from 2.5 to 10 keV in the weaker aurora and below 5 keV in the brightest arc. The detailed structure of the pitch-angle distribution suggested that, at times, a very selective process was accelerating some electrons in the magnetic field direction, so that a narrow field-aligned component appeared superimposed on a more isotropic distribution. It is concluded that this process could not be a near-ionosphere field-aligned potential drop, although the more isotropic component may have been produced by a parallel electric field extending several thousand kilometers along the field line above the ionosphere.
Kinetic modeling of auroral ion outflows observed by the VISIONS sounding rocket

NASA Astrophysics Data System (ADS)

Albarran, R. M.; Zettergren, M. D.

2017-12-01

The VISIONS (VISualizing Ion Outflow via Neutral atom imaging during a Substorm) sounding rocket was launched on Feb. 7, 2013 at 8:21 UTC from Poker Flat, Alaska, into an auroral substorm with the objective of identifying the drivers and dynamics of the ion outflow below 1000km. Energetic ion data from the VISIONS polar cap boundary crossing show evidence of an ion "pressure cooker" effect whereby ions energized via transverse heating in the topside ionosphere travel upward and are impeded by a parallel potential structure at higher altitudes. VISIONS was also instrumented with an energetic neutral atom (ENA) detector which measured neutral particles ( 50-100 eV energy) presumably produced by charge-exchange with the energized outflowing ions. Hence, inferences about ion outflow may be made via remotely-sensing measurements of ENAs. This investigation focuses on modeling energetic outflowing ion distributions observed by VISIONS using a kinetic model. This kinetic model traces large numbers of individual particles, using a guiding-center approximation, in order to allow calculation of ion distribution functions and moments. For the present study we include mirror and parallel electric field forces, and a source of ion cyclotron resonance (ICR) wave heating, thought to be central to the transverse energization of ions. The model is initiated with a steady-state ion density altitude profile and Maxwellian velocity distribution characterizing the initial phase-space conditions for multiple particle trajectories. This project serves to advance our understanding of the drivers and particle dynamics in the auroral ionosphere and to improve data analysis methods for future sounding rocket and satellite missions.
Kinetic modeling of auroral ion Outflows observed by the VISIONS sounding rocket

NASA Astrophysics Data System (ADS)

Albarran, R. M.; Zettergren, M. D.; Rowland, D. E.; Klenzing, J.; Clemmons, J. H.

2016-12-01

The VISIONS (VISualizing Ion Outflow via Neutral atom imaging during a Substorm) sounding rocket was launched on Feb. 7, 2013 at 8:21 UTC from Poker Flat, Alaska, into an auroral substorm with the objective of identifying the drivers and dynamics of the ion outflow below 1000km. Energetic ion data from the VISIONS polar cap boundary crossing show evidence of an ion "pressure cooker" effect whereby ions energized via transverse heating in the topside ionosphere travel upward and are impeded by a parallel potential structure at higher altitudes. VISIONS was also instrumented with an energetic neutral atom (ENA) detector which measured neutral particles ( 50-100 eV energy) presumably produced by charge-exchange with the energized outflowing ions. Hence, inferences about ion outflow may be made via remotely-sensing measurements of ENAs. This investigation focuses on modeling energetic outflowing ion distributions observed by VISIONS using a kinetic model. This kinetic model traces large numbers of individual particles, using a guiding-center approximation, in order to allow calculation of ion distribution functions and moments. For the present study we include mirror and parallel electric field forces, and a source of ion cyclotron resonance (ICR) wave heating, thought to be central to the transverse energization of ions. The model is initiated with a steady-state ion density altitude profile and Maxwellian velocity distribution characterizing the initial phase-space conditions for multiple particle trajectories. This project serves to advance our understanding of the drivers and particle dynamics in the auroral ionosphere and to improve data analysis methods for future sounding rocket and satellite missions.
Arithmetic Procedures are Induced from Examples.

DTIC Science & Technology

1985-08-13

concrete numerals (eg. coins. Dienes blocks, poker chips. Montessori rods etc Analogy is included as a third hypothesis even though it is not particularly...collections of coins. Diennes blocks. Montessori rods and so forth. This is a mapping between two kinds of numerals. and not two procedures Later. this
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Complexities of Counting.

ERIC Educational Resources Information Center

Stake, Bernadine Evans

This document focuses on one child's skip counting methods. The pupil, a second grade student at Steuben School, in Kankakee, Illinois, was interviewed as she made several attempts at counting twenty-five poker chips on a circular piece of paper. The interview was part of a larger study of "Children's Conceptions of Number and Numeral,"…
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
A Hands-On Activity to Introduce the Effects of Transmission by an Invasive Species

ERIC Educational Resources Information Center

May, Barbara Jean

2013-01-01

This activity engages students to better understand the impact of transmission by invasive species. Using dice, poker chips, and paper plates, an entire class mimics the spread of an invasive species within a geographic region. The activity can be modified and conducted at the K-16 levels.
Sustained Authorship: Digital Writing, Self-Publishing, and the Ebook

ERIC Educational Resources Information Center

Laquintano, Tim

2010-01-01

This article reports on a digital ethnography that examines writing, authorship, and self-publication in an online niche market. Drawing on interview and web data collected over 3 years, it focuses on the writing practices that have supported the production, distribution, and sanction of 13 ebooks self-published by online poker players. The…
Stream dissolved organic matter bioavailability and composition in watersheds underlain with discontinuous permafrost

Treesearch

Kelly L. Balcarczyk; Jeremy B. Jones; Rudolf Jaffe; Nagamitsu Maie

2009-01-01

We examined the impact of permafrost on dissolved organic matter (DOM) composition in Caribou-Poker Creeks Research Watershed (CPCRW), a watershed underlain with discontinuous permafrost, in interior Alaska. The stream draining the high permafrost watershed had higher DOC and dissolved organic nitrogen (DON) concentrations, higher DOCDON and greater specific...
75 FR 19246 - Safety Zone; Desert Storm, Lake Havasu, AZ

Federal Register 2010, 2011, 2012, 2013, 2014

2010-04-14

...-AA00 Safety Zone; Desert Storm, Lake Havasu, AZ AGENCY: Coast Guard, DHS. ACTION: Temporary final rule... navigable waters of the Colorado River in Lake Havasu, Lake Havasu City, Arizona in support of the Desert.... Background and Purpose The Lake Racer LLC is sponsoring the Desert Storm Charity Poker Run and Exhibition Run...
Differences in the Gambling Behavior of Online and Non-Online Student Gamblers in a Controlled Laboratory Environment

PubMed Central

Montes, Kevin S.; Weatherly, Jeffrey N.

2016-01-01

Although research suggests that approximately 1 in 4 college students report having gambled online, few laboratory-based studies have been conducted enlisting online student gamblers. Moreover, it is unclear the extent to which differences in gambling behavior exist between online and non-online student gamblers. The current study examined if online gamblers would play more hands, commit more errors, and wager more credits than non-online student gamblers in a controlled, laboratory environment. Online (n = 19) and non-online (n = 26) student gamblers played video poker in three separate sessions and the number of hands played, errors committed, and credits wagered were recorded. Results showed that online student gamblers played more hands and committed more errors playing video poker than non-online student gamblers. The results from the current study extend previous research by suggesting that online gamblers engage in potentially more deleterious gambling behavior (e.g., playing more hands and committing more errors) than non-online gamblers. Additional research is needed to examine differences in the gambling behavior of online and non-online gamblers in a controlled, laboratory environment. PMID:27106027
A Statistical Analysis of Langmuir Wave-Electron Correlations Observed by the CHARM II Auroral Sounding Rocket

NASA Astrophysics Data System (ADS)

Dombrowski, M. P.; Labelle, J. W.; Kletzing, C.; Bounds, S. R.; Kaeppler, S. R.

2014-12-01

Langmuir-mode electron plasma waves are frequently observed by spacecraft in active plasma environments such as the ionosphere. Ionospheric Langmuir waves may be excited by the bump-on-tail instability generated by impinging beams of electrons traveling parallel to the background magnetic field (B). The Correlation of High-frequencies and Auroral Roar Measurement (CHARM II) sounding rocket was launched into a substorm at 9:49 UT on 17 February 2010, from the Poker Flat Research Range in Alaska. The primary instruments included the University of Iowa Wave-Particle Correlator (WPC), the Dartmouth High-Frequency Experiment (HFE), several charged particle detectors, low-frequency wave instruments, and a magnetometer. The HFE is a receiver system which effectively yields continuous (100% duty cycle) electric-field waveform measurements from 100 kHz to 5 MHz, and which had its detection axis aligned nominally parallel to B. The HFE output was fed on-payload to the WPC, which uses a phase-locked loop to track the incoming wave frequency with the most power, then sorting incoming electrons at eight energy levels into sixteen wave-phase bins. CHARM II encountered several regions of strong Langmuir wave activity throughout its 15-minute flight, and the WPC showed wave-lock and statistically significant particle correlation distributions during several time periods. We show results of an in-depth analysis of the CHARM II WPC data for the entire flight, including statistical analysis of correlations which show evidence of direct interaction with the Langmuir waves, indicating (at various times) trapping of particles and both driving and damping of Langmuir waves by particles. In particular, the sign of the gradient in particle flux appears to correlate with the phase relation between the electrons and the wave field, with possible implications for the wave physics.
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed

Nadkarni, P M; Miller, P L

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

PubMed Central

Nadkarni, P. M.; Miller, P. L.

1991-01-01

A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Bilingual parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Overbeek, R.

1990-01-01

Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach providesmore » and effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.« less
Introducing B2B Service Level Measures via a Poker-Card Activity

ERIC Educational Resources Information Center

Chen, Chun-Miin; Bailey, Matthew D.

2016-01-01

To determine the appropriate level of product availability, most operations management textbooks introduce and define service level measures in a Business-to-Customer context. In other words, a retailer that wants to measure product availability in their store calculates the fill rate (FR) or cycle service level over an infinite review horizon.…
Water quality and streamflow in the Caribou-Poker Creeks Research Watershed, central Alaska, 1978.

Treesearch

Jerry W. Hilgert; Charles W. Slaughter

1983-01-01

Baseline data from 1978 are presented on precipitation, streamflow, and chemical and biological water quality in a subarctic, taiga watershed. First-, second-, and third-order streams that drain undisturbed catchments embracing permafrost-underlain and permafrost-free landscapes were monitored; results are being used in analysis of the natural, undisturbed condition of...

Water quality and streamflow in the Caribou-Poker Creeks Research Watershed, central Alaska, 1979.

Treesearch

Jerry W. Hilgert; Charles W. Slaughter

1987-01-01

Baseline data from 1979 are presented on precipitation, streamflow, occurrence of permafrost, and physical and chemical water quality in a subarctic, tiaga watershed. First- to third-order streams drain catchments embracing permafrost-underlain and permafrost-free landscapes in the undisturbed research watershed. The data are compared to those from a fourth-order...
Machine Learning in Medicine

PubMed Central

Deo, Rahul C.

2015-01-01

Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games – tasks which would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in healthcare. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades – and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. PMID:26572668
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
The long term response of stream flow to climatic warming in headwater streams of interior Alaska

Treesearch

Jeremy B. Jones; Amanda J. Rinehart

2010-01-01

Warming in the boreal forest of interior Alaska will have fundamental impacts on stream ecosystems through changes in stream hydrology resulting from upslope loss of permafrost, alteration of availability of soil moisture, and the distribution of vegetation. We examined stream flow in three headwater streams of the Caribou-Poker Creeks Research Watershed (CPCRW) in...
Using Pinochle to Motivate the Restricted Combinations with Repetitions Problem

ERIC Educational Resources Information Center

Gorman, Patrick S.; Kunkel, Jeffrey D.; Vasko, Francis J.

2011-01-01

A standard example used in introductory combinatoric courses is to count the number of five-card poker hands possible from a straight deck of 52 distinct cards. A more interesting problem is to count the number of distinct hands possible from a Pinochle deck in which there are multiple, but obviously limited, copies of each type of card (two…
The specificity of attentional biases by type of gambling: An eye-tracking study.

PubMed

McGrath, Daniel S; Meitner, Amadeus; Sears, Christopher R

2018-01-01

A growing body of research indicates that gamblers develop an attentional bias for gambling-related stimuli. Compared to research on substance use, however, few studies have examined attentional biases in gamblers using eye-gaze tracking, which has many advantages over other measures of attention. In addition, previous studies of attentional biases in gamblers have not directly matched type of gambler with personally-relevant gambling cues. The present study investigated the specificity of attentional biases for individual types of gambling using an eye-gaze tracking paradigm. Three groups of participants (poker players, video lottery terminal/slot machine players, and non-gambling controls) took part in one test session in which they viewed 25 sets of four images (poker, VLTs/slot machines, bingo, and board games). Participants' eye fixations were recorded throughout each 8-second presentation of the four images. The results indicated that, as predicted, the two gambling groups preferentially attended to their primary form of gambling, whereas control participants attended to board games more than gambling images. The findings have clinical implications for the treatment of individuals with gambling disorder. Understanding the importance of personally-salient gambling cues will inform the development of effective attentional bias modification treatments for problem gamblers.
Observation and Modeling of the Generation Mechanism of Ion Upflow during Sudden Commencement

NASA Astrophysics Data System (ADS)

Zou, S.; Ozturk, D. S.; Li, C.; Varney, R. H.; Reimer, A.

2017-12-01

Sudden commencement (SC) induced by solar wind pressure enhancement can produce significant global impact on the coupled magnetosphere-ionosphere (MI) system, and its effects have been studied extensively using ground magnetometers and coherent scatter radars. However, very limited observations have been reported about the effects of SC on the ionospheric plasma. We study the ionosphere response to the SC using the Poker Flat incoherent scatter radar (PFISR) and numerical simulations. A detailed case study of SC during the 17 March 2015 storm was conducted. PFISR observed lifting of the F region ionosphere, transient field-aligned ion upflow, prompt but short-lived ion temperature increase, subsequent F region density decrease, and persistent electron temperature increase. A global magnetohydrodynamic (MHD) simulation has been carried out to characterize the SC-induced current, convection, and magnetic perturbations. Simulated magnetic perturbations at Poker Flat show a satisfactory agreement with observations. The simulation provides a global context for linking localized PFISR observations to large-scale dynamic processes in the MI system. Following the case study, we also perform a statistical study of the effects of SC on the ionosphere focusing on the magnetic local time and latitudinal asymmetries using PFISR and GPS TEC.
First light from a kilometer-baseline Scintillation Auroral GPS Array.

PubMed

Datta-Barua, S; Su, Y; Deshpande, K; Miladinovich, D; Bust, G S; Hampton, D; Crowley, G

2015-05-28

We introduce and analyze the first data from an array of closely spaced Global Positioning System (GPS) scintillation receivers established in the auroral zone in late 2013 to measure spatial and temporal variations in L band signals at 100-1000 m and subsecond scales. The seven receivers of the Scintillation Auroral GPS Array (SAGA) are sited at Poker Flat Research Range, Alaska. The receivers produce 100 s scintillation indices and 100 Hz carrier phase and raw in-phase and quadrature-phase samples. SAGA is the largest existing array with baseline lengths of the ionospheric diffractive Fresnel scale at L band. With an initial array of five receivers, we identify a period of simultaneous amplitude and phase scintillation. We compare SAGA power and phase data with collocated 630.0 nm all-sky images of an auroral arc and incoherent scatter radar electron precipitation measurements, to illustrate how SAGA can be used in multi-instrument observations for subkilometer-scale studies. A seven-receiver Scintillation Auroral GPS Array (SAGA) is now at Poker Flat, Alaska SAGA is the largest subkilometer array to enable phase/irregularities studies Simultaneous scintillation, auroral arc, and electron precipitation are observed.
First light from a kilometer-baseline Scintillation Auroral GPS Array

PubMed Central

Datta-Barua, S; Su, Y; Deshpande, K; Miladinovich, D; Bust, G S; Hampton, D; Crowley, G

2015-01-01

We introduce and analyze the first data from an array of closely spaced Global Positioning System (GPS) scintillation receivers established in the auroral zone in late 2013 to measure spatial and temporal variations in L band signals at 100–1000 m and subsecond scales. The seven receivers of the Scintillation Auroral GPS Array (SAGA) are sited at Poker Flat Research Range, Alaska. The receivers produce 100 s scintillation indices and 100 Hz carrier phase and raw in-phase and quadrature-phase samples. SAGA is the largest existing array with baseline lengths of the ionospheric diffractive Fresnel scale at L band. With an initial array of five receivers, we identify a period of simultaneous amplitude and phase scintillation. We compare SAGA power and phase data with collocated 630.0 nm all-sky images of an auroral arc and incoherent scatter radar electron precipitation measurements, to illustrate how SAGA can be used in multi-instrument observations for subkilometer-scale studies. Key Points A seven-receiver Scintillation Auroral GPS Array (SAGA) is now at Poker Flat, Alaska SAGA is the largest subkilometer array to enable phase/irregularities studies Simultaneous scintillation, auroral arc, and electron precipitation are observed PMID:26709318
Climatology of tropospheric vertical velocity spectra

NASA Technical Reports Server (NTRS)

Ecklund, W. L.; Gage, K. S.; Balsley, B. B.; Carter, D. A.

1986-01-01

Vertical velocity power spectra obtained from Poker Flat, Alaska; Platteville, Colorado; Rhone Delta, France; and Ponape, East Caroline Islands using 50-MHz clear-air radars with vertical beams are given. The spectra were obtained by analyzing the quietest periods from the one-minute-resolution time series for each site. The lengths of available vertical records ranged from as long as 6 months at Poker Flat to about 1 month at Platteville. The quiet-time vertical velocity spectra are shown. Spectral period ranging from 2 minutes to 4 hours is shown on the abscissa and power spectral density is given on the ordinate. The Brunt-Vaisala (B-V) periods (determined from nearby sounding balloons) are indicated. All spectra (except the one from Platteville) exhibit a peak at periods slightly longer than the B-V period, are flat at longer periods, and fall rapidly at periods less than the B-V period. This behavior is expected for a spectrum of internal waves and is very similar to what is observed in the ocean (Eriksen, 1978). The spectral amplitudes vary by only a factor of 2 or 3 about the mean, and show that under quiet conditions vertical velocity spectra from the troposphere are very similar at widely different locations.
Differences in the Gambling Behavior of Online and Non-online Student Gamblers in a Controlled Laboratory Environment.

PubMed

Montes, Kevin S; Weatherly, Jeffrey N

2017-03-01

Although research suggests that approximately 1 in 4 college students report having gambled online, few laboratory-based studies have been conducted enlisting online student gamblers. Moreover, it is unclear the extent to which differences in gambling behavior exist between online and non-online student gamblers. The current study examined if online gamblers would play more hands, commit more errors, and wager more credits than non-online student gamblers in a controlled, laboratory environment. Online (n = 19) and non-online (n = 26) student gamblers played video poker in three separate sessions and the number of hands played, errors committed, and credits wagered were recorded. Results showed that online student gamblers played more hands and committed more errors playing video poker than non-online student gamblers. The results from the current study extend previous research by suggesting that online gamblers engage in potentially more deleterious gambling behavior (e.g., playing more hands and committing more errors) than non-online gamblers. Additional research is needed to examine differences in the gambling behavior of online and non-online gamblers in a controlled, laboratory environment.
Partitioning problems in parallel, pipelined and distributed computing

NASA Technical Reports Server (NTRS)

Bokhari, S.

1985-01-01

The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
An interactive parallel programming environment applied in atmospheric science

NASA Technical Reports Server (NTRS)

vonLaszewski, G.

1996-01-01

This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
PFISR nightside observations of naturally enhanced ion acoustic lines, and their relation to boundary auroral features

NASA Astrophysics Data System (ADS)

Michell, R. G.; Lynch, K. A.; Heinselman, C. J.; Stenbaek-Nielsen, H. C.

2008-11-01

We present results from a coordinated camera and radar study of the auroral ionosphere conducted during March of 2006 from Poker Flat, Alaska. The campaign was conducted to coincide with engineering tests of the first quarter installation of the Poker Flat Incoherent Scatter Radar (PFISR). On 31 March 2006, a moderately intense auroral arc, (~10 kR at 557.7 nm), was located in the local magnetic zenith at Poker Flat. During this event the radar observed 7 distinct periods of abnormally large backscattered power from the F-region. These were only observed in the field-aligned radar beam, and radar spectra from these seven times show naturally enhanced ion-acoustic lines (NEIALs), the first observed with PFISR. These times corresponded to (a) when the polar cap boundary of the auroral oval passed through the magnetic zenith, and (b) when small-scale filamentary dark structures were visible in the magnetic zenith. The presence of both (a) and (b) was necessary for their occurrence. Soft electron precipitation occurs near the magnetic zenith during these same times. The electron density in the vicinity where NEIALs have been observed by previous studies is roughly between 5 and 30×1010 m-3. Broad-band extremely low frequency (BBELF) wave activity is observed in situ by satellites and sounding rockets to occur with similar morphology, during active auroral conditions, associated with the poleward edge of the aurora and soft electron precipitation. The observations presented here suggest further investigation of the idea that NEIALs and BBELF wave activity are differently-observed aspects of the same wave phenomenon. If a connection between NEIALs and BBELF can be established with more data, this could provide a link between in situ measurements of downward current regions (DCRs) and dynamic aurora, and ground-based observations of dark auroral structures and NEIALs. Identification of in situ processes, namely wave activity, in ground-based signatures could have many implications. One specific example of interest is identifying and following the temporal and spatial evolution of regions of potential ion outflow over large spatial and temporal scales using ground-based optical observations.
Architecture Adaptive Computing Environment

NASA Technical Reports Server (NTRS)

Dorband, John E.

2006-01-01

Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

NASA Technical Reports Server (NTRS)

Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

2001-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies, as a result the performance of parallel programs with compiler directives has also made improvements. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich), to automatically generate OpenMP based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit on the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

NASA Astrophysics Data System (ADS)

Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.

The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
Adapting high-level language programs for parallel processing using data flow

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1988-01-01

EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
Machine Learning in Medicine.

PubMed

Deo, Rahul C

2015-11-17

Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome. © 2015 American Heart Association, Inc.
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the programmore » needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.« less
The BLAZE language - A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

NASA Astrophysics Data System (ADS)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
IOPA: I/O-aware parallelism adaption for parallel programs.

PubMed

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
Poker, Blackjack, Rummy, and War: The Face of American Strategic Culture

DTIC Science & Technology

2006-03-15

international relations. The result of this geographic advantage was the belief over time that “the need to fight 4 a war was a decision to be made... international relationships; political culture and ideology; military culture— military history, traditions, and education ; civil-military...Studies Degree. The U.S. Army War College is accredited by the Commission on Higher Education of the Middle States Association of Colleges and
The specificity of attentional biases by type of gambling: An eye-tracking study

PubMed Central

Meitner, Amadeus; Sears, Christopher R.

2018-01-01

A growing body of research indicates that gamblers develop an attentional bias for gambling-related stimuli. Compared to research on substance use, however, few studies have examined attentional biases in gamblers using eye-gaze tracking, which has many advantages over other measures of attention. In addition, previous studies of attentional biases in gamblers have not directly matched type of gambler with personally-relevant gambling cues. The present study investigated the specificity of attentional biases for individual types of gambling using an eye-gaze tracking paradigm. Three groups of participants (poker players, video lottery terminal/slot machine players, and non-gambling controls) took part in one test session in which they viewed 25 sets of four images (poker, VLTs/slot machines, bingo, and board games). Participants' eye fixations were recorded throughout each 8-second presentation of the four images. The results indicated that, as predicted, the two gambling groups preferentially attended to their primary form of gambling, whereas control participants attended to board games more than gambling images. The findings have clinical implications for the treatment of individuals with gambling disorder. Understanding the importance of personally-salient gambling cues will inform the development of effective attentional bias modification treatments for problem gamblers. PMID:29385164
Manipulations of the features of standard video lottery terminal (VLT) games: effects in pathological and non-pathological gamblers.

PubMed

Loba, P; Stewart, S H; Klein, R M; Blackburn, J R

2001-01-01

The present study was conducted to identify game parameters that would reduce the risk of abuse of video lottery terminals (VLTs) by pathological gamblers, while exerting minimal effects on the behavior of non-pathological gamblers. Three manipulations of standard VLT game features were explored. Participants were exposed to: a counter which displayed a running total of money spent; a VLT spinning reels game where participants could no longer "stop" the reels by touching the screen; and sensory feature manipulations. In control conditions, participants were exposed to standard settings for either a spinning reels or a video poker game. Dependent variables were self-ratings of reactions to each set of parameters. A set of 2(3) x 2 x 2 (game manipulation [experimental condition(s) vs. control condition] x game [spinning reels vs. video poker] x gambler status [pathological vs. non-pathological]) repeated measures ANOVAs were conducted on all dependent variables. The findings suggest that the sensory manipulations (i.e., fast speed/sound or slow speed/no sound manipulations) produced the most robust reaction differences. Before advocating harm reduction policies such as lowering sensory features of VLT games to reduce potential harm to pathological gamblers, it is important to replicate findings in a more naturalistic setting, such as a real bar.
The Drivers of the CH4 Seasonal Cycle in the Arctic and What Long-Term Observations of CH4 Imply About Trends in Arctic CH4 Fluxes

NASA Astrophysics Data System (ADS)

Sweeney, C.; Karion, A.; Bruhwiler, L.; Miller, J. B.; Wofsy, S. C.; Miller, C. E.; Chang, R. Y.; Dlugokencky, E. J.; Daube, B.; Pittman, J. V.; Dinardo, S. J.

2012-12-01

The large seasonal change in the atmospheric column for CH4 in the Arctic is driven by two dominant processes: transport of CH4 from low latitudes and surface emissions throughout the Arctic region. The NOAA ESRL Carbon Cycle Group Aircraft Program along with the NASA funded Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) have initiated an effort to better understand the factors controlling the seasonal changes in the mole fraction of CH4 in the Arctic with a multi-scale aircraft observing network in Alaska. The backbone of this network is multi-species flask sampling from 500 to 8000 masl that has been conducted every two weeks for the last 10 years over Poker Flat, AK. In addition regular profiles at the interior Alaska site at Poker Flat, NOAA has teamed up with the United States Coast Guard to make profiling flights with continuous observations of CO2, CO, CH4 and Ozone between Kodiak and Barrow every 2 weeks. More recently, CARVE has significantly added to this observational network with targeted flights focused on exploring the variability of CO2, CH4 and CO in the boundary layer both in the interior and the North Slope regions of Alaska. Taken together with the profiling of HIAPER Pole-to-Pole Observations (HIPPO), ground sites at Barrow and a new CARVE interior Alaska surface site just north of Fairbanks, AK, we now have the ability to investigate the full evolution of the seasonal cycle in the Arctic using both the multi-scale sampling offered by the different aircraft platforms as well as the multi-species sampling offered by in-situ and flask sampling. The flasks also provide a valuable tie-point between different platforms so that spatial and temporal gradients can be properly interpreted. In the context of the seasonal cycle observed by the aircraft platforms we will look at long term ground observations over the last 20 years to assess changes in Arctic CH4 emissions which have occurred as a result of 0.6C/decade changes in mean surface temperatures.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1989-01-01

Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.
Methods for design and evaluation of parallel computating systems (The PISCES project)

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.; Wise, Robert; Haught, Mary JO

1989-01-01

The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.
Computer-aided programming for message-passing system; Problems and a solution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, M.Y.; Gajski, D.D.

1989-12-01

As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-26

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

PubMed

Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Saini, Subhash (Technical Monitor)

1998-01-01

Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Using CLIPS in the domain of knowledge-based massively parallel programming

NASA Technical Reports Server (NTRS)

Dvorak, Jiri J.

1994-01-01

The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Evolving binary classifiers through parallel computation of multiple fitness cases.

PubMed

Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

2005-06-01

This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Programming parallel architectures: The BLAZE family of languages

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less
Hydrology and Climatology of the Caribou-Poker Creeks Research Watershed, Alaska,

DTIC Science & Technology

1982-10-01

system the watershed falls within the Inter- dense brush of willow, alder and dwarf birch in ior Alaska Forest ( Taiga ) designation. open forests near...and C.T. Cushwina (1973) Research stream flow characteristics in the discontinuous opportunities and needs in the Taiga of Alaska. permafrost zone of...edition. taiga research watershed. Institute of Water Re- linkenson, W.M. nB. Lotspeich and lW. Mueller sources, University of Alaska, Fairbanks, Alaska
Winning the War and the Relationships: Preparing Military Officers for Negotiations With Non-Combatants

DTIC Science & Technology

2007-08-01

that Iraqis make “great poker players,” and “Every thing is gray. They never say yes or no. Every thing is open for negotiation.” This made it...U.S. Army Research Institute for the Behavioral and Social Sciences Research Report 1877 Winning the War and...United States Military Academy, West Point August 2007 Approved for public release; distribution is unlimited. U.S. Army
Doppler-shifting effects on frequency spectra of gravity waves observed near the summer mesopause at high latitude

NASA Technical Reports Server (NTRS)

Fritts, David C.; Wang, Ding-Yi

1991-01-01

Results are presented of radar observations of horizontal and vertical velocities near the summer mesopause at Poker Flat (Alaska), showing that the observed vertical velocity spectra were influenced strongly by Doppler-shifting effects. The horizontal velocity spectra, however, were relatively insensitive to horizontal wind speed. The observed spectra are compared with predicted spectra for various models of the intrinsic motion spectrum and degrees of Doppler shifting.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;

1998-01-01

Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

NASA Astrophysics Data System (ADS)

Akil, Mohamed

2017-05-01

The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
Experiences with hypercube operating system instrumentation

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Rudolph, David C.

1989-01-01

The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

PubMed

Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

2013-10-24

Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) moved to the center only one saccade later. Our data on eye movement positions provide novel evidence of parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.
PyPele Rewritten To Use MPI

NASA Technical Reports Server (NTRS)

Hockney, George; Lee, Seungwon

2008-01-01

A computer program known as PyPele, originally written as a Pythonlanguage extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission- design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message- passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.
A survey of parallel programming tools

NASA Technical Reports Server (NTRS)

Cheng, Doreen Y.

1991-01-01

This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

NASA Technical Reports Server (NTRS)

Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

2002-01-01

In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of th it program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.
The Infrared Handbook

DTIC Science & Technology

1978-01-01

wavelength, infrared range, SW1R, from 1.6 to 5.6 jum and in the long wavelength infrared, LWIR , from 7.0 to 23.0 fun. Figure 3-87 shows a profile of...si n n Wavelength (M ra) Fig. 3-86. Sample spectrum scan (vertical) from a LWIR spectrometer aboard a Black Brant rocket flown from Poker Flat...radiation of a finite cross-section. Laser, lidar, or search-light probes fall into this category. Because of the nonuniform illumination of the

Momentum Flux Determination Using the Multi-beam Poker Flat Incoherent Scatter Radar

NASA Technical Reports Server (NTRS)

Nicolls, M. J.; Fritts, D. C.; Janches, Diego; Heinselman, C. J.

2012-01-01

In this paper, we develop an estimator for the vertical flux of horizontal momentum with arbitrary beam pointing, applicable to the case of arbitrary but fixed beam pointing with systems such as the Poker Flat Incoherent Scatter Radar (PFISR). This method uses information from all available beams to resolve the variances of the wind field in addition to the vertical flux of both meridional and zonal momentum, targeted for high-frequency wave motions. The estimator utilises the full covariance of the distributed measurements, which provides a significant reduction in errors over the direct extension of previously developed techniques and allows for the calculation of an error covariance matrix of the estimated quantities. We find that for the PFISR experiment, we can construct an unbiased and robust estimator of the momentum flux if sufficient and proper beam orientations are chosen, which can in the future be optimized for the expected frequency distribution of momentum-containing scales. However, there is a potential trade-off between biases and standard errors introduced with the new approach, which must be taken into account when assessing the momentum fluxes. We apply the estimator to PFISR measurements on 23 April 2008 and 21 December 2007, from 60-85 km altitude, and show expected results as compared to mean winds and in relation to the measured vertical velocity variances.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

PubMed

Stone, John E; Gohara, David; Shi, Guochun

2010-05-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Efficient partitioning and assignment on programs for multiprocessor execution

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1993-01-01

The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
A mechanism for efficient debugging of parallel programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, B.P.; Choi, J.D.

1988-01-01

This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. The extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions ofmore » the co-operating processes.« less
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems

PubMed Central

Stone, John E.; Gohara, David; Shi, Guochun

2010-01-01

We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Genetic algorithms using SISAL parallel programming language

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tejada, S.

1994-05-06

Genetic algorithms are a mathematical optimization technique developed by John Holland at the University of Michigan [1]. The SISAL programming language possesses many of the characteristics desired to implement genetic algorithms. SISAL is a deterministic, functional programming language which is inherently parallel. Because SISAL is functional and based on mathematical concepts, genetic algorithms can be efficiently translated into the language. Several of the steps involved in genetic algorithms, such as mutation, crossover, and fitness evaluation, can be parallelized using SISAL. In this paper I will l discuss the implementation and performance of parallel genetic algorithms in SISAL.
An Expert System for the Development of Efficient Parallel Code

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

2004-01-01

We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Solving Integer Programs from Dependence and Synchronization Problems

DTIC Science & Technology

1993-03-01

DEFF.NSNE Solving Integer Programs from Dependence and Synchronization Problems Jaspal Subhlok March 1993 CMU-CS-93-130 School of Computer ScienceT IC...method Is an exact and efficient way of solving integer programming problems arising in dependence and synchronization analysis of parallel programs...7/;- p Keywords: Exact dependence tesing, integer programming. parallelilzng compilers, parallel program analysis, synchronization analysis Solving
The Fate and Effects of Crude Oil Spilled on Subarctic Permafrost Terrain in Interior Alaska,

DTIC Science & Technology

1980-12-01

and Poker Creek, al soils are acidic , generally about pH 4 (Troth et guarding against inadvertent oil (ontamination al 1975, Riegeretal 1972) I2 - IV...of root mediately adjacent to surface oil. Deciduous nutrient uptake, 2) coating of the roots by a leaves turned brown and abscised before ever...Although spruce in the upper 2 m of the winter foliage became chlorotic, turned brown, and spill began dropping needles by mid-June, the abscised
Thoughts on The Battle for the Minds: IO and COIN in the Pashtun Belt

DTIC Science & Technology

2010-10-01

the blackjack table. Some of the world’s best poker players are those who have grown up in the modern battlefields of insurgency. Local... life in a hard society. Such concepts as vegetable diversity and the ability to obtain wider access to markets matter daily to the average Afghan on... the Pashtunwali code are certainly central to much of daily life and social The concerns of the aver- age citizen on an average day should be the
The Role of Neutral Atmospheric Dynamics in Cusp Density and Ionospheric Patch Formation

DTIC Science & Technology

2012-10-01

J . Moen, K. Oksavik, C. P. Nielsen, I. W. McCrea, T . R. Pedersen, and P. Gallop, Direct observations of injection events of subauroral plasma into...reconnection events at high end (3km/s) of spectrum of typical plasma flow jet events. The doubling is attributed not t o any difference in the model, but...world (at Poker Flat in Alaska, and Mawson in Antarctica), both belonging to Dr Mark Conde at the University of Alaska, who pioneered the technique (e.g
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

DOE PAGES

Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

2013-01-01

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems.more » Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.« less
Distributed and parallel Ada and the Ada 9X recommendations

NASA Technical Reports Server (NTRS)

Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

1992-01-01

Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometimes in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major languages issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
National Report on the NASA Sounding Rocket and Balloon Programs

NASA Technical Reports Server (NTRS)

Eberspeaker, Philip; Fairbrother, Debora

2013-01-01

The U. S. National Aeronautics and Space Administration (NASA) Sounding Rockets and Balloon Programs conduct a total of 30 to 40 missions per year in support of the NASA scientific community and other users. The NASA Sounding Rockets Program supports the science community by integrating their experiments into the sounding rocket payloads, and providing both the rocket vehicle and launch operations services. Activities since 2011 have included two flights from Andoya Rocket Range, more than eight flights from White Sands Missile Range, approximately sixteen flights from Wallops Flight Facility, two flights from Poker Flat Research Range, and four flights from Kwajalein Atoll. Other activities included the final developmental flight of the Terrier-Improved Malemute launch vehicle, a test flight of the Talos-Terrier-Oriole launch vehicle, and a host of smaller activities to improve program support capabilities. Several operational missions have utilized the new Terrier-Malemute vehicle. The NASA Sounding Rockets Program is currently engaged in the development of a new sustainer motor known as the Peregrine. The Peregrine development effort will involve one static firing and three flight tests with a target completion data of August 2014. The NASA Balloon Program supported numerous scientific and developmental missions since its last report. The program conducted flights from the U.S., Sweden, Australia, and Antarctica utilizing standard and experimental vehicles. Of particular note are the successful test flights of the Wallops Arc Second Pointer (WASP), the successful demonstration of a medium-size Super Pressure Balloon (SPB), and most recently, three simultaneous missions aloft over Antarctica. NASA continues its successful incremental design qualification program and will support a science mission aboard WASP in late 2013 and a science mission aboard the SPB in early 2015. NASA has also embarked on an intra-agency collaboration to launch a rocket from a balloon to conduct supersonic decelerator tests. An overview of NASA's Sounding Rockets and Balloon Operations, Technology Development and Science support activities will be presented.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support sources available for the automatic parallelization of computer program. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

NASA Astrophysics Data System (ADS)

Work, Paul R.

1991-12-01

This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
WFIRST: Science from the Guest Investigator and Parallel Observation Programs

NASA Astrophysics Data System (ADS)

Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua; Marc Postman

2018-01-01

The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects, surveys of the outer Milky Way halo, comprehensive studies of cluster galaxies, to unique and new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe.WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees at no impact to the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were donemore » using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.« less
Programming Probabilistic Structural Analysis for Parallel Processing Computer

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

1991-01-01

The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Performance Implications of Synchronization Support for Parallel FORTRAN Programs

DTIC Science & Technology

1991-06-17

applications we used in this study are BDNA and FLO52. BDNA is a molecular dy- I namics simulator for biomolecules in water and it uses ordinary...parallelism structures and loop granularity. In the BDNA program, most of the parallel loops are not nested and the iterations are 200-1000 instructions long...are of concern. The BDNA curve in Figure 21 shows that for this program only 17% of all 32 I I 100 BDNA -4 FLO52 -I 80 3 CumuilatQe percentage of3
Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Selective, Embedded, Just-In-Time Specialization (SEJITS): Portable Parallel Performance from Sequential, Productive, Embedded Domain-Specific Languages

DTIC Science & Technology

2012-12-01

identity operation SIMD Single instruction, multiple datastream parallel computing Scala A byte-compiled programming language featuring dynamic type...Specific Languages 5a. CONTRACT NUMBER FA8750-10-1-0191 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 61101E 6. AUTHOR(S) Armando Fox 5d...application performance, but usually must rely on efficiency programmers who are experts in explicit parallel programming to achieve it. Since such efficiency
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

PubMed

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
Concepts of Concurrent Programming

DTIC Science & Technology

1990-04-01

to the material presented. Carriero89 Carriero, N., and Gelernter, D. " How to Write Parallel Programs : A Guide to the Perplexed." ACM...between the architectures on which programs can be executed and the application domains from which problems are drawn. Our goal is to show how programs ...Sept. 1989), 251-510. Abstract: There are four papers: 1. Programming Languages for Distributed Computing Systems (52); 2. How to Write Parallel
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei; Xu, Jingling

2006-01-01

This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrast two methods of programming: Single Program Multiple Data (SPMD) with the Navigational Programming (NAVP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work that is being done to enable the NavP system.

High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
On program restructuring, scheduling, and communication for parallel processor systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Polychronopoulos, Constantine D.

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, thesemore » algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.« less
Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Web Based Parallel Programming Workshop for Undergraduate Education.

ERIC Educational Resources Information Center

Marcus, Robert L.; Robertson, Douglass

Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide web based workshop on high performance computing entitled "IBN SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

NASA Technical Reports Server (NTRS)

Cooke, Daniel; Rushton, Nelson

2013-01-01

With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Testing New Programming Paradigms with NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, noticeably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of supporting the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure in next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
76 FR 62808 - Pilot Program for Parallel Review of Medical Products

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-11

... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...
Algorithms and programming tools for image processing on the MPP

NASA Technical Reports Server (NTRS)

Reeves, A. P.

1985-01-01

Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Execution models for mapping programs onto distributed memory parallel computers

NASA Technical Reports Server (NTRS)

Sussman, Alan

1992-01-01

The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Program Correctness, Verification and Testing for Exascale (Corvette)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sen, Koushik; Iancu, Costin; Demmel, James W

The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partialmore » program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.« less
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Trace-Driven Debugging of Message Passing Programs

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Hood, Robert; Lopez, Louis; Bailey, David (Technical Monitor)

1998-01-01

In this paper we report on features added to a parallel debugger to simplify the debugging of parallel message passing programs. These features include replay, setting consistent breakpoints based on interprocess event causality, a parallel undo operation, and communication supervision. These features all use trace information collected during the execution of the program being debugged. We used a number of different instrumentation techniques to collect traces. We also implemented trace displays using two different trace visualization systems. The implementation was tested on an SGI Power Challenge cluster and a network of SGI workstations.
Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

NASA Astrophysics Data System (ADS)

Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.

1995-03-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Simunovic, S.; Zacharia, T.; Baltas, N.

1995-04-01

PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less
Sounding of the Ion Energization Region: Resolving Ambiguities

NASA Technical Reports Server (NTRS)

LaBelle, James

2003-01-01

Dartmouth College provided a single-channel high-frequency wave receiver to the Sounding of the Ion Energization Region: Resolving Ambiguities (SIERRA) rocket experiment launched from Poker Flat, Alaska, in January 2002. The receiver used signals from booms, probes, preamplifiers, and differential amplifiers provided by Cornell University coinvestigators. Output was to a dedicated 5 MHz telemetry link provided by WFF, with a small amount of additional Pulse Code Modulation (PCM) telemetry required for the receiver gain information. We also performed preliminary analysis of the data. The work completed is outlined below, in chronological order.

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2002-01-01

Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
The parallel programming of voluntary and reflexive saccades.

PubMed

Walker, Robin; McSorley, Eugene

2006-06-01

A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
78 FR 76628 - Pilot Program for Parallel Review of Medical Products; Extension of the Duration of the Program

Federal Register 2010, 2011, 2012, 2013, 2014

2013-12-18

...The Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) (the Agencies) are announcing the extension of the ``Pilot Program for Parallel Review of Medical Products.'' The Agencies have decided to continue the program as currently designed for an additional period of 2 years from the date of publication of this notice.
Networked high-speed auroral observations combined with radar measurements for multi-scale insights

NASA Astrophysics Data System (ADS)

Hirsch, M.; Semeter, J. L.

2015-12-01

Networks of ground-based instruments to study terrestrial aurora for the purpose of analyzing particle precipitation characteristics driving the aurora have been established. Additional funding is pouring into future ground-based auroral observation networks consisting of combinations of tossable, portable, and fixed installation ground-based legacy equipment. Our approach to this problem using the High Speed Tomography (HiST) system combines tightly-synchronized filtered auroral optical observations capturing temporal features of order 10 ms with supporting measurements from incoherent scatter radar (ISR). ISR provides a broader spatial context up to order 100 km laterally on one minute time scales, while our camera field of view (FOV) is chosen to be order 10 km at auroral altitudes in order to capture 100 m scale lateral auroral features. The dual-scale observations of ISR and HiST fine-scale optical observations may be coupled through a physical model using linear basis functions to estimate important ionospheric quantities such as electron number density in 3-D (time, perpendicular and parallel to the geomagnetic field).Field measurements and analysis using HiST and PFISR are presented from experiments conducted at the Poker Flat Research Range in central Alaska. Other multiscale configuration candidates include supplementing networks of all-sky cameras such as THEMIS with co-locations of HiST-like instruments to fuse wide FOV measurements with the fine-scale HiST precipitation characteristic estimates. Candidate models for this coupling include GLOW and TRANSCAR. Future extensions of this work may include incorporating line of sight total electron count estimates from ground-based networks of GPS receivers in a sensor fusion problem.
Statistical Analysis of Bursty Langmuir Waves, Alfvén and Whistler Waves, and Precipitating Electrons Seen by the CHARM II Nightside Sounding Rocket

NASA Astrophysics Data System (ADS)

Dombrowski, M. P.; Labelle, J. W.; Kletzing, C.; Bounds, S. R.; Kaeppler, S. R.

2013-12-01

Bursty Langmuir waves have been interpreted as the result of the superposition of multiple Langmuir normal-mode waves, with the resultant modulation being the beat pattern between waves with e.g. 10 kHz frequency differences. The normal-mode waves could be generated either through wave-wave interactions with VLF waves, or through independent linear processes. The CHARM II sounding rocket was launched into a substorm at 9:49 UT on 15 February 2010, from the Poker Flat Research Range in Alaska. The primary instruments included the Dartmouth High-Frequency Experiment (HFE), a receiver system which effectively yields continuous (100% duty cycle) E-field waveform measurements up to 5 MHz, as well as a number of charged particle detectors, including a wave-particle correlator. The payload also included a magnetometer and several low-frequency wave instruments. CHARM II encountered several regions of strong Langmuir wave activity throughout its 15-minute flight, including several hundred discrete Langmuir-wave bursts. We show results of a statistical analysis of CHARM II data for the entire flight, comparing HFE data with the other payload instruments, specifically looking at timings and correlations between bursty Langmuir waves, Alfvén and whistler-mode waves, and electrons precipitating parallel to the magnetic field. Following a similar analysis on TRICE dayside sounding rocket data, we also calculate the fraction of correlated waves with VLF waves at appropriate frequencies to support the wave-wave interaction bursty Langmuir wave generation mechanism, and compare to results from CHARM II nightside data.
Context, culture and (non-verbal) communication affect handover quality.

PubMed

Frankel, Richard M; Flanagan, Mindy; Ebright, Patricia; Bergman, Alicia; O'Brien, Colleen M; Franks, Zamal; Allen, Andrew; Harris, Angela; Saleem, Jason J

2012-12-01

Transfers of care, also known as handovers, remain a substantial patient safety risk. Although research on handovers has been done since the 1980s, the science is incomplete. Surprisingly few interventions have been rigorously evaluated and, of those that have, few have resulted in long-term positive change. Researchers, both in medicine and other high reliability industries, agree that face-to-face handovers are the most reliable. It is not clear, however, what the term face-to-face means in actual practice. We studied the use of non-verbal behaviours, including gesture, posture, bodily orientation, facial expression, eye contact and physical distance, in the delivery of information during face-to-face handovers. To address this question and study the role of non-verbal behaviour on the quality and accuracy of handovers, we videotaped 52 nursing, medicine and surgery handovers covering 238 patients. Videotapes were analysed using immersion/crystallisation methods of qualitative data analysis. A team of six researchers met weekly for 18 months to view videos together using a consensus-building approach. Consensus was achieved on verbal, non-verbal, and physical themes and patterns observed in the data. We observed four patterns of non-verbal behaviour (NVB) during handovers: (1) joint focus of attention; (2) 'the poker hand'; (3) parallel play and (4) kerbside consultation. In terms of safety, joint focus of attention was deemed to have the best potential for high quality and reliability; however, it occurred infrequently, creating opportunities for education and improvement. Attention to patterns of NVB in face-to-face handovers coupled with education and practice can improve quality and reliability.
Integrated Task and Data Parallel Programming

NASA Technical Reports Server (NTRS)

Grimshaw, A. S.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Integrated Task And Data Parallel Programming: Language Design

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Describing, using 'recognition cones'. [parallel-series model with English-like computer program

NASA Technical Reports Server (NTRS)

Uhr, L.

1973-01-01

A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.
PISCES: An environment for parallel scientific computation

NASA Technical Reports Server (NTRS)

Pratt, T. W.

1985-01-01

The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
Eigensolver for a Sparse, Large Hermitian Matrix

NASA Technical Reports Server (NTRS)

Tisdale, E. Robert; Oyafuso, Fabiano; Klimeck, Gerhard; Brown, R. Chris

2003-01-01

A parallel-processing computer program finds a few eigenvalues in a sparse Hermitian matrix that contains as many as 100 million diagonal elements. This program finds the eigenvalues faster, using less memory, than do other, comparable eigensolver programs. This program implements a Lanczos algorithm in the American National Standards Institute/ International Organization for Standardization (ANSI/ISO) C computing language, using the Message Passing Interface (MPI) standard to complement an eigensolver in PARPACK. [PARPACK (Parallel Arnoldi Package) is an extension, to parallel-processing computer architectures, of ARPACK (Arnoldi Package), which is a collection of Fortran 77 subroutines that solve large-scale eigenvalue problems.] The eigensolver runs on Beowulf clusters of computers at the Jet Propulsion Laboratory (JPL).
Parallelization of elliptic solver for solving 1D Boussinesq model

NASA Astrophysics Data System (ADS)

Tarwidi, D.; Adytia, D.

2018-03-01

In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.
3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
Relative Debugging of Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2002-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify, the program execution with out changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
Paralex: An Environment for Parallel Programming in Distributed Systems

DTIC Science & Technology

1991-12-07

distributed systems is coni- parable to assembly language programming for traditional sequential systems - the user must resort to low-level primitives ...to accomplish data encoding/decoding, communication, remote exe- cution, synchronization , failure detection and recovery. It is our belief that... synchronization . Finally, composing parallel programs by interconnecting se- quential computations allows automatic support for heterogeneity and fault tolerance

GPS Sounding Rocket Developments

NASA Technical Reports Server (NTRS)

Bull, Barton

1999-01-01

Sounding rockets are suborbital launch vehicles capable of carrying scientific payloads several hundred miles in altitude. These missions return a variety of scientific data including; chemical makeup and physical processes taking place In the atmosphere, natural radiation surrounding the Earth, data on the Sun, stars, galaxies and many other phenomena. In addition, sounding rockets provide a reasonably economical means of conducting engineering tests for instruments and devices used on satellites and other spacecraft prior to their use in more expensive activities. The NASA Sounding Rocket Program is managed by personnel from Goddard Space Flight Center Wallops Flight Facility (GSFC/WFF) in Virginia. Typically around thirty of these rockets are launched each year, either from established ranges at Wallops Island, Virginia, Poker Flat Research Range, Alaska; White Sands Missile Range, New Mexico or from Canada, Norway and Sweden. Many times launches are conducted from temporary launch ranges in remote parts of the world requi6ng considerable expense to transport and operate tracking radars. An inverse differential GPS system has been developed for Sounding Rocket. This paper addresses the NASA Wallops Island history of GPS Sounding Rocket experience since 1994 and the development of a high accurate and useful system.
Interfacing Computer Aided Parallelization and Performance Analysis

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

2003-01-01

When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
M-TeX and MIST Experiments Launched from Alaska

NASA Image and Video Library

2017-12-08

Caption: Composite shot of all four rockets for the M-TeX and MIST experiments is made up of 30 second exposures. The rocket salvo began at 4:13 a.m. EST, Jan. 26, 2015, from the Poker Flat Research Range, Alaska. Credit: NASA/Jamie Adkins More info: The Mesosphere-Lower Thermosphere Turbulence Experiment, or M-TeX, and the Mesospheric Inversion-layer Stratified Turbulence, or MIST, experiment were successfully conducted the morning of Jan. 26, 2015, from the Poker Flat Research Range, Alaska. The first M-Tex rocket, a NASA Terrier-Improved Malemute sounding rocket, was launched at 4:13 a.m. EST and was followed one-minute later by the first MIST experiment payload on a NASA Terrier-Improved Orion. The second M-TeX payload was launched at 4:46 a.m. EST and also was followed one minute later by the second MIST payload. Preliminary data show that all four payloads worked as planned and the trimethyl aluminum, or TMA, vapor trails were seen at the various land-based observation sites in Alaska. A fifth rocket carrying the Auroral Spatial Structures Probe remains ready on the launch pad. The launch window for this experiment runs through Jan. 27. NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
Cognitive restructuring of gambling-related thoughts: A systematic review.

PubMed

Chrétien, Maxime; Giroux, Isabelle; Goulet, Annie; Jacques, Christian; Bouchard, Stéphane

2017-12-01

Gamblers' thoughts have a fundamental influence on their gambling problem. Cognitive restructuring is the intervention of choice to correct those thoughts. However, certain difficulties are noted in the application of cognitive restructuring techniques and the comprehension of their guidelines. Furthermore, the increase of skill game players (e.g. poker) entering treatment creates a challenge for therapists, as these gamblers present with different thoughts than those of the gamblers usually encountered in treatment (e.g. chance-only games like electronic gambling machines). This systematic review aims to describe how cognitive restructuring is carried out with gamblers based on the evidence available in empirical studies that include cognitive interventions for gambling. Of the 2607 studies collected, 39 were retained. The results highlight exposure as the most frequently used technique to facilitate identification of gambling-related thoughts (imaginal=28.2%; in vivo=10.3%). More than half of the studies (69.2%) clearly reported therapeutic techniques aimed to correct gamblers' thoughts, of which 37% involved visual support to challenge those thoughts (e.g. ABC log). Of the 39 studies retained, 48.7% included skill game players (i.e., poker, blackjack, sports betting) in their sample. However, none of these studies mentioned whether cognitive restructuring had been adapted for these gamblers. Several terms referring to gamblers' thoughts were used interchangeably (e.g. erroneous, dysfunctional or inadequate thoughts), although each of these terms could refer to specific content. Clinical implications of the results are discussed with regard to the needs of therapists. This review also suggests recommendations for future research. Copyright © 2017 Elsevier Ltd. All rights reserved.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Dual and parallel postdoctoral training programs: implications for the osteopathic medical profession.

PubMed

Burkhart, Diane N; Lischka, Terri A

2011-04-01

Students in colleges of osteopathic medicine have several options when considering postdoctoral training programs. In addition to training programs approved solely by the American Osteopathic Association or accredited solely by the Accreditation Council for Graduate Medical Education (ACGME), students can pursue programs accredited by both organizations (ie, dually accredited programs) or osteopathic programs that occur side-by-side with ACGME programs (ie, parallel programs). In the present article, we report on the availability and growth of these 2 training options and describe their benefits and drawbacks for trainees and the osteopathic medical profession as a whole.
The paradigm compiler: Mapping a functional language for the connection machine

NASA Technical Reports Server (NTRS)

Dennis, Jack B.

1989-01-01

The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
Charon Toolkit for Parallel, Implicit Structured-Grid Computations: Functional Design

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Kutler, Paul (Technical Monitor)

1997-01-01

In a previous report the design concepts of Charon were presented. Charon is a toolkit that aids engineers in developing scientific programs for structured-grid applications to be run on MIMD parallel computers. It constitutes an augmentation of the general-purpose MPI-based message-passing layer, and provides the user with a hierarchy of tools for rapid prototyping and validation of parallel programs, and subsequent piecemeal performance tuning. Here we describe the implementation of the domain decomposition tools used for creating data distributions across sets of processors. We also present the hierarchy of parallelization tools that allows smooth translation of legacy code (or a serial design) into a parallel program. Along with the actual tool descriptions, we will present the considerations that led to the particular design choices. Many of these are motivated by the requirement that Charon must be useful within the traditional computational environments of Fortran 77 and C. Only the Fortran 77 syntax will be presented in this report.
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
Exploiting loop level parallelism in nonprocedural dataflow programs

NASA Technical Reports Server (NTRS)

Gokhale, Maya B.

1987-01-01

Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
Tolerant (parallel) Programming

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

PubMed

Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

2014-10-30

Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.
Exploring types of play in an adapted robotics program for children with disabilities.

PubMed

Lindsay, Sally; Lam, Ashley

2018-04-01

Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types play (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO ® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation Educators and clinicians working with children who have disabilities should consider the potential of LEGO ® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

USDA-ARS?s Scientific Manuscript database

This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
Research Experience for Undergraduates: Understanding the Arctic as a System

NASA Astrophysics Data System (ADS)

Alexeev, V. A.; Walsh, J. E.; Arp, C. D.; Hock, R.; Euskirchen, E. S.; Kaden, U.; Polyakov, I.; Romanovsky, V. E.; Trainor, S.

2017-12-01

Today, more than ever, an integrated cross-disciplinary approach is necessary to understand and explain changes in the Arctic and the implications of those changes. Responding to needs in innovative research and education for understanding high-latitude rapid climate change, scientists at the International Arctic research Center of the University of Alaska Fairbanks (UAF) established a new REU (=Research Experience for Undergraduates) NSF-funded site, aiming to attract more undergraduates to arctic sciences. The science focus of this program, building upon the research strengths of UAF, is on understanding the Arctic as a system with emphasis on its physical component. The goals, which were to disseminate new knowledge at the frontiers of polar science and to ignite the enthusiasm of the undergraduates about the Arctic, are pursued by involving undergraduate students in research and educational projects with their mentors using the available diverse on-campus capabilities. IARC hosted the first group of eight students this past summer, focusing on a variety of different disciplines of the Arctic System Science. Students visited research sites around Fairbanks and in remote parts of Alaska (Toolik Lake Field Station, Gulkana glacier, Bonanza Creek, Poker Flats, the CRREL Permafrost Tunnel and others) to see and experience first-hand how the arctic science is done. Each student worked on a research project guided by an experienced instructor. The summer program culminated with a workshop that consisted of reports from the students about their experiences and the results of their projects.
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
Parallel transformation of K-SVD solar image denoising algorithm

NASA Astrophysics Data System (ADS)

Liang, Youwen; Tian, Yu; Li, Mei

2017-02-01

The images obtained by observing the sun through a large telescope always suffered with noise due to the low SNR. K-SVD denoising algorithm can effectively remove Gauss white noise. Training dictionaries for sparse representations is a time consuming task, due to the large size of the data involved and to the complexity of the training algorithms. In this paper, an OpenMP parallel programming language is proposed to transform the serial algorithm to the parallel version. Data parallelism model is used to transform the algorithm. Not one atom but multiple atoms updated simultaneously is the biggest change. The denoising effect and acceleration performance are tested after completion of the parallel algorithm. Speedup of the program is 13.563 in condition of using 16 cores. This parallel version can fully utilize the multi-core CPU hardware resources, greatly reduce running time and easily to transplant in multi-core platform.

Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

PubMed

Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

2016-01-01

Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Array processor architecture

NASA Technical Reports Server (NTRS)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI ( http://www.mpi-forum.org/), BLAS ( http://www.netlib.org/blas/), PLAPACK ( http://www.cs.utexas.edu/~plapack/), POOCLAPACK ( ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic. Summary of revisions: Version 1.1 Can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

2016-08-01

We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
A high-altitude barium radial injection experiment

NASA Technical Reports Server (NTRS)

Wescott, E. M.; Stenbaek-Nielsen, H. C.; Hallinan, T. J.; Deehr, C. S.; Romick, G. J.; Olson, J. V.; Roederer, J. G.; Sydora, R.

1980-01-01

A rocket launched from Poker Flat, Alaska, carried a new type of high-explosive barium shaped charge to 571 km, where detonation injected a thin disk of barium vapor with high velocity nearly perpendicular to the magnetic field. The TV images of the injection are spectacular, revealing three major regimes of expanding plasma which showed early instabilities in the neutral gas. The most unusual effect of the injection is a peculiar rayed barium-ion structure lying in the injection plane and centered on a 5 km 'black hole' surrounding the injection point. Preliminary electrostatic computer simulations show a similar rayed development.
NA-9 Solar BackscAtter Ultraviolet (SBUV/2) Instrument and Derived Ozone Data: A Status Report Based on a Review on January 29, 1990

DTIC Science & Technology

1990-06-01

consistency. In particular, we are concerned with a 5 station subset c:nsisting of the stations; Boulder, Belsk, Arosa , Tateno and ’.isbon as these are the...Station Inst. No. Station 15 Arosa 105 Melbourne 61 Boulder 112 New Delhi 102 Edmonton 81 Perth 62 Goose Bay 63 Poker Flat 87 Huancayo 89 Pretoria 85...sites. 11-5 Arosa Dobson instrument 15 was calibrated by GMCC in August 1986. At that time it measured ozone values 2.45% lower on average than did
Electron currents associated with an auroral band

NASA Technical Reports Server (NTRS)

Spiger, R. J.; Anderson, H. R.

1975-01-01

Measurements of electron pitch angle distributions and energy spectra over a broad auroral band were used to calculate net electric current carried by auroral electrons in the vicinity of the band. The particle energy spectrometers were carried by a Nike-Tomahawk rocket launched from Poker Flat, Alaska, at 0722 UT on February 25, 1972. Data are presented which indicate the existence of upward field-aligned currents of electrons in the energy range 0.5-20 keV. The spatial relationship of these currents to visual structure of the auroral arc and the characteristics of the electrons carrying the currents are discussed.
GREECE Mission Launching Into Aurora

NASA Image and Video Library

2014-03-04

Caption: A NASA-funded sounding rocket launches into an aurora in the early morning of March 3, 2014, over Venetie, Alaska. The GREECE mission studies how certain structures – classic curls like swirls of cream in coffee -- form in the aurora. Credit: NASA/Christopher Perry More info: On March 3, 2014, at 6:09 a.m. EST, a NASA-funded sounding rocket launched straight into an aurora over Venetie, Alaska. The Ground-to-Rocket Electrodynamics – Electron Correlative Experiment, or GREECE, sounding rocket mission, which launched from Poker Flat Research Range in Poker Flat, Alaska, will study classic curls in the aurora in the night sky. The GREECE instruments travel on a sounding rocket that launches for a ten-minute ride right through the heart of the aurora reaching its zenith over the native village of Venetie, Alaska. To study the curl structures, GREECE consists of two parts: ground-based imagers located in Venetie to track the aurora from the ground and the rocket to take measurements from the middle of the aurora itself. At their simplest, auroras are caused when particles from the sun funnel over to Earth's night side, generate electric currents, and trigger a shower of particles that strike oxygen and nitrogen some 60 to 200 miles up in Earth's atmosphere, releasing a flash of light. But the details are always more complicated, of course. Researchers wish to understand the aurora, and movement of plasma in general, at much smaller scales including such things as how different structures are formed there. This is a piece of information, which in turn, helps paint a picture of the sun-Earth connection and how energy and particles from the sun interact with Earth's own magnetic system, the magnetosphere. GREECE is a collaborative effort between SWRI, which developed particle instruments and the ground-based imaging, and the University of California, Berkeley, measuring the electric and magnetic fields. The launch is supported by a sounding rocket team from NASA’s Wallops Flight Facility, Wallops Island, Va. The Poker Flat Research Range is operated by the University of Alaska, Fairbanks. “The conditions were optimal,” said Marilia Samara, principal investigator for the mission at Southwest Research Institute in San Antonio, Texas. “We can’t wait to dig into the data.” For more information on the GREECE mission visit: www.nasa.gov/content/goddard/nasa-funded-sounding-rocket- .NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
Concurrency-based approaches to parallel programming

NASA Technical Reports Server (NTRS)

Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

1995-01-01

The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches in their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Reliability models for dataflow computer systems

NASA Technical Reports Server (NTRS)

Kavi, K. M.; Buckles, B. P.

1985-01-01

The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

2001-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Method for resource control in parallel environments using program organization and run-time support

NASA Technical Reports Server (NTRS)

Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

1999-01-01

A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Parallel community climate model: Description and user`s guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.
Parallel Volunteer Learning during Youth Programs

ERIC Educational Resources Information Center

Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

2012-01-01

Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
Mechanism to support generic collective communication across a variety of programming models

DOEpatents

Almasi, Gheorghe [Ardsley, NY; Dozsa, Gabor [Ardsley, NY; Kumar, Sameer [White Plains, NY

2011-07-19

A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
Programming with Intervals

NASA Astrophysics Data System (ADS)

Matsakis, Nicholas D.; Gross, Thomas R.

Intervals are a new, higher-level primitive for parallel programming with which programmers directly construct the program schedule. Programs using intervals can be statically analyzed to ensure that they do not deadlock or contain data races. In this paper, we demonstrate the flexibility of intervals by showing how to use them to emulate common parallel control-flow constructs like barriers and signals, as well as higher-level patterns such as bounded-buffer producer-consumer. We have implemented intervals as a publicly available library for Java and Scala.
Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ramirez, Andres; Rahnemoonfar, Maryam

2017-04-01

A hyperspectral image provides multidimensional figure rich in data consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such image with linear and non-linear algorithms will result in high computational time. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyzing a hyperspectral image through the usage of parallel hardware and a parallel programming model, which will be simpler to handle compared to other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy results and timing results between the Hadoop and GPU system and tested it against the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.

Program For Parallel Discrete-Event Simulation

NASA Technical Reports Server (NTRS)

Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

1991-01-01

User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
LLMapReduce: Multi-Lingual Map-Reduce for Supercomputing Environments

DTIC Science & Technology

2015-11-20

1990s. Popularized by Google [36] and Apache Hadoop [37], map-reduce has become a staple technology of the ever- growing big data community...Lexington, MA, U.S.A Abstract— The map-reduce parallel programming model has become extremely popular in the big data community. Many big data ...to big data users running on a supercomputer. LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming
Creating a Parallel Version of VisIt for Microsoft Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whitlock, B J; Biagas, K S; Rawson, P L

2011-12-07

VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing powermore » is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPU's has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.« less
Debugging Fortran on a shared memory machine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Allen, T.R.; Padua, D.A.

1987-01-01

Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
A portable MPI-based parallel vector template library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
A Portable MPI-Based Parallel Vector Template Library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
Parallel computation and the basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1993-05-01

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis andmore » Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
The Automated Instrumentation and Monitoring System (AIMS) reference manual

NASA Technical Reports Server (NTRS)

Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

1993-01-01

Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensate for data collection overhead. Besides being used as prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multi computers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, Xl IR5, Motif 1.1.3).
Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes commonmore » binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).« less
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensionalmore » gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.« less
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

DOE PAGES

Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

2016-06-01

Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensionalmore » gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.« less
Parent-Child Parallel-Group Intervention for Childhood Aggression in Hong Kong

ERIC Educational Resources Information Center

Fung, Annis L. C.; Tsang, Sandra H. K. M.

2006-01-01

This article reports the original evidence-based outcome study on parent-child parallel group-designed Anger Coping Training (ACT) program for children aged 8-10 with reactive aggression and their parents in Hong Kong. This research program involved experimental and control groups with pre- and post-comparison. Quantitative data collection…
Parallel computer vision

DOE Office of Scientific and Technical Information (OSTI.GOV)

Uhr, L.

1987-01-01

This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
Parallel Performance of a Combustion Chemistry Simulation

DOE PAGES

Skinner, Gregg; Eigenmann, Rudolf

1995-01-01

We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.
Algorithms and programming tools for image processing on the MPP, part 2

NASA Technical Reports Server (NTRS)

Reeves, Anthony P.

1986-01-01

A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Voluntary limit setting and player choice in most intense online gamblers: an empirical study of gambling behaviour.

PubMed

Auer, Michael; Griffiths, Mark D

2013-12-01

Social responsibility in gambling has become a major issue for the gaming industry. The possibility for online gamblers to set voluntary time and money limits are a social responsibility practice that is now widespread among online gaming operators. The main issue concerns whether the voluntary setting of such limits has any positive impact on subsequent gambling behaviour and whether such measures are of help to problem gamblers. In this paper, this issue is examined through data collected from a representative random sample of 100,000 players who gambled on the win2day gambling website. When opening an account at the win2day site, there is a mandatory requirement for all players to set time and cash-in limits (that cannot exceed 800 per week). During a 3-month period, all voluntary time and/or money limit setting behaviour by a subsample of online gamblers (n = 5,000) within this mandatory framework was tracked and recorded for subsequent data analysis. From the 5,000 gamblers, the 10 % most intense players (as measured by theoretical loss) were further investigated. Voluntary spending limits had the highest significant effect on subsequent monetary spending among casino and lottery gamblers. Monetary spending among poker players significantly decreased after setting a voluntary time limit. The highest significant decrease in playing duration was among poker players after setting a voluntary playing duration limit. The results of the study demonstrated that voluntary limit setting had a specific and significant effect on the studied gamblers. Therefore, voluntary limits appear to show an appropriate effect in the desired target group (i.e., the most gaming intense players).
High-latitude E Region Ionosphere-thermosphere Coupling: A Comparative Study Using in Situ and Incoherent Scatter Radar Observations

NASA Technical Reports Server (NTRS)

Burchill, J. K.; Clemmons, J. H.; Knudsen, D. J.; Larsen, M.; Nicolls, M. J.; Pfaff, R. F.; Rowland, D.; Sangalli, L.

2012-01-01

We present in situ and ground-based measurements of the ratio k of ion cyclotronangular frequency to ion-neutral momentum transfer collision frequency to investigateionosphere-thermosphere (IT) coupling in the auroral E region. In situ observations were obtained by NASA sounding rocket 36.234, which was launched into the nightsideE region ionosphere at 1229 UT on 19 January 2007 from Poker Flat, AK. The payload carried instrumentation to determine ion drift angle and electric field vectors. Neutral winds were measured by triangulating a chemical tracer released from rocket 41.064 launched two minutes later. k is calculated from the rotation of the ion drift angle relative to the E-cross-B drift direction in a frame co-rotating with the payload. Between the altitudes of 118 km and 130 km k increases exponentially with a scale height of 9.3 +/- 0.7 km, deviating from an exponential above 130 km. k = 1 at an altitude z(sub0) of 119.9 +/- 0.5 km. The ratio was also estimated from Poker Flat Incoherent Scatter Radar (PFISR) measurements using the rotation of ion velocity with altitude. Exponential fits to the PFISR measurements made during the flight of 41.064 yield z(sub0) 115.9 +/- 1.2 km and a scale height of 9.1 +/- 1.0 km. Differences between in situ and ground-based measurements show that the E region atmospheric densities were structured vertically and/or horizontally on scales of 1 km to 10 km. There were no signs of ionospheric structure in ion density or ion temperature below scales of 1 km. The observations demonstrate the accuracy with which the in situ and PFISR data may be used as probes of IT coupling.
New Insights into Auroral Particle Acceleration via Coordinated Optical-Radar Networks

NASA Astrophysics Data System (ADS)

Hirsch, M.

2016-12-01

The efficacy of instruments synthesized from heterogeneous sensor networks is increasingly being realized in fielded science observation systems. New insights into the finest spatio-temporal scales of ground-observable ionospheric physics are realized by coupling low-level data from fixed legacy instruments with mobile and portable sensors. In particular, turbulent ionospheric events give enhanced radar returns more than three orders of magnitude larger than typical incoherent plasma observations. Radar integration times for the Poker Flat Incoherent Scatter Radar (PFISR) can thereby be shrunk from order 100 second integration time down to order 100 millisecond integration time for the ion line. Auroral optical observations with 20 millisecond cadence synchronized in absolute time with the radar help uncover plausible particle acceleration processes for the highly dynamic aurora often associated with Langmuir turbulence. Quantitative analysis of coherent radar returns combined with a physics-based model yielding optical volume emission rate profiles vs. differential number flux input of precipitating particles into the ionosphere yield plausibility estimates for a particular auroral acceleration process type. Tabulated results from a survey of auroral events where the Boston University High Speed Auroral Tomography system operated simultaneously with PFISR are presented. Context is given to the narrow-field HiST observations by the Poker Flat Digital All-Sky Camera and THEMIS GBO ASI network. Recent advances in high-rate (order 100 millisecond) plasma line ISR observations (100x improvement in temporal resolution) will contribute to future coordinated observations. ISR beam pattern and pulse parameter configurations favorable for future coordinated optical-ISR experiments are proposed in light of recent research uncovering the criticality of aspect angle to ISR-observable physics. High-rate scientist-developed GPS TEC receivers are expected to contribute additional high resolution observations to such experiments.
A Configurable Event-Driven Convolutional Node with Rate Saturation Mechanism for Modular ConvNet Systems Implementation.

PubMed

Camuñas-Mesa, Luis A; Domínguez-Cordero, Yaisel L; Linares-Barranco, Alejandro; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabé

2018-01-01

Convolutional Neural Networks (ConvNets) are a particular type of neural network often used for many applications like image recognition, video analysis or natural language processing. They are inspired by the human brain, following a specific organization of the connectivity pattern between layers of neurons known as receptive field. These networks have been traditionally implemented in software, but they are becoming more computationally expensive as they scale up, having limitations for real-time processing of high-speed stimuli. On the other hand, hardware implementations show difficulties to be used for different applications, due to their reduced flexibility. In this paper, we propose a fully configurable event-driven convolutional node with rate saturation mechanism that can be used to implement arbitrary ConvNets on FPGAs. This node includes a convolutional processing unit and a routing element which allows to build large 2D arrays where any multilayer structure can be implemented. The rate saturation mechanism emulates the refractory behavior in biological neurons, guaranteeing a minimum separation in time between consecutive events. A 4-layer ConvNet with 22 convolutional nodes trained for poker card symbol recognition has been implemented in a Spartan6 FPGA. This network has been tested with a stimulus where 40 poker cards were observed by a Dynamic Vision Sensor (DVS) in 1 s time. Different slow-down factors were applied to characterize the behavior of the system for high speed processing. For slow stimulus play-back, a 96% recognition rate is obtained with a power consumption of 0.85 mW. At maximum play-back speed, a traffic control mechanism downsamples the input stimulus, obtaining a recognition rate above 63% when less than 20% of the input events are processed, demonstrating the robustness of the network.

An analysis of switching and non-switching slot machine player behaviour.

PubMed

Coates, Ewan; Blaszczynski, Alex

2013-12-01

Learning theory predicts that, given the repeated choice to bet between two concurrently available slot machines, gamblers will learn to bet more money on the machine with higher expected return (payback percentage) or higher win probability per spin (volatility). The purpose of this study was to investigate whether this occurs when the two machines vary orthogonally on payback percentage and volatility. The sample comprised 52 first year psychology students (mean age = 20.3 years, 20 females, 32 males) who had played a gaming machine at least once in the previous 12 months. Participants were administered a battery of questionnaires designed to assess level of knowledge on the characteristics and operation of poker machines, frequency of poker machine play in the past 12 months, personality traits of impulsivity and capacity for cognitive reflection, and gambling beliefs. For the experimental task, participants were instructed to play on two PC-simulated electronic gaming machines (EGMs or slot machines) that differed on payback percentage and volatility, with the option of freely switching between EGMs after a practice phase. Results indicated that participants were able to easily discriminate between machines and manifested a preference to play machines offering higher payback or volatility. These findings diverged from previous findings of no preference for play on higher payback/volatility machines, potentially due to of the current study's absence of the option to make multi-line and multi-credit bets. It was concluded that return rate parameters like payback percentage and volatility strongly influenced slot machine preference in the absence of betting options like multi-line bets, though more research is needed to determine the effects of such betting options on player distribution of money between multiple EGMs.
A Configurable Event-Driven Convolutional Node with Rate Saturation Mechanism for Modular ConvNet Systems Implementation

PubMed Central

Camuñas-Mesa, Luis A.; Domínguez-Cordero, Yaisel L.; Linares-Barranco, Alejandro; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabé

2018-01-01

Convolutional Neural Networks (ConvNets) are a particular type of neural network often used for many applications like image recognition, video analysis or natural language processing. They are inspired by the human brain, following a specific organization of the connectivity pattern between layers of neurons known as receptive field. These networks have been traditionally implemented in software, but they are becoming more computationally expensive as they scale up, having limitations for real-time processing of high-speed stimuli. On the other hand, hardware implementations show difficulties to be used for different applications, due to their reduced flexibility. In this paper, we propose a fully configurable event-driven convolutional node with rate saturation mechanism that can be used to implement arbitrary ConvNets on FPGAs. This node includes a convolutional processing unit and a routing element which allows to build large 2D arrays where any multilayer structure can be implemented. The rate saturation mechanism emulates the refractory behavior in biological neurons, guaranteeing a minimum separation in time between consecutive events. A 4-layer ConvNet with 22 convolutional nodes trained for poker card symbol recognition has been implemented in a Spartan6 FPGA. This network has been tested with a stimulus where 40 poker cards were observed by a Dynamic Vision Sensor (DVS) in 1 s time. Different slow-down factors were applied to characterize the behavior of the system for high speed processing. For slow stimulus play-back, a 96% recognition rate is obtained with a power consumption of 0.85 mW. At maximum play-back speed, a traffic control mechanism downsamples the input stimulus, obtaining a recognition rate above 63% when less than 20% of the input events are processed, demonstrating the robustness of the network. PMID:29515349
Poker flat radar observations of the magnetosphere-ionosphere coupling electrodynamics of the earthward penetrating plasma sheet following convection enhancements

NASA Astrophysics Data System (ADS)

Lyons, L. R.; Zou, S.; Heinselman, C. J.; Nicolls, M. J.; Anderson, P. C.

2009-05-01

The plasma sheet moves earthward (equatorward in the ionosphere) after enhancements in convection, and the electrodynamics of this response is strongly influenced by Region 2 magnetosphere-ionosphere coupling. We have used Poker Flat Advanced Modular Incoherent Scatter Radar (PFISR) observations associated with two relatively abrupt southward turnings of the IMF to provide an initial evaluation of aspects of this response. The observations show that strong westward sub-auroral polarization streams (SAPS) flow regions moved equatorward as the plasma sheet electron precipitation (the diffuse aurora) penetrated equatorward following the IMF southward turnings. Consistent with our identification of these flows as SAPS, concurrent DMSP particle precipitation measurements show the equatorial boundary of ion precipitation equatorward of the electron precipitation boundary and that westward flows lie within the low-conductivity region between the two boundaries where the plasma sheet ion pressure gradient is expected to drive downward R2 currents. Evidence for these downward currents is seen in the DMSP magnetometer observations. Preliminary examination indicates that the SAPS response seen in the examples presented here may be common. However, detailed analysis will be required for many more events to reliably determine if this is the case. If so, it would imply that SAPS are frequently an important aspect of the inner magnetospheric electric field distribution, and that they are critical for understanding the response of the magnetosphere-ionosphere system to enhancements in convection, including understanding the earthward penetration of the plasma sheet. This earthward penetration is critical to geomagnetic disturbance phenomena such as the substorm growth phase and the formation of the stormtime ring current. Additionally, for one example, a prompt electric field response to the IMF southward turnings is seen within the inner plasma sheet.
Scalable Unix commands for parallel processors : a high-performance implementation.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ong, E.; Lusk, E.; Gropp, W.

2001-06-22

We describe a family of MPI applications we call the Parallel Unix Commands. These commands are natural parallel versions of common Unix user commands such as ls, ps, and find, together with a few similar commands particular to the parallel environment. We describe the design and implementation of these programs and present some performance results on a 256-node Linux cluster. The Parallel Unix Commands are open source and freely available.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1989-01-01

A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
Including sheath effects in the interpretation of planar retarding potential analyzer's low-energy ion data

NASA Astrophysics Data System (ADS)

Fisher, L. E.; Lynch, K. A.; Fernandes, P. A.; Bekkeng, T. A.; Moen, J.; Zettergren, M.; Miceli, R. J.; Powell, S.; Lessard, M. R.; Horak, P.

2016-04-01

The interpretation of planar retarding potential analyzers (RPA) during ionospheric sounding rocket missions requires modeling the thick 3D plasma sheath. This paper overviews the theory of RPAs with an emphasis placed on the impact of the sheath on current-voltage (I-V) curves. It then describes the Petite Ion Probe (PIP) which has been designed to function in this difficult regime. The data analysis procedure for this instrument is discussed in detail. Data analysis begins by modeling the sheath with the Spacecraft Plasma Interaction System (SPIS), a particle-in-cell code. Test particles are traced through the sheath and detector to determine the detector's response. A training set is constructed from these simulated curves for a support vector regression analysis which relates the properties of the I-V curve to the properties of the plasma. The first in situ use of the PIPs occurred during the MICA sounding rocket mission which launched from Poker Flat, Alaska in February of 2012. These data are presented as a case study, providing valuable cross-instrument comparisons. A heritage top-hat thermal ion electrostatic analyzer, called the HT, and a multi-needle Langmuir probe have been used to validate both the PIPs and the data analysis method. Compared to the HT, the PIP ion temperature measurements agree with a root-mean-square error of 0.023 eV. These two instruments agree on the parallel-to-B plasma flow velocity with a root-mean-square error of 130 m/s. The PIP with its field of view aligned perpendicular-to-B provided a density measurement with an 11% error compared to the multi-needle Langmuir Probe. Higher error in the other PIP's density measurement is likely due to simplifications in the SPIS model geometry.
A CS1 pedagogical approach to parallel thinking

NASA Astrophysics Data System (ADS)

Rague, Brian William

Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of any initial three-week CS1 level module when compared with student comprehension levels just prior to starting the course. Survey results measured during the ninth week of the course reveal that performance levels remained high compared to pre-course performance scores. A second result produced by this study reveals no statistically significant interaction effect between the intervention method and student performance as measured by the evaluation instrument over three separate testing periods. However, visual inspection of survey score trends and the low p-value generated by the interaction analysis (0.062) indicate that further studies may verify improved concept retention levels for the lecture w/PAT group.
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expectedmore » performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.« less
PISCES 2 users manual

NASA Technical Reports Server (NTRS)

Pratt, Terrence W.

1987-01-01

PISCES 2 is a programming environment and set of extensions to Fortran 77 for parallel programming. It is intended to provide a basis for writing programs for scientific and engineering applications on parallel computers in a way that is relatively independent of the particular details of the underlying computer architecture. This user's manual provides a complete description of the PISCES 2 system as it is currently implemented on the 20 processor Flexible FLEX/32 at NASA Langley Research Center.
A language comparison for scientific computing on MIMD architectures

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

1989-01-01

Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
Code Parallelization with CAPO: A User Manual

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2001-01-01

A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Thread concept for automatic task parallelization in image analysis

NASA Astrophysics Data System (ADS)

Lueckenhaus, Maximilian; Eckstein, Wolfgang

1998-09-01

Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derivated from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

DOE PAGES

Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

2016-05-01

Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive formulation mixed- integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve furthermore » as more functionality is added in the future.« less
On the utility of threads for data parallel programming

NASA Technical Reports Server (NTRS)

Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

1995-01-01

Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

NASA Technical Reports Server (NTRS)

Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

2005-01-01

The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed system present the worst case implications for today s dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.
A design methodology for portable software on parallel computers

NASA Technical Reports Server (NTRS)

Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

1993-01-01

This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks are specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

PubMed

Halic, Tansel; Ahn, Woojin; De, Suvranu

2014-01-01

This work presents a pWeb - a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Gupta, Manish

1992-01-01

Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.

GRADSPMHD: A parallel MHD code based on the SPH formalism

NASA Astrophysics Data System (ADS)

Vanaverbeke, S.; Keppens, R.; Poedts, S.

2014-03-01

We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇ṡB→ constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

PubMed

Shrimankar, D D; Sathe, S R

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

PubMed Central

Shrimankar, D. D.; Sathe, S. R.

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
Parallel Logic Programming and Parallel Systems Software and Hardware

DTIC Science & Technology

1989-07-29

Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Cellular automata with object-oriented features for parallel molecular network modeling.

PubMed

Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

2005-06-01

Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms-Cellular automata, modeling, molecular network, object-oriented.
Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

NASA Astrophysics Data System (ADS)

Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

It is difficult and cumbersome to detect data races occurred in an execution of parallel programs. Any on-the-fly race detection techniques using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelisms varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
User-Defined Data Distributions in High-Level Programming Languages

NASA Technical Reports Server (NTRS)

Diaconescu, Roxana E.; Zima, Hans P.

2006-01-01

One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
Parallel Rendering of Large Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Garbutt, Alexander E.

2005-01-01

Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modem computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphic hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphic hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we Will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism System using a time-varying dataset from selected JPL applications.
Solving Partial Differential Equations in a data-driven multiprocessor environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

1988-12-31

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-how architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asychronous methods in a data-driven environment.more » New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.« less
Improved Background Removal in Sounding Rocket Neutral Atom Imaging Data

NASA Astrophysics Data System (ADS)

Smith, M. R.; Rowland, D. E.

2017-12-01

The VISIONS sounding rocket, launched into a substorm on Feb 7, 2013 from Poker Flat, Alaska had a novel miniaturized energetic neutral atom (ENA) imager onboard. We present further analysis of the ENA data from this rocket flight, including improved removal of ultraviolet and electron contamination. In particular, the relative error source contributions due to geocoronal, auroral, and airglow UV, as well as energetic electrons from 10 eV to 3 keV were assessed. The resulting data provide a more clear understanding of the spatial and temporal variations of the ion populations that are energized to tens or hundreds of eV.
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

PubMed

Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

2016-01-01

Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
Visual analysis of inter-process communication for large-scale parallel computing.

PubMed

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt char t with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.
Parallel machine architecture and compiler design facilities

NASA Technical Reports Server (NTRS)

Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

1990-01-01

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

1999-01-01

As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dr. Dale M. Snider

2011-02-28

This report gives the result from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendlymore » environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the ground work for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 gives immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully laying the frame work for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate) Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics for the Phase-1 the technical partner. See Section Technical Objective and Approach) Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections) Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program) Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections) Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2) With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.« less
Monitoring Data-Structure Evolution in Distributed Message-Passing Programs

NASA Technical Reports Server (NTRS)

Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)

1996-01-01

Monitoring the evolution of data structures in parallel and distributed programs, is critical for debugging its semantics and performance. However, the current state-of-art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents), on specific nodes, can be determined using traditional debuggers after the data structure evolution leading to the semantic error is observed graphically.
Command/response protocols and concurrent software

NASA Technical Reports Server (NTRS)

Bynum, W. L.

1987-01-01

A version of the program to control the parallel jaw gripper is documented. The parallel jaw end-effector hardware and the Intel 8031 processor that is used to control the end-effector are briefly described. A general overview of the controller program is given and a complete description of the program's structure and design are contained. There are three appendices: a memory map of the on-chip RAM, a cross-reference listing of the self-scheduling routines, and a summary of the top-level and monitor commands.
Computer programs for adjusting the mechanical properties of 2-inch dimension lumber for changes in moisture content

Treesearch

James W. Evans; Jane K. Evans; David W. Green

1990-01-01

This paper presents computer programs for adjusting the mechanical properties of 2-in. dimension lumber for changes in moisture content. Mechanical properties adjusted are modulus of rupture, ultimate tensile stress parallel to the grain, ultimate compressive stress parallel to the gain, and flexural modulus of elasticity. The models are valid for moisture contents...

NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

NASA Astrophysics Data System (ADS)

Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

2010-09-01

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance. Program summaryProgram title: NWChem Catalogue identifier: AEGI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Open Source Educational Community License No. of lines in distributed program, including test data, etc.: 11 709 543 No. of bytes in distributed program, including test data, etc.: 680 696 106 Distribution format: tar.gz Programming language: Fortran 77, C Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines Operating system: Linux, OS X, Windows Has the code been vectorised or parallelized?: Code is parallelized Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13 Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of many-electron Hamiltonian, analysis of the potential energy surface, and dynamics. Solution method: Ground and excited solutions of many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze potential energy surface and perform dynamical simulations. Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: Running time depends on the size of the chemical system, complexity of the method, number of cpu's and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.
Petascale Simulation Initiative Tech Base: FY2007 Final Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

May, J; Chen, R; Jefferson, D

The Petascale Simulation Initiative began as an LDRD project in the middle of Fiscal Year 2004. The goal of the project was to develop techniques to allow large-scale scientific simulation applications to better exploit the massive parallelism that will come with computers running at petaflops per second. One of the major products of this work was the design and prototype implementation of a programming model and a runtime system that lets applications extend data-parallel applications to use task parallelism. By adopting task parallelism, applications can use processing resources more flexibly, exploit multiple forms of parallelism, and support more sophisticated multiscalemore » and multiphysics models. Our programming model was originally called the Symponents Architecture but is now known as Cooperative Parallelism, and the runtime software that supports it is called Coop. (However, we sometimes refer to the programming model as Coop for brevity.) We have documented the programming model and runtime system in a submitted conference paper [1]. This report focuses on the specific accomplishments of the Cooperative Parallelism project (as we now call it) under Tech Base funding in FY2007. Development and implementation of the model under LDRD funding alone proceeded to the point of demonstrating a large-scale materials modeling application using Coop on more than 1300 processors by the end of FY2006. Beginning in FY2007, the project received funding from both LDRD and the Computation Directorate Tech Base program. Later in the year, after the three-year term of the LDRD funding ended, the ASC program supported the project with additional funds. The goal of the Tech Base effort was to bring Coop from a prototype to a production-ready system that a variety of LLNL users could work with. Specifically, the major tasks that we planned for the project were: (1) Port SARS [former name of the Coop runtime system] to another LLNL platform, probably Thunder or Peloton (depending on when Peloton becomes available); (2) Improve SARS's robustness and ease-of-use, and develop user documentation; and (3) Work with LLNL code teams to help them determine how Symponents could benefit their applications. The original funding request was $296,000 for the year, and we eventually received $252,000. The remainder of this report describes our efforts and accomplishments for each of the goals listed above.« less
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS

NASA Technical Reports Server (NTRS)

Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets: and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Sargent, Jeff Scott

1988-01-01

A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel new approaches to controlling error in parallel cell-placement algorithms; Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance is summarized of an implementation on the Intel iPSC/2 Hypercube. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.« less
Work stealing for GPU-accelerated parallel programs in a global address space framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a functionmore » of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain« less
Parallel and serial computing tools for testing single-locus and epistatic SNP effects of quantitative traits in genome-wide association studies

PubMed Central

Ma, Li; Runesha, H Birali; Dvorkin, Daniel; Garbe, John R; Da, Yang

2008-01-01

Background Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS. Results The EPISNPmpi and EPISNP computer programs were developed for testing single-locus and epistatic SNP effects on quantitative traits in GWAS, including tests of three single-locus effects for each SNP (SNP genotypic effect, additive and dominance effects) and five epistasis effects for each pair of SNPs (two-locus interaction, additive × additive, additive × dominance, dominance × additive, and dominance × dominance) based on the extended Kempthorne model. EPISNPmpi is the parallel computing program for epistasis testing in large scale GWAS and achieved excellent scalability for large scale analysis and portability for various parallel computing platforms. EPISNP is the serial computing program based on the EPISNPmpi code for epistasis testing in small scale GWAS using commonly available operating systems and computer hardware. Three serial computing utility programs were developed for graphical viewing of test results and epistasis networks, and for estimating CPU time and disk space requirements. Conclusion The EPISNPmpi parallel computing program provides an effective computing tool for epistasis testing in large scale GWAS, and the epiSNP serial computing programs are convenient tools for epistasis analysis in small scale GWAS using commonly available computer hardware. PMID:18644146
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Automatic recognition of vector and parallel operations in a higher level language

NASA Technical Reports Server (NTRS)

Schneck, P. B.

1971-01-01

A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
Understanding and Improving High-Performance I/O Subsystems

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Frieder, Gideon; Clark, A. James

1996-01-01

This research program has been conducted in the framework of the NASA Earth and Space Science (ESS) evaluations led by Dr. Thomas Sterling. In addition to the many important research findings for NASA and the prestigious publications, the program has helped orienting the doctoral research program of two students towards parallel input/output in high-performance computing. Further, the experimental results in the case of the MasPar were very useful and helpful to MasPar with which the P.I. has had many interactions with the technical management. The contributions of this program are drawn from three experimental studies conducted on different high-performance computing testbeds/platforms, and therefore presented in 3 different segments as follows: 1. Evaluating the parallel input/output subsystem of a NASA high-performance computing testbeds, namely the MasPar MP- 1 and MP-2; 2. Characterizing the physical input/output request patterns for NASA ESS applications, which used the Beowulf platform; and 3. Dynamic scheduling techniques for hiding I/O latency in parallel applications such as sparse matrix computations. This study also has been conducted on the Intel Paragon and has also provided an experimental evaluation for the Parallel File System (PFS) and parallel input/output on the Paragon. This report is organized as follows. The summary of findings discusses the results of each of the aforementioned 3 studies. Three appendices, each containing a key scholarly research paper that details the work in one of the studies are included.
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1996-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

NASA Technical Reports Server (NTRS)

Sues, R. H.; Lua, Y. J.; Smith, M. D.

1994-01-01

The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Real-Time MENTAT programming language and architecture

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

1989-01-01

Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and exciting real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

NASA Technical Reports Server (NTRS)

Weed, Richard Allen; Sankar, L. N.

1994-01-01

An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance to price ratio of engineering workstations has led to research to development procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.
Communication library for run-time visualization of distributed, asynchronous data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rowlan, J.; Wightman, B.T.

1994-04-01

In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix Sockets, and a real-time connection to the scientific visualization program AVS. Our methodmore » is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov from the file portal.tar.Z from the directory pub/portal.« less
Multiprogramming performance degradation - Case study on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Dimpsey, R. T.; Iyer, R. K.

1989-01-01

The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

PubMed

O'Connor, B P

2000-08-01

Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
Message Passing and Shared Address Space Parallelism on an SMP Cluster

NASA Technical Reports Server (NTRS)

Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

NASA Technical Reports Server (NTRS)

Harper, Richard

1989-01-01

In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

The revised solar array synthesis computer program

NASA Technical Reports Server (NTRS)

1970-01-01

The Revised Solar Array Synthesis Computer Program is described. It is a general-purpose program which computes solar array output characteristics while accounting for the effects of temperature, incidence angle, charged-particle irradiation, and other degradation effects on various solar array configurations in either circular or elliptical orbits. Array configurations may consist of up to 75 solar cell panels arranged in any series-parallel combination not exceeding three series-connected panels in a parallel string and no more than 25 parallel strings in an array. Up to 100 separate solar array current-voltage characteristics, corresponding to 100 equal-time increments during the sunlight illuminated portion of an orbit or any 100 user-specified combinations of incidence angle and temperature, can be computed and printed out during one complete computer execution. Individual panel incidence angles may be computed and printed out at the user's option.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

1994-01-01

The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Parallel computation and the Basis system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, G.R.

1992-12-16

A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallelmore » Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Orthorectification by Using Gpgpu Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-05

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Automated Performance Prediction of Message-Passing Parallel Programs

NASA Technical Reports Server (NTRS)

Block, Robert J.; Sarukkai, Sekhar; Mehra, Pankaj; Woodrow, Thomas S. (Technical Monitor)

1995-01-01

The increasing use of massively parallel supercomputers to solve large-scale scientific problems has generated a need for tools that can predict scalability trends of applications written for these machines. Much work has been done to create simple models that represent important characteristics of parallel programs, such as latency, network contention, and communication volume. But many of these methods still require substantial manual effort to represent an application in the model's format. The NIK toolkit described in this paper is the result of an on-going effort to automate the formation of analytic expressions of program execution time, with a minimum of programmer assistance. In this paper we demonstrate the feasibility of our approach, by extending previous work to detect and model communication patterns automatically, with and without overlapped computations. The predictions derived from these models agree, within reasonable limits, with execution times of programs measured on the Intel iPSC/860 and Paragon. Further, we demonstrate the use of MK in selecting optimal computational grain size and studying various scalability metrics.
Connectionist Models and Parallelism in High Level Vision.

DTIC Science & Technology

1985-01-01

GRANT NUMBER(s) Jerome A. Feldman N00014-82-K-0193 9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENt. PROJECT, TASK Computer Science...Connectionist Models 2.1 Background and Overviev % Computer science is just beginning to look seriously at parallel computation : it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These
Transient Finite Element Computations on a Variable Transputer System

NASA Technical Reports Server (NTRS)

Smolinski, Patrick J.; Lapczyk, Ireneusz

1993-01-01

A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree ’)iIC Q(JALfryT INSPECTED 5 DOCTOR OF PHILOSOPHY I Accesion For Supervised by NTIS CRAM... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to 0...decomoosition and scheduling algorithms. I. SUIUECT TERMS IS. NUMBER OF PAGES shared-memory multiprocessors; architecture trends; loop 110 scheduling
An Empirical Development of Parallelization Guidelines for Time-Driven Simulation

DTIC Science & Technology

1989-12-01

wives, who though not Cub fans, put on a good show during our trip, to waich some games . I would also like to recognize the help of my professors at...program parallelization. in this research effort a Ballistic Missile Defense (BMD) time driven simulation program, developed by DESE Research and...continuously, or continuously with discrete changes superimposed. The distinguishing feature of these simulations is the interaction between discretely
Force user's manual, revised

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1987-01-01

A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Parallel Programming Paradigms

DTIC Science & Technology

1987-07-01

Unclassified IS.. DECLASSIFICATIONIOOWNGRADIN G 16. DISTRIBUTION STATEMENT (of this Report) Distribution of this report is unlimited. 17...8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85- K-0328. 8 ?~~ O . G 1 49 II Parallel Programming Paradigms...processors -. "to fetch from the same memory cell (list head) and thus seems to favor a shared memory - g implementation [37). In this dissertation, we
Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

PubMed

Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

2013-12-01

Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shipman, Galen M.

These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematicmore » approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.« less
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
A model for gravity-wave spectra observed by Doppler sounding systems

NASA Technical Reports Server (NTRS)

Vanzandt, T. E.

1986-01-01

A model for Mesosphere - Stratosphere - Troposphere (MST) radar spectra is developed following the formalism presented by Pinkel (1981). Expressions for the one-dimensional spectra of radial velocity versus frequency and versus radial wave number are presented. Their dependence on the parameters of the gravity-wave spectrum and on the experimental parameters, radar zenith angle and averaging time are described and the conditions for critical tests of the gravity-wave hypothesis are discussed. The model spectra is compared with spectra observed in the Arctic summer mesosphere by the Poker Flat radar. This model applies to any monostatic Doppler sounding system, including MST radar, Doppler lidar and Doppler sonar in the atmosphere, and Doppler sonar in the ocean.
Gambling with stimulus payments: feeding gaming machines with federal dollars.

PubMed

Lye, Jenny; Hirschberg, Joe

2014-09-01

In late 2008 and early 2009 the Australian Federal Government introduced a series of economic stimulus packages designed to maintain consumer spending in the early days of the Great Recession. When these packages were initiated the media suggested that the wide-spread availability of electronic gaming machines (EGMs, e.g. slot machines, poker machines, video lottery terminals) in Australia would result in stimulating the EGMs. Using state level monthly data we estimate that the stimulus packages led to an increase of 26 % in EGM revenues. This also resulted in over $60 million in additional tax revenue for State Governments. We also estimate a short-run aggregate income demand elasticity for EGMs to be approximately 2.
Usage of cornea and sclera back reflected images captured in security cameras for forensic and card games applications

NASA Astrophysics Data System (ADS)

Zalevsky, Zeev; Ilovitsh, Asaf; Beiderman, Yevgeny

2013-10-01

We present an approach allowing seeing objects that are hidden and that are not positioned in direct line of sight with security inspection cameras. The approach is based on inspecting the back reflections obtained from the cornea and the sclera of the eyes of people attending the inspected scene and which are positioned in front of the hidden objects we aim to image after performing proper calibration with point light source (e.g. a LED). The scene can be a forensic scene or for instance a casino in which the application is to see the cards of poker players seating in front of you.
The spatial-temporal ambiguity in auroral modeling

NASA Technical Reports Server (NTRS)

Rees, M. H.; Roble, R. G.; Kopp, J.; Abreu, V. J.; Rusch, D. W.; Brace, L. H.; Brinton, H. C.; Hoffman, R. A.; Heelis, R. A.; Kayser, D. C.

1980-01-01

The paper examines the time-dependent models of the aurora which show that various ionospheric parameters respond to the onset of auroral ionization with different time histories. A pass of the Atmosphere Explorer C satellite over Poker Flat, Alaska, and ground based photometric and photographic observations have been used to resolve the time-space ambiguity of a specific auroral event. The density of the O(+), NO(+), O2(+), and N2(+) ions, the electron density, and the electron temperature observed at 280 km altitude in a 50 km wide segment of an auroral arc are predicted by the model if particle precipitation into the region commenced about 11 min prior to the overpass.
Use Computer-Aided Tools to Parallelize Large CFD Applications

NASA Technical Reports Server (NTRS)

Jin, H.; Frumkin, M.; Yan, J.

2000-01-01

Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progressing in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more sever. Today, scalability and high performance are mostly involving handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, show good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, it can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often failed on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence model in multiple zones. Each one comprises of from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved 18 fold speed up on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outer- most parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level. Future work will emphasize on the development of methodology to work in a multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformation is also needed.

Northeast Artificial Intelligence Consortium Annual Report - 1988 Parallel Vision. Volume 9

DTIC Science & Technology

1989-10-01

supports the Northeast Aritificial Intelligence Consortium (NAIC). Volume 9 Parallel Vision Report submitted by Christopher M. Brown Randal C. Nelson...NORTHEAST ARTIFICIAL INTELLIGENCE CONSORTIUM ANNUAL REPORT - 1988 Parallel Vision Syracuse University Christopher M. Brown and Randal C. Nelson...Technical Director Directorate of Intelligence & Reconnaissance FOR THE COMMANDER: IGOR G. PLONISCH Directorate of Plans & Programs If your address has
High Performance Input/Output for Parallel Computer Systems

NASA Technical Reports Server (NTRS)

Ligon, W. B.

1996-01-01

The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, The Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX- like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications fiom levels 1,2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
The Research of the Parallel Computing Development from the Angle of Cloud Computing

NASA Astrophysics Data System (ADS)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
A real-time MPEG software decoder using a portable message-passing library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan

1995-12-31

We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Support of Multidimensional Parallelism in the OpenMP Programming Model

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele

2003-01-01

OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often effect the scalability of applications. Examples for these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dritz, K.W.; Boyle, J.M.

This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention - the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lockmore » contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, ''processor-equivalents,'' as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We chronicle also a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies (''code motion out of critical regions'' and ''critical-region fissioning'') for reducing lock contention and one (''lock/variable fusion'') applicable only on certain architectures.« less
ICASE Computer Science Program

NASA Technical Reports Server (NTRS)

1985-01-01

The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.
The neural basis of parallel saccade programming: an fMRI study.

PubMed

Hu, Yanbo; Walker, Robin

2011-11-01

The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.
Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

NASA Technical Reports Server (NTRS)

1991-01-01

Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

NASA Technical Reports Server (NTRS)

Woo, Alex C.; Hill, Kueichien C.

1996-01-01

The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Parallel line analysis: multifunctional software for the biomedical sciences

NASA Technical Reports Server (NTRS)

Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.

1990-01-01

An easy to use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.
Optimized and parallelized implementation of the electronegativity equalization method and the atom-bond electronegativity equalization method.

PubMed

Vareková, R Svobodová; Koca, J

2006-02-01

The most common way to calculate charge distribution in a molecule is ab initio quantum mechanics (QM). Some faster alternatives to QM have also been developed, the so-called "equalization methods" EEM and ABEEM, which are based on DFT. We have implemented and optimized the EEM and ABEEM methods and created the EEM SOLVER and ABEEM SOLVER programs. It has been found that the most time-consuming part of equalization methods is the reduction of the matrix belonging to the equation system generated by the method. Therefore, for both methods this part was replaced by the parallel algorithm WIRS and implemented within the PVM environment. The parallelized versions of the programs EEM SOLVER and ABEEM SOLVER showed promising results, especially on a single computer with several processors (compact PVM). The implemented programs are available through the Web page http://ncbr.chemi.muni.cz/~n19n/eem_abeem.
Automation of Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

2001-01-01

The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
Developing Information Power Grid Based Algorithms and Software

NASA Technical Reports Server (NTRS)

Dongarra, Jack

1998-01-01

This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Taylor, Arthur C., III

1994-01-01

This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
Memory-based frame synchronizer. [for digital communication systems

NASA Technical Reports Server (NTRS)

Stattel, R. J.; Niswander, J. K. (Inventor)

1981-01-01

A frame synchronizer for use in digital communications systems wherein data formats can be easily and dynamically changed is described. The use of memory array elements provide increased flexibility in format selection and sync word selection in addition to real time reconfiguration ability. The frame synchronizer comprises a serial-to-parallel converter which converts a serial input data stream to a constantly changing parallel data output. This parallel data output is supplied to programmable sync word recognizers each consisting of a multiplexer and a random access memory (RAM). The multiplexer is connected to both the parallel data output and an address bus which may be connected to a microprocessor or computer for purposes of programming the sync word recognizer. The RAM is used as an associative memory or decorder and is programmed to identify a specific sync word. Additional programmable RAMs are used as counter decoders to define word bit length, frame word length, and paragraph frame length.
Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

G.A. Pope; K. Sephernoori; D.C. McKinney

1996-03-15

This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less

Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.
Method, systems, and computer program products for implementing function-parallel network firewall

DOEpatents

Fulp, Errin W [Winston-Salem, NC; Farley, Ryan J [Winston-Salem, NC

2011-10-11

Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
A computer program for converting rectangular coordinates to latitude-longitude coordinates

USGS Publications Warehouse

Rutledge, A.T.

1989-01-01

A computer program was developed for converting the coordinates of any rectangular grid on a map to coordinates on a grid that is parallel to lines of equal latitude and longitude. Using this program in conjunction with groundwater flow models, the user can extract data and results from models with varying grid orientations and place these data into grid structure that is oriented parallel to lines of equal latitude and longitude. All cells in the rectangular grid must have equal dimensions, and all cells in the latitude-longitude grid measure one minute by one minute. This program is applicable if the map used shows lines of equal latitude as arcs and lines of equal longitude as straight lines and assumes that the Earth 's surface can be approximated as a sphere. The program user enters the row number , column number, and latitude and longitude of the midpoint of the cell for three test cells on the rectangular grid. The latitude and longitude of boundaries of the rectangular grid also are entered. By solving sets of simultaneous linear equations, the program calculates coefficients that are used for making the conversion. As an option in the program, the user may build a groundwater model file based on a grid that is parallel to lines of equal latitude and longitude. The program reads a data file based on the rectangular coordinates and automatically forms the new data file. (USGS)
A distributed version of the NASA Engine Performance Program

NASA Technical Reports Server (NTRS)

Cours, Jeffrey T.; Curlett, Brian P.

1993-01-01

Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

NASA Astrophysics Data System (ADS)

Alruwaili, Manal

With the growing technology, the number of the processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass scale application software development on massive parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining the user-friendliness. Currently, available object oriented languages for massive parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension at for object distribution to exploit PGAS model; 2) the programs lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS based process constructs with these extensions on distributed objects. Object cloning and object migration. Also a new paradigm MIDD (Multiple Invocation Distributed Data) is presented when different copies of the same class can be invoked, and work on different elements of a distributed data concurrently using remote method invocations. I present new constructs, their grammar and their behavior. The new constructs have been explained using simple programs utilizing these constructs.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single onemore » of the endpoints in the geometry, an instruction for the collective operation.« less
Using the Parallel Computing Toolbox with MATLAB on the Peregrine System |

Science.gov Websites

parallel pool took %g seconds.\\n', toc) % "single program multiple data" spmd fprintf('Worker %d says Hello World!\\n', labindex) end delete(gcp); % close the parallel pool exit To run the script on a compute node, create the file helloWorld.sub: #!/bin/bash #PBS -l walltime=05:00 #PBS -l nodes=1 #PBS -N
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1997-01-01

While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gorden Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, the transition of underlying hardware and system software have forced the scientists to spend a large effort to migrate and recede their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: MPI/PVM on network of workstations to the IBS/SP2 and CRAY/T3D.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amadio, G.; et al.

An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physicsmore » models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.« less
Observed and modelled effects of auroral precipitation on the thermal ionospheric plasma: comparing the MICA and Cascades2 sounding rocket events

NASA Astrophysics Data System (ADS)

Lynch, K. A.; Gayetsky, L.; Fernandes, P. A.; Zettergren, M. D.; Lessard, M.; Cohen, I. J.; Hampton, D. L.; Ahrns, J.; Hysell, D. L.; Powell, S.; Miceli, R. J.; Moen, J. I.; Bekkeng, T.

2012-12-01

Auroral precipitation can modify the ionospheric thermal plasma through a variety of processes. We examine and compare the events seen by two recent auroral sounding rockets carrying in situ thermal plasma instrumentation. The Cascades2 sounding rocket (March 2009, Poker Flat Research Range) traversed a pre-midnight poleward boundary intensification (PBI) event distinguished by a stationary Alfvenic curtain of field-aligned precipitation. The MICA sounding rocket (February 2012, Poker Flat Research Range) traveled through irregular precipitation following the passage of a strong westward-travelling surge. Previous modelling of the ionospheric effects of auroral precipitation used a one-dimensional model, TRANSCAR, which had a simplified treatment of electric fields and did not have the benefit of in situ thermal plasma data. This new study uses a new two-dimensional model which self-consistently calculates electric fields to explore both spatial and temporal effects, and compares to thermal plasma observations. A rigorous understanding of the ambient thermal plasma parameters and their effects on the local spacecraft sheath and charging, is required for quantitative interpretation of in situ thermal plasma observations. To complement this TRANSCAR analysis we therefore require a reliable means of interpreting in situ thermal plasma observation. This interpretation depends upon a rigorous plasma sheath model since the ambient ion energy is on the order of the spacecraft's sheath energy. A self-consistent PIC model is used to model the spacecraft sheath, and a test-particle approach then predicts the detector response for a given plasma environment. The model parameters are then modified until agreement is found with the in situ data. We find that for some situations, the thermal plasma parameters are strongly driven by the precipitation at the observation time. For other situations, the previous history of the precipitation at that position can have a stronger effect.
An Event-Driven Classifier for Spiking Neural Networks Fed with Synthetic or Dynamic Vision Sensor Data.

PubMed

Stromatias, Evangelos; Soto, Miguel; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabé

2017-01-01

This paper introduces a novel methodology for training an event-driven classifier within a Spiking Neural Network (SNN) System capable of yielding good classification results when using both synthetic input data and real data captured from Dynamic Vision Sensor (DVS) chips. The proposed supervised method uses the spiking activity provided by an arbitrary topology of prior SNN layers to build histograms and train the classifier in the frame domain using the stochastic gradient descent algorithm. In addition, this approach can cope with leaky integrate-and-fire neuron models within the SNN, a desirable feature for real-world SNN applications, where neural activation must fade away after some time in the absence of inputs. Consequently, this way of building histograms captures the dynamics of spikes immediately before the classifier. We tested our method on the MNIST data set using different synthetic encodings and real DVS sensory data sets such as N-MNIST, MNIST-DVS, and Poker-DVS using the same network topology and feature maps. We demonstrate the effectiveness of our approach by achieving the highest classification accuracy reported on the N-MNIST (97.77%) and Poker-DVS (100%) real DVS data sets to date with a spiking convolutional network. Moreover, by using the proposed method we were able to retrain the output layer of a previously reported spiking neural network and increase its performance by 2%, suggesting that the proposed classifier can be used as the output layer in works where features are extracted using unsupervised spike-based learning methods. In addition, we also analyze SNN performance figures such as total event activity and network latencies, which are relevant for eventual hardware implementations. In summary, the paper aggregates unsupervised-trained SNNs with a supervised-trained SNN classifier, combining and applying them to heterogeneous sets of benchmarks, both synthetic and from real DVS chips.
Sounding Rocket Launches Successfully from Alaska

NASA Image and Video Library

2015-01-28

Caption: Time lapse photo of the NASA Oriole IV sounding rocket with Aural Spatial Structures Probe as an aurora dances over Alaska. All four stages of the rocket are visible in this image. Credit: NASA/Jamie Adkins More info: On count day number 15, the Aural Spatial Structures Probe, or ASSP, was successfully launched on a NASA Oriole IV sounding rocket at 5:41 a.m. EST on Jan. 28, 2015, from the Poker Flat Research Range in Alaska. Preliminary data show that all aspects of the payload worked as designed and the principal investigator Charles Swenson at Utah State University described the mission as a “raging success.” “This is likely the most complicated mission the sounding rocket program has ever undertaken and it was not easy by any stretch," said John Hickman, operations manager of the NASA sounding rocket program office at the Wallops Flight Facility, Virginia. "It was technically challenging every step of the way.” “The payload deployed all six sub-payloads in formation as planned and all appeared to function as planned. Quite an amazing feat to maneuver and align the main payload, maintain the proper attitude while deploying all six 7.3-pound sub payloads at about 40 meters per second," said Hickman. Read more: www.nasa.gov/content/assp-sounding-rocket-launches-succes... NASA image use policy. NASA Goddard Space Flight Center enables NASA’s mission through four scientific endeavors: Earth Science, Heliophysics, Solar System Exploration, and Astrophysics. Goddard plays a leading role in NASA’s accomplishments by contributing compelling scientific knowledge to advance the Agency’s mission. Follow us on Twitter Like us on Facebook Find us on Instagram
An autonomous receiver/digital signal processor applied to ground-based and rocket-borne wave experiments

NASA Astrophysics Data System (ADS)

Dombrowski, M. P.; LaBelle, J.; McGaw, D. G.; Broughton, M. C.

2016-07-01

The programmable combined receiver/digital signal processor platform presented in this article is designed for digital downsampling and processing of general waveform inputs with a 66 MHz initial sampling rate and multi-input synchronized sampling. Systems based on this platform are capable of fully autonomous low-power operation, can be programmed to preprocess and filter the data for preselection and reduction, and may output to a diverse array of transmission or telemetry media. We describe three versions of this system, one for deployment on sounding rockets and two for ground-based applications. The rocket system was flown on the Correlation of High-Frequency and Auroral Roar Measurements (CHARM)-II mission launched from Poker Flat Research Range, Alaska, in 2010. It measured auroral "roar" signals at 2.60 MHz. The ground-based systems have been deployed at Sondrestrom, Greenland, and South Pole Station, Antarctica. The Greenland system synchronously samples signals from three spaced antennas providing direction finding of 0-5 MHz waves. It has successfully measured auroral signals and man-made broadcast signals. The South Pole system synchronously samples signals from two crossed antennas, providing polarization information. It has successfully measured the polarization of auroral kilometric radiation-like signals as well as auroral hiss. Further systems are in development for future rocket missions and for installation in Antarctic Automatic Geophysical Observatories.
Cache Locality Optimization for Recursive Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lifflander, Jonathan; Krishnamoorthy, Sriram

We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. Wemore » present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain- specific optimizer for stencil programs.« less
A Comparison of Three Programming Models for Adaptive Applications

NASA Technical Reports Server (NTRS)

Shan, Hong-Zhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswa, Rupak; Kwak, Dochan (Technical Monitor)

2000-01-01

We study the performance and programming effort for two major classes of adaptive applications under three leading parallel programming models. We find that all three models can achieve scalable performance on the state-of-the-art multiprocessor machines. The basic parallel algorithms needed for different programming models to deliver their best performance are similar, but the implementations differ greatly, far beyond the fact of using explicit messages versus implicit loads/stores. Compared with MPI and SHMEM, CC-SAS (cache-coherent shared address space) provides substantial ease of programming at the conceptual and program orchestration level, which often leads to the performance gain. However it may also suffer from the poor spatial locality of physically distributed shared data on large number of processors. Our CC-SAS implementation of the PARMETIS partitioner itself runs faster than in the other two programming models, and generates more balanced result for our application.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

PubMed

Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
Parallel-Processing Software for Correlating Stereo Images

NASA Technical Reports Server (NTRS)

Klimeck, Gerhard; Deen, Robert; Mcauley, Michael; DeJong, Eric

2007-01-01

A computer program implements parallel- processing algorithms for cor relating images of terrain acquired by stereoscopic pairs of digital stereo cameras on an exploratory robotic vehicle (e.g., a Mars rove r). Such correlations are used to create three-dimensional computatio nal models of the terrain for navigation. In this program, the scene viewed by the cameras is segmented into subimages. Each subimage is assigned to one of a number of central processing units (CPUs) opera ting simultaneously.
Optimization by nonhierarchical asynchronous decomposition

NASA Technical Reports Server (NTRS)

Shankar, Jayashree; Ribbens, Calvin J.; Haftka, Raphael T.; Watson, Layne T.

1992-01-01

Large scale optimization problems are tractable only if they are somehow decomposed. Hierarchical decompositions are inappropriate for some types of problems and do not parallelize well. Sobieszczanski-Sobieski has proposed a nonhierarchical decomposition strategy for nonlinear constrained optimization that is naturally parallel. Despite some successes on engineering problems, the algorithm as originally proposed fails on simple two dimensional quadratic programs. The algorithm is carefully analyzed for quadratic programs, and a number of modifications are suggested to improve its robustness.

Final Report: Correctness Tools for Petascale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mellor-Crummey, John

2014-10-27

In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, University of Maryland, the University of Wisconsin—Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoringmore » of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.« less
ng: What next-generation languages can teach us about HENP frameworks in the manycore era

NASA Astrophysics Data System (ADS)

Binet, Sébastien

2011-12-01

Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment, however, this no longer fits into the processing model on the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, data flow programming, ... to name a few. We present the work of the prototyping of a next-generation framework in new and expressive languages (python and Go) to investigate how code clarity and robustness are affected and what are the downsides of using languages younger than FORTRAN/C/C++.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
A Programming Framework for Scientific Applications on CPU-GPU Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Owens, John

2013-03-24

At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiplemore » parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.« less
Transputer parallel processing at NASA Lewis Research Center

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1989-01-01

The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
[Internet gambling: what are the risks?].

PubMed

Bonnaire, C

2012-02-01

Actually, there are many different and varied new ways to take part in gambling activities such as gambling via the Internet, mobile phone and interactive television. Among these media, the rise in Internet gambling activity has been very rapid. Nevertheless, few empirical studies have been carried out on the psychosocial effects of Internet gambling. While there is no conclusive evidence that Internet gambling is more likely than other gambling media to cause problem gambling, there are a number of factors that make online activities like Internet gambling potentially seductive and/or addictive. Such factors include anonymity, convenience, escape, dissociation/immersion, accessibility, event frequency, interactivity, disinhibition, simulation, and asociability. It would also appear that virtual environments have the potential to provide short-term comfort, excitement and/or distraction. The introduction of the Internet to gambling activities changes some of the fundamental situational and structural characteristics. The major change is that gambling activities are bought into the home and workplace environment. Thus, Internet gambling can become an in-house or work activity. One of the major concerns relating to those changes and the increase in gambling opportunities is the potential rise in the number of problem and pathological gamblers. Addictions always result from an interaction and interplay between many factors but in the case of gambling, it could be argued that technology and technological advance can themselves be an important contributory factor as we saw in examining the salient factors in Internet gambling. It is difficult to determine the prevalence of online (problem or not) gamblers, as it is obviously a figure that changes and has changed relatively quickly over the past decade. Nevertheless, the rate of Internet gambling is increasing and some recent studies using self-selected samples suggest, for example, that the prevalence of problem gambling among student Internet gamblers is relatively high for students who gamble on the Internet in general. Some recent studies have focused on the type of online games. For example, one specific form of online gambling online poker, is one of the fastest growing forms of online gambling. It appears that problem online poker players are more likely to swap genders when playing online, and play more frequently for longer periods of time. Thus, problem gamblers may be losing time but winning money. This result has a big implication for problem gambling criteria. Indeed, some data suggest that online poker may be producing a new type of problem gambler where the main negative consequence is loss of time (rather than loss of money). All these findings underline the need for better Internet gambling legislation. Indeed, the potential for excessive gambling and the lack of safeguards for vulnerable populations (e.g. adolescents and problem gamblers) raise the need for developing social responsibility tools. Harm-minimisation strategies are fundamental to facilitate gambling in a responsible manner, that is, to promote gambling within a player's means so they do not spent excessive time or money gambling, which cause the individual problems. Some research, but still few, examines the efficacy of responsible gambling strategies like pop-up messages. Copyright Â© 2011 L’Encéphale, Paris. Published by Elsevier Masson SAS. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chrisochoides, N.; Sukup, F.

In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputes using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency in the processor level.more » Our preliminary performance data indicate that the task- parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the {open_quotes}best{close_quotes} sequential implementation of the BW algorithm.« less
Object-Oriented Implementation of the NAS Parallel Benchmarks using Charm++

NASA Technical Reports Server (NTRS)

Krishnan, Sanjeev; Bhandarkar, Milind; Kale, Laxmikant V.

1996-01-01

This report describes experiences with implementing the NAS Computational Fluid Dynamics benchmarks using a parallel object-oriented language, Charm++. Our main objective in implementing the NAS CFD kernel benchmarks was to develop a code that could be used to easily experiment with different domain decomposition strategies and dynamic load balancing. We also wished to leverage the object-orientation provided by the Charm++ parallel object-oriented language, to develop reusable abstractions that would simplify the process of developing parallel applications. We first describe the Charm++ parallel programming model and the parallel object array abstraction, then go into detail about each of the Scalar Pentadiagonal (SP) and Lower/Upper Triangular (LU) benchmarks, along with performance results. Finally we conclude with an evaluation of the methodology used.
Analysis of the study skills of undergraduate pharmacy students of the University of Zambia School of Medicine.

PubMed

Ezeala, Christian Chinyere; Siyanga, Nalucha

2015-01-01

It aimed to compare the study skills of two groups of undergraduate pharmacy students in the School of Medicine, University of Zambia using the Study Skills Assessment Questionnaire (SSAQ), with the goal of analysing students' study skills and identifying factors that affect study skills. A questionnaire was distributed to 67 participants from both programs using stratified random sampling. Completed questionnaires were rated according to participants study skill. The total scores and scores within subscales were analysed and compared quantitatively. Questionnaires were distributed to 37 students in the regular program, and to 30 students in the parallel program. The response rate was 100%. Students had moderate to good study skills: 22 respondents (32.8%) showed good study skills, while 45 respondents (67.2%) were found to have moderate study skills. Students in the parallel program demonstrated significantly better study skills (mean SSAQ score, 185.4±14.5), particularly in time management and writing, than the students in the regular program (mean SSAQ score 175±25.4; P<0.05). No significant differences were found according to age, gender, residential or marital status, or level of study. The students in the parallel program had better time management and writing skills, probably due to their prior work experience. The more intensive training to students in regular program is needed in improving time management and writing skills.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification-all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
High-energy physics software parallelization using database techniques

NASA Astrophysics Data System (ADS)

Argante, E.; van der Stok, P. D. V.; Willers, I.

1997-02-01

A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradimg, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI.
Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

NASA Technical Reports Server (NTRS)

West, Jeff; Yang, H. Q.

2014-01-01

There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: Propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problems sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that vehicle determinantal effects are not realized.
mm_par2.0: An object-oriented molecular dynamics simulation program parallelized using a hierarchical scheme with MPI and OPENMP

NASA Astrophysics Data System (ADS)

Oh, Kwang Jin; Kang, Ji Hoon; Myung, Hun Joo

2012-02-01

We have revised a general purpose parallel molecular dynamics simulation program mm_par using the object-oriented programming. We parallelized the revised version using a hierarchical scheme in order to utilize more processors for a given system size. The benchmark result will be presented here. New version program summaryProgram title: mm_par2.0 Catalogue identifier: ADXP_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADXP_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 2 390 858 No. of bytes in distributed program, including test data, etc.: 25 068 310 Distribution format: tar.gz Programming language: C++ Computer: Any system operated by Linux or Unix Operating system: Linux Classification: 7.7 External routines: We provide wrappers for FFTW [1], Intel MKL library [2] FFT routine, and Numerical recipes [3] FFT, random number generator, and eigenvalue solver routines, SPRNG [4] random number generator, Mersenne Twister [5] random number generator, space filling curve routine. Catalogue identifier of previous version: ADXP_v1_0 Journal reference of previous version: Comput. Phys. Comm. 174 (2006) 560 Does the new version supersede the previous version?: Yes Nature of problem: Structural, thermodynamic, and dynamical properties of fluids and solids from microscopic scales to mesoscopic scales. Solution method: Molecular dynamics simulation in NVE, NVT, and NPT ensemble, Langevin dynamics simulation, dissipative particle dynamics simulation. Reasons for new version: First, object-oriented programming has been used, which is known to be open for extension and closed for modification. It is also known to be better for maintenance. Second, version 1.0 was based on atom decomposition and domain decomposition scheme [6] for parallelization. However, atom decomposition is not popular due to its poor scalability. On the other hand, domain decomposition scheme is better for scalability. It still has a limitation in utilizing a large number of cores on recent petascale computers due to the requirement that the domain size is larger than the potential cutoff distance. To go beyond such a limitation, a hierarchical parallelization scheme has been adopted in this new version and implemented using MPI [7] and OPENMP [8]. Summary of revisions: (1) Object-oriented programming has been used. (2) A hierarchical parallelization scheme has been adopted. (3) SPME routine has been fully parallelized with parallel 3D FFT using volumetric decomposition scheme [9]. K.J.O. thanks Mr. Seung Min Lee for useful discussion on programming and debugging. Running time: Running time depends on system size and methods used. For test system containing a protein (PDB id: 5DHFR) with CHARMM22 force field [10] and 7023 TIP3P [11] waters in simulation box having dimension 62.23 Å×62.23 Å×62.23 Å, the benchmark results are given in Fig. 1. Here the potential cutoff distance was set to 12 Å and the switching function was applied from 10 Å for the force calculation in real space. For the SPME [12] calculation, K, K, and K were set to 64 and the interpolation order was set to 4. To do the fast Fourier transform, we used Intel MKL library. All bonds including hydrogen atoms were constrained using SHAKE/RATTLE algorithms [13,14]. The code was compiled using Intel compiler version 11.1 and mvapich2 version 1.5. Fig. 2 shows performance gains from using CUDA-enabled version [15] of mm_par for 5DHFR simulation in water on Intel Core2Quad 2.83 GHz and GeForce GTX 580. Even though mm_par2.0 is not ported yet for GPU, its performance data would be useful to expect mm_par2.0 performance on GPU. Timing results for 1000 MD steps. 1, 2, 4, and 8 in the figure mean the number of OPENMP threads. Timing results for 1000 MD steps from double precision simulation on CPU, single precision simulation on GPU, and double precision simulation on GPU.
Analysis and selection of optimal function implementations in massively parallel computer

DOEpatents

Archer, Charles Jens [Rochester, MN; Peters, Amanda [Rochester, MN; Ratterman, Joseph D [Rochester, MN

2011-05-31

An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
The Portals 4.0 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2012-11-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities.« less
Automating FEA programming

NASA Technical Reports Server (NTRS)

Sharma, Naveen

1992-01-01

In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
Development for SSV on a parallel processing system (PARAGON)

NASA Astrophysics Data System (ADS)

Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan

1995-12-01

A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON-a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.
GASPRNG: GPU accelerated scalable parallel random number generator library

NASA Astrophysics Data System (ADS)

Gao, Shuang; Peterson, Gregory D.

2013-04-01

Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to be able to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications. Catalogue identifier: AEOI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900 No. of bytes in distributed program, including test data, etc.: 1422058 Distribution format: tar.gz Programming language: C and CUDA. Computer: Any PC or workstation with NVIDIA GPU (Tested on Fermi GTX480, Tesla C1060, Tesla M2070). Operating system: Linux with CUDA version 4.0 or later. Should also run on MacOS, Windows, or UNIX. Has the code been vectorized or parallelized?: Yes. Parallelized using MPI directives. RAM: 512 MB˜ 732 MB (main memory on host CPU, depending on the data type of random numbers.) / 512 MB (GPU global memory) Classification: 4.13, 6.5. Nature of problem: Many computational science applications are able to consume large numbers of random numbers. For example, Monte Carlo simulations are able to consume limitless random numbers for the computation as long as resources for the computing are supported. Moreover, parallel computational science applications require independent streams of random numbers to attain statistically significant results. The SPRNG library provides this capability, but at a significant computational cost. The GASPRNG library presented here accelerates the generators of independent streams of random numbers using graphical processing units (GPUs). Solution method: Multiple copies of random number generators in GPUs allow a computational science application to consume large numbers of random numbers from independent, parallel streams. GASPRNG is a random number generators library to allow a computational science application to employ multiple copies of random number generators to boost performance. Users can interface GASPRNG with software code executing on microprocessors and/or GPUs. Running time: The tests provided take a few minutes to run.
Fenix, A Fault Tolerant Programming Framework for MPI Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gamel, Marc; Teranihi, Keita; Valenzuela, Eric

2016-10-05

Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

GPU COMPUTING FOR PARTICLE TRACKING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

2011-03-25

This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth

NASA Astrophysics Data System (ADS)

Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.

2014-03-01

An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in 50% reduction of memory and computational time requirements. The presented solidification model shows a very good scalability up to centimeter size domains, including more than ten million of dendrites. Catalogue identifier: AEQZ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, UK Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 29,767 No. of bytes in distributed program, including test data, etc.: 3131,367 Distribution format: tar.gz Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. Number of processors used: 1-50,000 RAM: Memory requirements depend on the grid size Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/) Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant. Unusual features: Novel technique, utilizing periodic duplication of a pre-grown “incubation” domain, is applied for the scaleup test. Running time: Running time varies from minutes to days depending on the domain size and number of computational cores.
A numerical differentiation library exploiting parallel architectures

NASA Astrophysics Data System (ADS)

Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

2009-08-01

We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

NASA Astrophysics Data System (ADS)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and Diamantaras, K.: 'Programming and architecture of parallel processing systems', 1st Edition, Eds. Kleidarithmos, 2011 [4] NVIDIA.: 'NVidia CUDA C Programming Guide', version 5.0, NVidia (reference book) [5] Konstantaras, A.: 'Classification of Distinct Seismic Regions and Regional Temporal Modelling of Seismicity in the Vicinity of the Hellenic Seismic Arc', IEEE Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6 (4), pp. 1857-1863, 2013 [6] Konstantaras, A. Varley, M.R.,. Valianatos, F., Collins, G. and Holifield, P.: 'Recognition of electric earthquake precursors using neuro-fuzzy models: methodology and simulation results', Proc. IASTED International Conference on Signal Processing Pattern Recognition and Applications (SPPRA 2002), Crete, Greece, 2002, pp 303-308, 2002 [7] Konstantaras, A., Katsifarakis, E., Maravelakis, E., Skounakis, E., Kokkinos, E. and Karapidakis, E.: 'Intelligent Spatial-Clustering of Seismicity in the Vicinity of the Hellenic Seismic Arc', Earth Science Research, vol. 1 (2), pp. 1-10, 2012 [8] Georgoulas, G., Konstantaras, A., Katsifarakis, E., Stylios, C.D., Maravelakis, E. and Vachtsevanos, G.: '"Seismic-Mass" Density-based Algorithm for Spatio-Temporal Clustering', Expert Systems with Applications, vol. 40 (10), pp. 4183-4189, 2013 [9] Konstantaras, A. J.: 'Expert knowledge-based algorithm for the dynamic discrimination of interactive natural clusters', Earth Science Informatics, 2015 (In Press, see: www.scopus.com) [10] Drakatos, G. and Latoussakis, J.: 'A catalog of aftershock sequences in Greece (1971-1997): Their spatial and temporal characteristics', Journal of Seismology, vol. 5, pp. 137-145, 2001
Array distribution in data-parallel programs

NASA Technical Reports Server (NTRS)

Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

1994-01-01

We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.
MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2014-05-01

Model Integration System (MIST) is open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g.: the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions - enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
Parallel design patterns for a low-power, software-defined compressed video encoder

NASA Astrophysics Data System (ADS)

Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

2011-06-01

Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
A Model for Speedup of Parallel Programs

DTIC Science & Technology

1997-01-01

Sanjeev. K Setia . The interaction between mem- ory allocation and adaptive partitioning in message- passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A compar- ative analysis of static
ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers

PubMed Central

Besnier, Francois; Glover, Kevin A.

2013-01-01

This software package provides an R-based framework to make use of multi-core computers when running analyses in the population genetics program STRUCTURE. It is especially addressed to those users of STRUCTURE dealing with numerous and repeated data analyses, and who could take advantage of an efficient script to automatically distribute STRUCTURE jobs among multiple processors. It also consists of additional functions to divide analyses among combinations of populations within a single data set without the need to manually produce multiple projects, as it is currently the case in STRUCTURE. The package consists of two main functions: MPI_structure() and parallel_structure() as well as an example data file. We compared the performance in computing time for this example data on two computer architectures and showed that the use of the present functions can result in several-fold improvements in terms of computation time. ParallelStructure is freely available at https://r-forge.r-project.org/projects/parallstructure/. PMID:23923012
Performance of a parallel code for the Euler equations on hypercube computers

NASA Technical Reports Server (NTRS)

Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

1990-01-01

The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
Automating the parallel processing of fluid and structural dynamics calculations

NASA Technical Reports Server (NTRS)

Arpasi, Dale J.; Cole, Gary L.

1987-01-01

The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

PubMed

Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

2014-09-09

A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
Parallel processing for scientific computations

NASA Technical Reports Server (NTRS)

Alkhatib, Hasan S.

1991-01-01

The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing extensive literature search, we introduced the system which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications on a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.
Parallel Signal Processing and System Simulation using aCe

NASA Technical Reports Server (NTRS)

Dorband, John E.; Aburdene, Maurice F.

2003-01-01

Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
Parallel computation using boundary elements in solid mechanics

NASA Technical Reports Server (NTRS)

Chien, L. S.; Sun, C. T.

1990-01-01

The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. The linear speedups and high efficiencies are shown for solving a demonstrated problem on Sequent Symmetry S81 parallel computing system.
Managing Algorithmic Skeleton Nesting Requirements in Realistic Image Processing Applications: The Case of the SKiPPER-II Parallel Programming Environment's Operating Model

NASA Astrophysics Data System (ADS)

Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel

2005-12-01

SKiPPER is a SKeleton-based Parallel Programming EnviRonment being developed since 1996 and running at LASMEA Laboratory, the Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Throughout the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we are presenting the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

PubMed Central

Wang, Min; Tian, Yun

2018-01-01

The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Owen, Jeffrey E.

1988-01-01

A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

HeNCE: A Heterogeneous Network Computing Environment

DOE PAGES

Beguelin, Adam; Dongarra, Jack J.; Geist, George Al; ...

1994-01-01

Network computing seeks to utilize the aggregate resources of many networked computers to solve a single problem. In so doing it is often possible to obtain supercomputer performance from an inexpensive local area network. The drawback is that network computing is complicated and error prone when done by hand, especially if the computers have different operating systems and data formats and are thus heterogeneous. The heterogeneous network computing environment (HeNCE) is an integrated graphical environment for creating and running parallel programs over a heterogeneous collection of computers. It is built on a lower level package called parallel virtual machine (PVM).more » The HeNCE philosophy of parallel programming is to have the programmer graphically specify the parallelism of a computation and to automate, as much as possible, the tasks of writing, compiling, executing, debugging, and tracing the network computation. Key to HeNCE is a graphical language based on directed graphs that describe the parallelism and data dependencies of an application. Nodes in the graphs represent conventional Fortran or C subroutines and the arcs represent data and control flow. This article describes the present state of HeNCE, its capabilities, limitations, and areas of future research.« less
The AIS-5000 parallel processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmitt, L.A.; Wilson, S.S.

1988-05-01

The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics than two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In thismore » paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance bench marks are given for certain simple and complex functions.« less
NavP: Structured and Multithreaded Distributed Parallel Programming

NASA Technical Reports Server (NTRS)

Pan, Lei

2007-01-01

We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
Parallel computing for probabilistic fatigue analysis

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Distributed computing feasibility in a non-dedicated homogeneous distributed system

NASA Technical Reports Server (NTRS)

Leutenegger, Scott T.; Sun, Xian-He

1993-01-01

The low cost and availability of clusters of workstations have lead researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment assuming workstation processes have preemptive priority over parallel tasks is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It was proposed that task ratio is a useful metric for determining how large the demand of a parallel applications must be in order to make efficient use of a non-dedicated distributed system.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Nemesis I: Parallel Enhancements to ExodusII

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hennigan, Gary L.; John, Matthew S.; Shadid, John N.

2006-03-28

NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.
High Performance Fortran for Aerospace Applications

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Zima, Hans; Bushnell, Dennis M. (Technical Monitor)

2000-01-01

This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications. HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications, while delegating to the compiler/runtime system the task of generating explicitly parallel message-passing programs. We begin by providing a short overview of the HPF language. This is followed by a detailed discussion of the efficient use of HPF for applications involving multiple structured grids such as multiblock and adaptive mesh refinement (AMR) codes as well as unstructured grid codes. We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.
The portals 4.0.1 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2013-04-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities. 3« less
Merlin - Massively parallel heterogeneous computing

NASA Technical Reports Server (NTRS)

Wittie, Larry; Maples, Creve

1989-01-01

Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Exploiting parallel computing with limited program changes using a network of microcomputers

NASA Technical Reports Server (NTRS)

Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

1985-01-01

Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronus processing was studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.
Telemetry downlink interfaces and level-zero processing

NASA Technical Reports Server (NTRS)

Horan, S.; Pfeiffer, J.; Taylor, J.

1991-01-01

The technical areas being investigated are as follows: (1) processing of space to ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, the University administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed.
Electronics Book II.

ERIC Educational Resources Information Center

Johnson, Dennis; And Others

This manual, the second of three curriculum guides for an electronics course, is intended for use in a program combining vocational English as a second language (VESL) with bilingual vocational education. Ten units cover the electrical team, Ohm's law, Watt's law, series resistive circuits, parallel resistive circuits, series parallel circuits,…
Parallel Education and Defining the Fourth Sector.

ERIC Educational Resources Information Center

Chessell, Diana

1996-01-01

Parallel to the primary, secondary, postsecondary, and adult/community education sectors is education not associated with formal programs--learning in arts and cultural sites. The emergence of cultural and educational tourism is an opportunity for adult/community education to define itself by extending lifelong learning opportunities into parallel…
DOE Office of Scientific and Technical Information (OSTI.GOV)

Strout, Michelle

Programming parallel machines is fraught with difficulties: the obfuscation of algorithms due to implementation details such as communication and synchronization, the need for transparency between language constructs and performance, the difficulty of performing program analysis to enable automatic parallelization techniques, and the existence of important "dusty deck" codes. The SAIMI project developed abstractions that enable the orthogonal specification of algorithms and implementation details within the context of existing DOE applications. The main idea is to enable the injection of small programming models such as expressions involving transcendental functions, polyhedral iteration spaces with sparse constraints, and task graphs into full programsmore » through the use of pragmas. These smaller, more restricted programming models enable orthogonal specification of many implementation details such as how to map the computation on to parallel processors, how to schedule the computation, and how to allocation storage for the computation. At the same time, these small programming models enable the expression of the most computationally intense and communication heavy portions in many scientific simulations. The ability to orthogonally manipulate the implementation for such computations will significantly ease performance programming efforts and expose transformation possibilities and parameter to automated approaches such as autotuning. At Colorado State University, the SAIMI project was supported through DOE grant DE-SC3956 from April 2010 through August 2015. The SAIMI project has contributed a number of important results to programming abstractions that enable the orthogonal specification of implementation details in scientific codes. This final report summarizes the research that was funded by the SAIMI project.« less
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Performance Metrics for Monitoring Parallel Program Executions

NASA Technical Reports Server (NTRS)

Sarukkai, Sekkar R.; Gotwais, Jacob K.; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)

1994-01-01

Existing tools for debugging performance of parallel programs either provide graphical representations of program execution or profiles of program executions. However, for performance debugging tools to be useful, such information has to be augmented with information that highlights the cause of poor program performance. Identifying the cause of poor performance necessitates the need for not only determining the significance of various performance problems on the execution time of the program, but also needs to consider the effect of interprocessor communications of individual source level data structures. In this paper, we present a suite of normalized indices which provide a convenient mechanism for focusing on a region of code with poor performance and highlights the cause of the problem in terms of processors, procedures and data structure interactions. All the indices are generated from trace files augmented with data structure information.. Further, we show with the help of examples from the NAS benchmark suite that the indices help in detecting potential cause of poor performance, based on augmented execution traces obtained by monitoring the program.
Concurrent simulation of a parallel jaw end effector

NASA Technical Reports Server (NTRS)

Bynum, Bill

1985-01-01

A system of programs developed to aid in the design and development of the command/response protocol between a parallel jaw end effector and the strategic planner program controlling it are presented. The system executes concurrently with the LISP controlling program to generate a graphical image of the end effector that moves in approximately real time in response to commands sent from the controlling program. Concurrent execution of the simulation program is useful for revealing flaws in the communication command structure arising from the asynchronous nature of the message traffic between the end effector and the strategic planner. Software simulation helps to minimize the number of hardware changes necessary to the microprocessor driving the end effector because of changes in the communication protocol. The simulation of other actuator devices can be easily incorporated into the system of programs by using the underlying support that was developed for the concurrent execution of the simulation process and the communication between it and the controlling program.
Parallel program debugging with flowback analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Choi, Jongdeok.

1989-01-01

This thesis describes the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors. The goal of the debugging system is to present to the programmer a graphical view of the dynamic program dependences while keeping the execution-time overhead low. The author first describes the use of flowback analysis to provide information on causal relationship between events in a programs' execution without re-executing the program for debugging. Execution time overhead is kept low by recording only a small amount of trace during a program's execution. He uses semantic analysis and a technique called incrementalmore » tracing to keep the time and space overhead low. As part of the semantic analysis, he uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic information produced during execution time. The cornerstone of the incremental tracing concept is to generate a coarse trace during execution and fill incrementally, during the interactive portion of the debugging session, the gap between the information gathered in the coarse trace and the information needed to do the flowback analysis using the coarse trace. Then, he describes how to extend the flowback analysis to parallel programs. The flowback analysis can span process boundaries; i.e., the most recent modification to a shared variable might be traced to a different process than the one that contains the current reference. The static and dynamic program dependence graphs of the individual processes are tied together with synchronization and data dependence information to form complete graphs that represent the entire program.« less

Some links on this page may take you to non-federal websites. Their policies may differ from this site.