Standard services for the capture, processing, and distribution of packetized telemetry data
NASA Technical Reports Server (NTRS)
Stallings, William H.
1989-01-01
Standard functional services for the capture, processing, and distribution of packetized data are discussed with particular reference to the future implementation of packet processing systems, such as those for the Space Station Freedom. The major functions are listed under the following major categories: input processing, packet processing, and output processing. A functional block diagram of a packet data processing facility is presented, showing the distribution of the various processing functions as well as the primary data flow through the facility.
A Technical Survey on Optimization of Processing Geo Distributed Data
NASA Astrophysics Data System (ADS)
Naga Malleswari, T. Y. J.; Ushasukhanya, S.; Nithyakalyani, A.; Girija, S.
2018-04-01
With growing cloud services and technology, there is growth in some geographically distributed data centers to store large amounts of data. Analysis of geo-distributed data is required in various services for data processing, storage of essential information, etc., processing this geo-distributed data and performing analytics on this data is a challenging task. The distributed data processing is accompanied by issues in storage, computation and communication. The key issues to be dealt with are time efficiency, cost minimization, utility maximization. This paper describes various optimization methods like end-to-end multiphase, G-MR, etc., using the techniques like Map-Reduce, CDS (Community Detection based Scheduling), ROUT, Workload-Aware Scheduling, SAGE, AMP (Ant Colony Optimization) to handle these issues. In this paper various optimization methods and techniques used are analyzed. It has been observed that end-to end multiphase achieves time efficiency; Cost minimization concentrates to achieve Quality of Service, Computation and reduction of Communication cost. SAGE achieves performance improvisation in processing geo-distributed data sets.
Raster Data Partitioning for Supporting Distributed GIS Processing
NASA Astrophysics Data System (ADS)
Nguyen Thai, B.; Olasz, A.
2015-08-01
In the geospatial sector big data concept also has already impact. Several studies facing originally computer science techniques applied in GIS processing of huge amount of geospatial data. In other research studies geospatial data is considered as it were always been big data (Lee and Kang, 2015). Nevertheless, we can prove data acquisition methods have been improved substantially not only the amount, but the resolution of raw data in spectral, spatial and temporal aspects as well. A significant portion of big data is geospatial data, and the size of such data is growing rapidly at least by 20% every year (Dasgupta, 2013). The produced increasing volume of raw data, in different format, representation and purpose the wealth of information derived from this data sets represents only valuable results. However, the computing capability and processing speed rather tackle with limitations, even if semi-automatic or automatic procedures are aimed on complex geospatial data (Kristóf et al., 2014). In late times, distributed computing has reached many interdisciplinary areas of computer science inclusive of remote sensing and geographic information processing approaches. Cloud computing even more requires appropriate processing algorithms to be distributed and handle geospatial big data. Map-Reduce programming model and distributed file systems have proven their capabilities to process non GIS big data. But sometimes it's inconvenient or inefficient to rewrite existing algorithms to Map-Reduce programming model, also GIS data can not be partitioned as text-based data by line or by bytes. Hence, we would like to find an alternative solution for data partitioning, data distribution and execution of existing algorithms without rewriting or with only minor modifications. This paper focuses on technical overview of currently available distributed computing environments, as well as GIS data (raster data) partitioning, distribution and distributed processing of GIS algorithms. A proof of concept implementation have been made for raster data partitioning, distribution and processing. The first results on performance have been compared against commercial software ERDAS IMAGINE 2011 and 2014. Partitioning methods heavily depend on application areas, therefore we may consider data partitioning as a preprocessing step before applying processing services on data. As a proof of concept we have implemented a simple tile-based partitioning method splitting an image into smaller grids (NxM tiles) and comparing the processing time to existing methods by NDVI calculation. The concept is demonstrated using own development open source processing framework.
A dynamic re-partitioning strategy based on the distribution of key in Spark
NASA Astrophysics Data System (ADS)
Zhang, Tianyu; Lian, Xin
2018-05-01
Spark is a memory-based distributed data processing framework, has the ability of processing massive data and becomes a focus in Big Data. But the performance of Spark Shuffle depends on the distribution of data. The naive Hash partition function of Spark can not guarantee load balancing when data is skewed. The time of job is affected by the node which has more data to process. In order to handle this problem, dynamic sampling is used. In the process of task execution, histogram is used to count the key frequency distribution of each node, and then generate the global key frequency distribution. After analyzing the distribution of key, load balance of data partition is achieved. Results show that the Dynamic Re-Partitioning function is better than the default Hash partition, Fine Partition and the Balanced-Schedule strategy, it can reduce the execution time of the task and improve the efficiency of the whole cluster.
Count distribution for mixture of two exponentials as renewal process duration with applications
NASA Astrophysics Data System (ADS)
Low, Yeh Ching; Ong, Seng Huat
2016-06-01
A count distribution is presented by considering a renewal process where the distribution of the duration is a finite mixture of exponential distributions. This distribution is able to model over dispersion, a feature often found in observed count data. The computation of the probabilities and renewal function (expected number of renewals) are examined. Parameter estimation by the method of maximum likelihood is considered with applications of the count distribution to real frequency count data exhibiting over dispersion. It is shown that the mixture of exponentials count distribution fits over dispersed data better than the Poisson process and serves as an alternative to the gamma count distribution.
NASA Astrophysics Data System (ADS)
Olasz, A.; Nguyen Thai, B.; Kristóf, D.
2016-06-01
Within recent years, several new approaches and solutions for Big Data processing have been developed. The Geospatial world is still facing the lack of well-established distributed processing solutions tailored to the amount and heterogeneity of geodata, especially when fast data processing is a must. The goal of such systems is to improve processing time by distributing data transparently across processing (and/or storage) nodes. These types of methodology are based on the concept of divide and conquer. Nevertheless, in the context of geospatial processing, most of the distributed computing frameworks have important limitations regarding both data distribution and data partitioning methods. Moreover, flexibility and expendability for handling various data types (often in binary formats) are also strongly required. This paper presents a concept for tiling, stitching and processing of big geospatial data. The system is based on the IQLib concept (https://github.com/posseidon/IQLib/) developed in the frame of the IQmulus EU FP7 research and development project (http://www.iqmulus.eu). The data distribution framework has no limitations on programming language environment and can execute scripts (and workflows) written in different development frameworks (e.g. Python, R or C#). It is capable of processing raster, vector and point cloud data. The above-mentioned prototype is presented through a case study dealing with country-wide processing of raster imagery. Further investigations on algorithmic and implementation details are in focus for the near future.
Historical Time-Domain: Data Archives, Processing, and Distribution
NASA Astrophysics Data System (ADS)
Grindlay, Jonathan E.; Griffin, R. Elizabeth
2012-04-01
The workshop on Historical Time-Domain Astronomy (TDA) was attended by a near-capacity gathering of ~30 people. From information provided in turn by those present, an up-to-date overview was created of available plate archives, progress in their digitization, the extent of actual processing of those data, and plans for data distribution. Several recommendations were made for prioritising the processing and distribution of historical TDA data.
Data Transparency | Distributed Generation Interconnection Collaborative |
quality and availability are increasingly vital for reducing the costs of distributed generation completion in certain areas, increasing accountability for utility application processing. As distributed PV NREL, HECO, TSRG Improving Data Transparency for the Distributed PV Interconnection Process: Emergent
The application of artificial intelligence techniques to large distributed networks
NASA Technical Reports Server (NTRS)
Dubyah, R.; Smith, T. R.; Star, J. L.
1985-01-01
Data accessibility and transfer of information, including the land resources information system pilot, are structured as large computer information networks. These pilot efforts include the reduction of the difficulty to find and use data, reducing processing costs, and minimize incompatibility between data sources. Artificial Intelligence (AI) techniques were suggested to achieve these goals. The applicability of certain AI techniques are explored in the context of distributed problem solving systems and the pilot land data system (PLDS). The topics discussed include: PLDS and its data processing requirements, expert systems and PLDS, distributed problem solving systems, AI problem solving paradigms, query processing, and distributed data bases.
Shope, William G.; ,
1987-01-01
The US Geological Survey is utilizing a national network of more than 1000 satellite data-collection stations, four satellite-relay direct-readout ground stations, and more than 50 computers linked together in a private telecommunications network to acquire, process, and distribute hydrological data in near real-time. The four Survey offices operating a satellite direct-readout ground station provide near real-time hydrological data to computers located in other Survey offices through the Survey's Distributed Information System. The computerized distribution system permits automated data processing and distribution to be carried out in a timely manner under the control and operation of the Survey office responsible for the data-collection stations and for the dissemination of hydrological information to the water-data users.
Kalvelage, T.; Willems, Jennifer
2003-01-01
The design of the EOS Data and Information Systems (EOSDIS) to acquire, archive, manage and distribute Earth observation data to the broadest possible user community was discussed. A number of several integrated retrieval, processing and distribution capabilities have been explained. The value of these functions to the users were described and potential future improvements were laid out for the users. The users were interested in acquiring the retrieval, processing and archiving systems integrated so that they can get the data they want in the format and delivery mechanism of their choice.
Hierarchical species distribution models
Hefley, Trevor J.; Hooten, Mevin B.
2016-01-01
Determining the distribution pattern of a species is important to increase scientific knowledge, inform management decisions, and conserve biodiversity. To infer spatial and temporal patterns, species distribution models have been developed for use with many sampling designs and types of data. Recently, it has been shown that count, presence-absence, and presence-only data can be conceptualized as arising from a point process distribution. Therefore, it is important to understand properties of the point process distribution. We examine how the hierarchical species distribution modeling framework has been used to incorporate a wide array of regression and theory-based components while accounting for the data collection process and making use of auxiliary information. The hierarchical modeling framework allows us to demonstrate how several commonly used species distribution models can be derived from the point process distribution, highlight areas of potential overlap between different models, and suggest areas where further research is needed.
The Land Processes Distributed Active Archive Center (LP DAAC)
Golon, Danielle K.
2016-10-03
The Land Processes Distributed Active Archive Center (LP DAAC) operates as a partnership with the U.S. Geological Survey and is 1 of 12 DAACs within the National Aeronautics and Space Administration (NASA) Earth Observing System Data and Information System (EOSDIS). The LP DAAC ingests, archives, processes, and distributes NASA Earth science remote sensing data. These data are provided to the public at no charge. Data distributed by the LP DAAC provide information about Earth’s surface from daily to yearly intervals and at 15 to 5,600 meter spatial resolution. Data provided by the LP DAAC can be used to study changes in agriculture, vegetation, ecosystems, elevation, and much more. The LP DAAC provides several ways to access, process, and interact with these data. In addition, the LP DAAC is actively archiving new datasets to provide users with a variety of data to study the Earth.
A distributed data base management system. [for Deep Space Network
NASA Technical Reports Server (NTRS)
Bryan, A. I.
1975-01-01
Major system design features of a distributed data management system for the NASA Deep Space Network (DSN) designed for continuous two-way deep space communications are described. The reasons for which the distributed data base utilizing third-generation minicomputers is selected as the optimum approach for the DSN are threefold: (1) with a distributed master data base, valid data is available in real-time to support DSN management activities at each location; (2) data base integrity is the responsibility of local management; and (3) the data acquisition/distribution and processing power of a third-generation computer enables the computer to function successfully as a data handler or as an on-line process controller. The concept of the distributed data base is discussed along with the software, data base integrity, and hardware used. The data analysis/update constraint is examined.
NASA Astrophysics Data System (ADS)
Samsinar, Riza; Suseno, Jatmiko Endro; Widodo, Catur Edi
2018-02-01
The distribution network is the closest power grid to the customer Electric service providers such as PT. PLN. The dispatching center of power grid companies is also the data center of the power grid where gathers great amount of operating information. The valuable information contained in these data means a lot for power grid operating management. The technique of data warehousing online analytical processing has been used to manage and analysis the great capacity of data. Specific methods for online analytics information systems resulting from data warehouse processing with OLAP are chart and query reporting. The information in the form of chart reporting consists of the load distribution chart based on the repetition of time, distribution chart on the area, the substation region chart and the electric load usage chart. The results of the OLAP process show the development of electric load distribution, as well as the analysis of information on the load of electric power consumption and become an alternative in presenting information related to peak load.
Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes.
Hougaard, P; Lee, M L; Whitmore, G A
1997-12-01
Count data often show overdispersion compared to the Poisson distribution. Overdispersion is typically modeled by a random effect for the mean, based on the gamma distribution, leading to the negative binomial distribution for the count. This paper considers a larger family of mixture distributions, including the inverse Gaussian mixture distribution. It is demonstrated that it gives a significantly better fit for a data set on the frequency of epileptic seizures. The same approach can be used to generate counting processes from Poisson processes, where the rate or the time is random. A random rate corresponds to variation between patients, whereas a random time corresponds to variation within patients.
NASA Technical Reports Server (NTRS)
Mah, G. R.; Myers, J.
1993-01-01
The U.S. Government has initiated the Global Change Research program, a systematic study of the Earth as a complete system. NASA's contribution of the Global Change Research Program is the Earth Observing System (EOS), a series of orbital sensor platforms and an associated data processing and distribution system. The EOS Data and Information System (EOSDIS) is the archiving, production, and distribution system for data collected by the EOS space segment and uses a multilayer architecture for processing, archiving, and distributing EOS data. The first layer consists of the spacecraft ground stations and processing facilities that receive the raw data from the orbiting platforms and then separate the data by individual sensors. The second layer consists of Distributed Active Archive Centers (DAAC) that process, distribute, and archive the sensor data. The third layer consists of a user science processing network. The EOSDIS is being developed in a phased implementation. The initial phase, Version 0, is a prototype of the operational system. Version 0 activities are based upon existing systems and are designed to provide an EOSDIS-like capability for information management and distribution. An important science support task is the creation of simulated data sets for EOS instruments from precursor aircraft or satellite data. The Land Processes DAAC, at the EROS Data Center (EDC), is responsible for archiving and processing EOS precursor data from airborne instruments such as the Thermal Infrared Multispectral Scanner (TIMS), the Thematic Mapper Simulator (TMS), and Airborne Visible and Infrared Imaging Spectrometer (AVIRIS). AVIRIS, TIMS, and TMS are flown by the NASA-Ames Research Center ARC) on an ER-2. The ER-2 flies at 65000 feet and can carry up to three sensors simultaneously. Most jointly collected data sets are somewhat boresighted and roughly registered. The instrument data are being used to construct data sets that simulate the spectral and spatial characteristics of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) instrument scheduled to be flown on the first EOS-AM spacecraft. The ASTER is designed to acquire 14 channels of land science data in the visible and near-IR (VNIR), shortwave-IR (SWIR), and thermal-IR (TIR) regions from 0.52 micron to 11.65 micron at high spatial resolutions of 15 m to 90 m. Stereo data will also be acquired in the VNIR region in a single band. The AVIRIS and TMS cover the ASTER VNIR and SWIR bands, and the TIMS covers the TIR bands. Simulated ASTER data sets have been generated over Death Valley, California, Cuprite, Nevada, and the Drum Mountains, Utah using a combination of AVIRIS, TIMS, amd TMS data, and existing digital elevation models (DEM) for the topographic information.
The Web Measurement Environment (WebME): A Tool for Combining and Modeling Distributed Data
NASA Technical Reports Server (NTRS)
Tesoriero, Roseanne; Zelkowitz, Marvin
1997-01-01
Many organizations have incorporated data collection into their software processes for the purpose of process improvement. However, in order to improve, interpreting the data is just as important as the collection of data. With the increased presence of the Internet and the ubiquity of the World Wide Web, the potential for software processes being distributed among several physically separated locations has also grown. Because project data may be stored in multiple locations and in differing formats, obtaining and interpreting data from this type of environment becomes even more complicated. The Web Measurement Environment (WebME), a Web-based data visualization tool, is being developed to facilitate the understanding of collected data in a distributed environment. The WebME system will permit the analysis of development data in distributed, heterogeneous environments. This paper provides an overview of the system and its capabilities.
Computer Sciences and Data Systems, volume 1
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.
a Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images
NASA Astrophysics Data System (ADS)
Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.
2015-07-01
Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and others. However, it is challenging to efficiently storage, query and process such big data due to the data- and computing- intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process the big remote sensing data in a distributed and parallel manner. Especially, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide affluent image processing operations. With the integration of HDFS, Orfeo toolbox and MapReduce, these remote sensing images can be directly processed in parallel in a scalable computing environment. The experiment results show that the proposed framework can efficiently manage and process such big remote sensing data.
NASA Astrophysics Data System (ADS)
Doyle, Paul; Mtenzi, Fred; Smith, Niall; Collins, Adrian; O'Shea, Brendan
2012-09-01
The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and their falling costs is contributing to an explosive generation of raw photometric data. This data must go through a process of cleaning and reduction before it can be used for high precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required to allow reduction and cleaning rates to process terabyte sized datasets at near capture rates using an elastic processing architecture. The ability to access computing resources and to allow them to grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation and support processing of data segments in parallel. Academic institutes can collaborate and provide an elastic computing model without the requirement for large centralized high performance computing data centers. This paper demonstrates how a base 10 order of magnitude improvement in overall processing time has been achieved using the "ACN pipeline", a distributed pipeline spanning multiple academic institutes.
NASA Astrophysics Data System (ADS)
Savorskiy, V.; Lupyan, E.; Balashov, I.; Burtsev, M.; Proshin, A.; Tolpin, V.; Ermakov, D.; Chernushich, A.; Panova, O.; Kuznetsov, O.; Vasilyev, V.
2014-04-01
Both development and application of remote sensing involves a considerable expenditure of material and intellectual resources. Therefore, it is important to use high-tech means of distribution of remote sensing data and processing results in order to facilitate access for as much as possible number of researchers. It should be accompanied with creation of capabilities for potentially more thorough and comprehensive, i.e. ultimately deeper, acquisition and complex analysis of information about the state of Earth's natural resources. As well objective need in a higher degree of Earth observation (EO) data assimilation is set by conditions of satellite observations, in which the observed objects are uncontrolled state. Progress in addressing this problem is determined to a large extent by order of the distributed EO information system (IS) functioning. Namely, it is largely dependent on reducing the cost of communication processes (data transfer) between spatially distributed IS nodes and data users. One of the most effective ways to improve the efficiency of data exchange processes is the creation of integrated EO IS optimized for running procedures of distributed data processing. The effective EO IS implementation should be based on specific software architecture.
Parallel volume ray-casting for unstructured-grid data on distributed-memory architectures
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu
1995-01-01
As computing technology continues to advance, computational modeling of scientific and engineering problems produces data of increasing complexity: large in size and unstructured in shape. Volume visualization of such data is a challenging problem. This paper proposes a distributed parallel solution that makes ray-casting volume rendering of unstructured-grid data practical. Both the data and the rendering process are distributed among processors. At each processor, ray-casting of local data is performed independent of the other processors. The global image composing processes, which require inter-processor communication, are overlapped with the local ray-casting processes to achieve maximum parallel efficiency. This algorithm differs from previous ones in four ways: it is completely distributed, less view-dependent, reasonably scalable, and flexible. Without using dynamic load balancing, test results on the Intel Paragon using from two to 128 processors show, on average, about 60% parallel efficiency.
Process evaluation distributed system
NASA Technical Reports Server (NTRS)
Moffatt, Christopher L. (Inventor)
2006-01-01
The distributed system includes a database server, an administration module, a process evaluation module, and a data display module. The administration module is in communication with the database server for providing observation criteria information to the database server. The process evaluation module is in communication with the database server for obtaining the observation criteria information from the database server and collecting process data based on the observation criteria information. The process evaluation module utilizes a personal digital assistant (PDA). A data display module in communication with the database server, including a website for viewing collected process data in a desired metrics form, the data display module also for providing desired editing and modification of the collected process data. The connectivity established by the database server to the administration module, the process evaluation module, and the data display module, minimizes the requirement for manual input of the collected process data.
Survey: National Environmental Satellite Service
NASA Technical Reports Server (NTRS)
1977-01-01
The national Environmental Satellite Service (NESS) receives data at periodic intervals from satellites of the Synchronous Meteorological Satellite/Geostationary Operational Environmental Satellite series and from the Improved TIROS (Television Infrared Observational Satellite) Operational Satellite. Within the conterminous United States, direct readout and processed products are distributed to users over facsimile networks from a central processing and data distribution facility. In addition, the NESS Satellite Field Stations analyze, interpret, and distribute processed geostationary satellite products to regional weather service activities.
EOS MLS Science Data Processing System: A Description of Architecture and Capabilities
NASA Technical Reports Server (NTRS)
Cuddy, David T.; Echeverri, Mark D.; Wagner, Paul A.; Hanzel, Audrey T.; Fuller, Ryan A.
2006-01-01
This paper describes the architecture and capabilities of the Science Data Processing System (SDPS) for the EOS MLS. The SDPS consists of two major components--the Science Computing Facility and the Science Investigator-led Processing System. The Science Computing Facility provides the facilities for the EOS MLS Science Team to perform the functions of scientific algorithm development, processing software development, quality control of data products, and scientific analyses. The Science Investigator-led Processing System processes and reprocesses the science data for the entire mission and delivers the data products to the Science Computing Facility and to the Goddard Space Flight Center Earth Science Distributed Active Archive Center, which archives and distributes the standard science products.
A data distributed parallel algorithm for ray-traced volume rendering
NASA Technical Reports Server (NTRS)
Ma, Kwan-Liu; Painter, James S.; Hansen, Charles D.; Krogh, Michael F.
1993-01-01
This paper presents a divide-and-conquer ray-traced volume rendering algorithm and a parallel image compositing method, along with their implementation and performance on the Connection Machine CM-5, and networked workstations. This algorithm distributes both the data and the computations to individual processing units to achieve fast, high-quality rendering of high-resolution data. The volume data, once distributed, is left intact. The processing nodes perform local ray tracing of their subvolume concurrently. No communication between processing units is needed during this locally ray-tracing process. A subimage is generated by each processing unit and the final image is obtained by compositing subimages in the proper order, which can be determined a priori. Test results on both the CM-5 and a group of networked workstations demonstrate the practicality of our rendering algorithm and compositing method.
NASA Technical Reports Server (NTRS)
Leptoukh, Gregory G.
2005-01-01
The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is one of the major Distributed Active Archive Centers (DAACs) archiving and distributing remote sensing data from the NASA's Earth Observing System. In addition to providing just data, the GES DISC/DAAC has developed various value-adding processing services. A particularly useful service is data processing a t the DISC (i.e., close to the input data) with the users' algorithms. This can take a number of different forms: as a configuration-managed algorithm within the main processing stream; as a stand-alone program next to the on-line data storage; as build-it-yourself code within the Near-Archive Data Mining (NADM) system; or as an on-the-fly analysis with simple algorithms embedded into the web-based tools (to avoid downloading unnecessary all the data). The existing data management infrastructure at the GES DISC supports a wide spectrum of options: from data subsetting data spatially and/or by parameter to sophisticated on-line analysis tools, producing economies of scale and rapid time-to-deploy. Shifting processing and data management burden from users to the GES DISC, allows scientists to concentrate on science, while the GES DISC handles the data management and data processing at a lower cost. Several examples of successful partnerships with scientists in the area of data processing and mining are presented.
OpenCluster: A Flexible Distributed Computing Framework for Astronomical Data Processing
NASA Astrophysics Data System (ADS)
Wei, Shoulin; Wang, Feng; Deng, Hui; Liu, Cuiyin; Dai, Wei; Liang, Bo; Mei, Ying; Shi, Congming; Liu, Yingbo; Wu, Jingping
2017-02-01
The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework to support rapidly developing high-performance processing pipelines of astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems for developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing a high-performance data processing system of astronomical telescopes and for significantly reducing software development expenses.
Distributed Data Processing in a United States Naval Shipyard.
1979-12-01
25 1. Evolution ........ ..................... 25 2. Motivations for Distributed Processing ... ....... 30 a. Extensibility...51 B. EVOLUTION ...... ........................ ... 51 C. CONCEPTS .... ... ........................ . 55 D. FORM AND STRUCTURE OF THE...motivations for, and the characteristics of, distributed processing as they apply to management information systems. 1. Evolution Prior to the advent of
Kalvelage, Thomas A.; Willems, Jennifer
2005-01-01
The US Geological Survey's EROS Data Center (EDC) hosts the Land Processes Distributed Active Archive Center (LP DAAC). The LP DAAC supports NASA's Earth Observing System (EOS), which is a series of polar-orbiting and low inclination satellites for long-term global observations of the land surface, biosphere, solid Earth, atmosphere, and oceans. The EOS Data and Information Systems (EOSDIS) was designed to acquire, archive, manage and distribute Earth observation data to the broadest possible user community.The LP DAAC is one of four DAACs that utilize the EOSDIS Core System (ECS) to manage and archive their data. Since the ECS was originally designed, significant changes have taken place in technology, user expectations, and user requirements. Therefore the LP DAAC has implemented additional systems to meet the evolving needs of scientific users, tailored to an integrated working environment. These systems provide a wide variety of services to improve data access and to enhance data usability through subsampling, reformatting, and reprojection. These systems also support the wide breadth of products that are handled by the LP DAAC.The LP DAAC is the primary archive for the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) data; it is the only facility in the United States that archives, processes, and distributes data from the Advanced Spaceborne Thermal Emission/Reflection Radiometer (ASTER) on NASA's Terra spacecraft; and it is responsible for the archive and distribution of “land products” generated from data acquired by the Moderate Resolution Imaging Spectroradiometer (MODIS) on NASA's Terra and Aqua satellites.
Mars Observer data production, transfer, and archival: The data production assembly line
NASA Technical Reports Server (NTRS)
Childs, David B.
1993-01-01
This paper describes the data production, transfer, and archival process designed for the Mars Observer Flight Project. It addresses the developmental and operational aspects of the archive collection production process. The developmental aspects cover the design and packaging of data products for archival and distribution to the planetary community. Also discussed is the design and development of a data transfer and volume production process capable of handling the large throughput and complexity of the Mars Observer data products. The operational aspects cover the main functions of the process: creating data and engineering products, collecting the data products and ancillary products in a central repository, producing archive volumes, validating volumes, archiving, and distributing the data to the planetary community.
Rapid Processing of Radio Interferometer Data for Transient Surveys
NASA Astrophysics Data System (ADS)
Bourke, S.; Mooley, K.; Hallinan, G.
2014-05-01
We report on a software infrastructure and pipeline developed to process large radio interferometer datasets. The pipeline is implemented using a radical redesign of the AIPS processing model. An infrastructure we have named AIPSlite is used to spawn, at runtime, minimal AIPS environments across a cluster. The pipeline then distributes and processes its data in parallel. The system is entirely free of the traditional AIPS distribution and is self configuring at runtime. This software has so far been used to process a EVLA Stripe 82 transient survey, the data for the JVLA-COSMOS project, and has been used to process most of the EVLA L-Band data archive imaging each integration to search for short duration transients.
High-performance data processing using distributed computing on the SOLIS project
NASA Astrophysics Data System (ADS)
Wampler, Stephen
2002-12-01
The SOLIS solar telescope collects data at a high rate, resulting in 500 GB of raw data each day. The SOLIS Data Handling System (DHS) has been designed to quickly process this data down to 156 GB of reduced data. The DHS design uses pools of distributed reduction processes that are allocated to different observations as needed. A farm of 10 dual-cpu Linux boxes contains the pools of reduction processes. Control is through CORBA and data is stored on a fibre channel storage area network (SAN). Three other Linux boxes are responsible for pulling data from the instruments using SAN-based ringbuffers. Control applications are Java-based while the reduction processes are written in C++. This paper presents the overall design of the SOLIS DHS and provides details on the approach used to control the pooled reduction processes. The various strategies used to manage the high data rates are also covered.
NASA Technical Reports Server (NTRS)
Brown, J. W.; Cleven, G. C.; Klose, J. C.; Lame, D. B.; Yamarone, C. A.
1979-01-01
The Seasat low-rate data system, an end-to-end data-processing and data-distribution system for the four low-rate sensors (radar altimeter, Seasat-A scatterometer system, scanning multichannel microwave radiometer, and visible and infrared radiometer) carried aboard the satellite, is discussed. The function of the distributed, nonreal-time, magnetic-tape system is to apply necessary calibrations, corrections, and conversions to yield geophysically meaningful products from raw telemetry data. The algorithms developed for processing data from the different sensors are described, together with the data catalogs compiled.
Flight deck benefits of integrated data link communication
NASA Technical Reports Server (NTRS)
Waller, Marvin C.
1992-01-01
A fixed-base, piloted simulation study was conducted to determine the operational benefits that result when air traffic control (ATC) instructions are transmitted to the deck of a transport aircraft over a digital data link. The ATC instructions include altitude, airspeed, heading, radio frequency, and route assignment data. The interface between the flight deck and the data link was integrated with other subsystems of the airplane to facilitate data management. Data from the ATC instructions were distributed to the flight guidance and control system, the navigation system, and an automatically tuned communication radio. The co-pilot initiated the automation-assisted data distribution process. Digital communications and automated data distribution were compared with conventional voice radio communication and manual input of data into other subsystems of the simulated aircraft. Less time was required in the combined communication and data management process when data link ATC communication was integrated with the other subsystems. The test subjects, commercial airline pilots, provided favorable evaluations of both the digital communication and data management processes.
Understanding Local Structure Globally in Earth Science Remote Sensing Data Sets
NASA Technical Reports Server (NTRS)
Braverman, Amy; Fetzer, Eric
2007-01-01
Empirical probability distributions derived from the data are the signatures of physical processes generating the data. Distributions defined on different space-time windows can be compared and differences or changes can be attributed to physical processes. This presentation discusses on ways to reduce remote sensing data in a way that preserves information, focusing on the rate-distortion theory and using the entropy-constrained vector quantization algorithm.
NASA Technical Reports Server (NTRS)
Koeberlein, Ernest, III; Pender, Shaw Exum
1994-01-01
This paper describes the Multimission Telemetry Visualization (MTV) data acquisition/distribution system. MTV was developed by JPL's Multimedia Communications Laboratory (MCL) and designed to process and display digital, real-time, science and engineering data from JPL's Mission Control Center. The MTV system can be accessed using UNIX workstations and PC's over common datacom and telecom networks from worldwide locations. It is designed to lower data distribution costs while increasing data analysis functionality by integrating low-cost, off-the-shelf desktop hardware and software. MTV is expected to significantly lower the cost of real-time data display, processing, distribution, and allow for greater spacecraft safety and mission data access.
NASA Astrophysics Data System (ADS)
Wiesmann, William P.; Pranger, L. Alex; Bogucki, Mary S.
1998-05-01
Remote monitoring of physiologic data from individual high- risk workers distributed over time and space is a considerable challenge. This is often due to an inadequate capability to accurately integrate large amounts of data into usable information in real time. In this report, we have used the vertical and horizontal organization of the 'fireground' as a framework to design a distributed network of sensors. In this system, sensor output is linked through a hierarchical object oriented programing process to accurately interpret physiological data, incorporate these data into a synchronous model and relay processed data, trends and predictions to members of the fire incident command structure. There are several unique aspects to this approach. The first includes a process to account for variability in vital parameter values for each individual's normal physiologic response by including an adaptive network in each data process. This information is used by the model in an iterative process to baseline a 'normal' physiologic response to a given stress for each individual and to detect deviations that indicate dysfunction or a significant insult. The second unique capability of the system orders the information for each user including the subject, local company officers, medical personnel and the incident commanders. Information can be retrieved and used for training exercises and after action analysis. Finally this system can easily be adapted to existing communication and processing links along with incorporating the best parts of current models through the use of object oriented programming techniques. These modern software techniques are well suited to handling multiple data processes independently over time in a distributed network.
The impact of distributed computing on education
NASA Technical Reports Server (NTRS)
Utku, S.; Lestingi, J.; Salama, M.
1982-01-01
In this paper, developments in digital computer technology since the early Fifties are reviewed briefly, and the parallelism which exists between these developments and developments in analysis and design procedures of structural engineering is identified. The recent trends in digital computer technology are examined in order to establish the fact that distributed processing is now an accepted philosophy for further developments. The impact of this on the analysis and design practices of structural engineering is assessed by first examining these practices from a data processing standpoint to identify the key operations and data bases, and then fitting them to the characteristics of distributed processing. The merits and drawbacks of the present philosophy in educating structural engineers are discussed and projections are made for the industry-academia relations in the distributed processing environment of structural analysis and design. An ongoing experiment of distributed computing in a university environment is described.
Bibliography On Multiprocessors And Distributed Processing
NASA Technical Reports Server (NTRS)
Miya, Eugene N.
1988-01-01
Multiprocessor and Distributed Processing Bibliography package consists of large machine-readable bibliographic data base, which in addition to usual keyword searches, used for producing citations, indexes, and cross-references. Data base contains UNIX(R) "refer" -formatted ASCII data and implemented on any computer running under UNIX(R) operating system. Easily convertible to other operating systems. Requires approximately one megabyte of secondary storage. Bibliography compiled in 1985.
Land processes distributed active archive center product lifecycle plan
Daucsavage, John C.; Bennett, Stacie D.
2014-01-01
The U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center and the National Aeronautics and Space Administration (NASA) Earth Science Data System Program worked together to establish, develop, and operate the Land Processes (LP) Distributed Active Archive Center (DAAC) to provide stewardship for NASA’s land processes science data. These data are critical science assets that serve the land processes science community with potential value beyond any immediate research use, and therefore need to be accounted for and properly managed throughout their lifecycle. A fundamental LP DAAC objective is to enable permanent preservation of these data and information products. The LP DAAC accomplishes this by bridging data producers and permanent archival resources while providing intermediate archive services for data and information products.
Provenance-aware optimization of workload for distributed data production
NASA Astrophysics Data System (ADS)
Makatun, Dzmitry; Lauret, Jérôme; Rudová, Hana; Šumbera, Michal
2017-10-01
Distributed data processing in High Energy and Nuclear Physics (HENP) is a prominent example of big data analysis. Having petabytes of data being processed at tens of computational sites with thousands of CPUs, standard job scheduling approaches either do not address well the problem complexity or are dedicated to one specific aspect of the problem only (CPU, network or storage). Previously we have developed a new job scheduling approach dedicated to distributed data production - an essential part of data processing in HENP (preprocessing in big data terminology). In this contribution, we discuss the load balancing with multiple data sources and data replication, present recent improvements made to our planner and provide results of simulations which demonstrate the advantage against standard scheduling policies for the new use case. Multi-source or provenance is common in computing models of many applications whereas the data may be copied to several destinations. The initial input data set would hence be already partially replicated to multiple locations and the task of the scheduler is to maximize overall computational throughput considering possible data movements and CPU allocation. The studies have shown that our approach can provide a significant gain in overall computational performance in a wide scope of simulations considering realistic size of computational Grid and various input data distribution.
ERIC Educational Resources Information Center
Laszlo, Sarah; Plaut, David C.
2012-01-01
The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between…
Signal processing for distributed sensor concept: DISCO
NASA Astrophysics Data System (ADS)
Rafailov, Michael K.
2007-04-01
Distributed Sensor concept - DISCO proposed for multiplication of individual sensor capabilities through cooperative target engagement. DISCO relies on ability of signal processing software to format, to process and to transmit and receive sensor data and to exploit those data in signal synthesis process. Each sensor data is synchronized formatted, Signal-to-Noise Ration (SNR) enhanced and distributed inside of the sensor network. Signal processing technique for DISCO is Recursive Adaptive Frame Integration of Limited data - RAFIL technique that was initially proposed [1] as a way to improve the SNR, reduce data rate and mitigate FPA correlated noise of an individual sensor digital video-signal processing. In Distributed Sensor Concept RAFIL technique is used in segmented way, when constituencies of the technique are spatially and/or temporally separated between transmitters and receivers. Those constituencies include though not limited to two thresholds - one is tuned for optimum probability of detection, the other - to manage required false alarm rate, and limited frame integration placed somewhere between the thresholds as well as formatters, conventional integrators and more. RAFIL allows a non-linear integration that, along with SNR gain, provides system designers more capability where cost, weight, or power considerations limit system data rate, processing, or memory capability [2]. DISCO architecture allows flexible optimization of SNR gain, data rates and noise suppression on sensor's side and limited integration, re-formatting and final threshold on node's side. DISCO with Recursive Adaptive Frame Integration of Limited data may have flexible architecture that allows segmenting the hardware and software to be best suitable for specific DISCO applications and sensing needs - whatever it is air-or-space platforms, ground terminals or integration of sensors network.
The Impact of Education on Income Distribution.
ERIC Educational Resources Information Center
Tinbergen, Jan
The author's previously developed theory on income distribution, in which two of the explanatory variables are the average level and the distribution of education, is refined and tested on data selected and processed by the author and data from three studies by Americans. The material consists of data on subdivisions of three countries, the United…
ICESat Science Investigator led Processing System (I-SIPS)
NASA Astrophysics Data System (ADS)
Bhardwaj, S.; Bay, J.; Brenner, A.; Dimarzio, J.; Hancock, D.; Sherman, M.
2003-12-01
The ICESat Science Investigator-led Processing System (I-SIPS) generates the GLAS standard data products. It consists of two main parts the Scheduling and Data Management System (SDMS) and the Geoscience Laser Altimeter System (GLAS) Science Algorithm Software. The system has been operational since the successful launch of ICESat. It ingests data from the GLAS instrument, generates GLAS data products, and distributes them to the GLAS Science Computing Facility (SCF), the Instrument Support Facility (ISF) and the National Snow and Ice Data Center (NSIDC) ECS DAAC. The SDMS is the Planning, Scheduling and Data Management System that runs the GLAS Science Algorithm Software (GSAS). GSAS is based on the Algorithm Theoretical Basis Documents provided by the Science Team and is developed independently of SDMS. The SDMS provides the processing environment to plan jobs based on existing data, control job flow, data distribution, and archiving. The SDMS design is based on a mission-independent architecture that imposes few constraints on the science code thereby facilitating I-SIPS integration. I-SIPS currently works in an autonomous manner to ingest GLAS instrument data, distribute this data to the ISF, run the science processing algorithms to produce the GLAS standard products, reprocess data when new versions of science algorithms are released, and distributes the products to the SCF, ISF, and NSIDC. I-SIPS has a proven performance record, delivering the data to the SCF within hours after the initial instrument activation. The I-SIPS design philosophy gives this system a high potential for reuse in other science missions.
A convergent model for distributed processing of Big Sensor Data in urban engineering networks
NASA Astrophysics Data System (ADS)
Parygin, D. S.; Finogeev, A. G.; Kamaev, V. A.; Finogeev, A. A.; Gnedkova, E. P.; Tyukov, A. P.
2017-01-01
The problems of development and research of a convergent model of the grid, cloud, fog and mobile computing for analytical Big Sensor Data processing are reviewed. The model is meant to create monitoring systems of spatially distributed objects of urban engineering networks and processes. The proposed approach is the convergence model of the distributed data processing organization. The fog computing model is used for the processing and aggregation of sensor data at the network nodes and/or industrial controllers. The program agents are loaded to perform computing tasks for the primary processing and data aggregation. The grid and the cloud computing models are used for integral indicators mining and accumulating. A computing cluster has a three-tier architecture, which includes the main server at the first level, a cluster of SCADA system servers at the second level, a lot of GPU video cards with the support for the Compute Unified Device Architecture at the third level. The mobile computing model is applied to visualize the results of intellectual analysis with the elements of augmented reality and geo-information technologies. The integrated indicators are transferred to the data center for accumulation in a multidimensional storage for the purpose of data mining and knowledge gaining.
An approach for heterogeneous and loosely coupled geospatial data distributed computing
NASA Astrophysics Data System (ADS)
Chen, Bin; Huang, Fengru; Fang, Yu; Huang, Zhou; Lin, Hui
2010-07-01
Most GIS (Geographic Information System) applications tend to have heterogeneous and autonomous geospatial information resources, and the availability of these local resources is unpredictable and dynamic under a distributed computing environment. In order to make use of these local resources together to solve larger geospatial information processing problems that are related to an overall situation, in this paper, with the support of peer-to-peer computing technologies, we propose a geospatial data distributed computing mechanism that involves loosely coupled geospatial resource directories and a term named as Equivalent Distributed Program of global geospatial queries to solve geospatial distributed computing problems under heterogeneous GIS environments. First, a geospatial query process schema for distributed computing as well as a method for equivalent transformation from a global geospatial query to distributed local queries at SQL (Structured Query Language) level to solve the coordinating problem among heterogeneous resources are presented. Second, peer-to-peer technologies are used to maintain a loosely coupled network environment that consists of autonomous geospatial information resources, thus to achieve decentralized and consistent synchronization among global geospatial resource directories, and to carry out distributed transaction management of local queries. Finally, based on the developed prototype system, example applications of simple and complex geospatial data distributed queries are presented to illustrate the procedure of global geospatial information processing.
Production and Distribution of NASA MODIS Remote Sensing Products
NASA Technical Reports Server (NTRS)
Wolfe, Robert
2007-01-01
The two Moderate Resolution Imaging Spectroradiometer (MODIS) instruments on-board NASA's Earth Observing System (EOS) Terra and Aqua satellites make key measurements for understanding the Earth's terrestrial ecosystems. Global time-series of terrestrial geophysical parameters have been produced from MODIS/Terra for over 7 years and for MODIS/Aqua for more than 4 1/2 years. These well calibrated instruments, a team of scientists and a large data production, archive and distribution systems have allowed for the development of a new suite of high quality product variables at spatial resolutions as fine as 250m in support of global change research and natural resource applications. This talk describes the MODIS Science team's products, with a focus on the terrestrial (land) products, the data processing approach and the process for monitoring and improving the product quality. The original MODIS science team was formed in 1989. The team's primary role is the development and implementation of the geophysical algorithms. In addition, the team provided feedback on the design and pre-launch testing of the instrument and helped guide the development of the data processing system. The key challenges the science team dealt with before launch were the development of algorithms for a new instrument and provide guidance of the large and complex multi-discipline processing system. Land, Ocean and Atmosphere discipline teams drove the processing system requirements, particularly in the area of the processing loads and volumes needed to daily produce geophysical maps of the Earth at resolutions as fine as 250 m. The processing system had to handle a large number of data products, large data volumes and processing loads, and complex processing requirements. Prior to MODIS, daily global maps from heritage instruments, such as Advanced Very High Resolution Radiometer (AVHRR), were not produced at resolutions finer than 5 km. The processing solution evolved into a combination of processing the lower level (Level 1) products and the higher level discipline specific Land and Atmosphere products in the MODIS Science Investigator Lead Processing System (SIPS), the MODIS Adaptive Processing System (MODAPS), and archive and distribution of the Land products to the user community by two of NASA s EOS Distributed Active Archive Centers (DAACs). Recently, a part of MODAPS, the Level 1 and Atmosphere Archive and Distribution System (LAADS), took over the role of archiving and distributing the Level 1 and Atmosphere products to the user community.
Remote-Sensing Data Distribution and Processing in the Cloud at the ASF DAAC
NASA Astrophysics Data System (ADS)
Stoner, C.; Arko, S. A.; Nicoll, J. B.; Labelle-Hamer, A. L.
2016-12-01
The Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC) has been tasked to archive and distribute data from both SENTINEL-1 satellites and from the NASA-ISRO Synthetic Aperture Radar (NISAR) satellite in a cost effective manner. In order to best support processing and distribution of these large data sets for users, the ASF DAAC enhanced our data system in a number of ways that will be detailed in this presentation.The SENTINEL-1 mission comprises a constellation of two polar-orbiting satellites, operating day and night performing C-band Synthetic Aperture Radar (SAR) imaging, enabling them to acquire imagery regardless of the weather. SENTINEL-1A was launched by the European Space Agency (ESA) in April 2014. SENTINEL-1B is scheduled to launch in April 2016.The NISAR satellite is designed to observe and take measurements of some of the planet's most complex processes, including ecosystem disturbances, ice-sheet collapse, and natural hazards such as earthquakes, tsunamis, volcanoes and landslides. NISAR will employ radar imaging, polarimetry, and interferometry techniques using the SweepSAR technology employed for full-resolution wide-swath imaging. NISAR data files are large, making storage and processing a challenge for conventional store and download systems.To effectively process, store, and distribute petabytes of data in a High-performance computing environment, ASF took a long view with regard to technology choices and picked a path of most flexibility and Software re-use. To that end, this Software tools and services presentation will cover Web Object Storage (WOS) and the ability to seamlessly move from local sunk cost hardware to public cloud, such as Amazon Web Services (AWS). A prototype of SENTINEL-1A system that is in AWS, as well as a local hardware solution, will be examined to explain the pros and cons of each. In preparation for NISAR files which will be even larger than SENTINEL-1A, ASF has embarked on a number of cloud initiatives, including processing in the cloud at scale, processing data on-demand, and processing end-user computations on DAAC data in the cloud.
Volcanic Ash Data Assimilation System for Atmospheric Transport Model
NASA Astrophysics Data System (ADS)
Ishii, K.; Shimbori, T.; Sato, E.; Tokumoto, T.; Hayashi, Y.; Hashimoto, A.
2017-12-01
The Japan Meteorological Agency (JMA) has two operations for volcanic ash forecasts, which are Volcanic Ash Fall Forecast (VAFF) and Volcanic Ash Advisory (VAA). In these operations, the forecasts are calculated by atmospheric transport models including the advection process, the turbulent diffusion process, the gravitational fall process and the deposition process (wet/dry). The initial distribution of volcanic ash in the models is the most important but uncertain factor. In operations, the model of Suzuki (1983) with many empirical assumptions is adopted to the initial distribution. This adversely affects the reconstruction of actual eruption plumes.We are developing a volcanic ash data assimilation system using weather radars and meteorological satellite observation, in order to improve the initial distribution of the atmospheric transport models. Our data assimilation system is based on the three-dimensional variational data assimilation method (3D-Var). Analysis variables are ash concentration and size distribution parameters which are mutually independent. The radar observation is expected to provide three-dimensional parameters such as ash concentration and parameters of ash particle size distribution. On the other hand, the satellite observation is anticipated to provide two-dimensional parameters of ash clouds such as mass loading, top height and particle effective radius. In this study, we estimate the thickness of ash clouds using vertical wind shear of JMA numerical weather prediction, and apply for the volcanic ash data assimilation system.
Adaptive Distributed Intelligent Control Architecture for Future Propulsion Systems (Preprint)
2007-04-01
weight will be reduced by replacing heavy harness assemblies and FADECs , with distributed processing elements interconnected. This paper reviews...Digital Electronic Controls ( FADECs ), with distributed processing elements interconnected through a serial bus. Efficient data flow throughout the...because intelligence is embedded in components while overall control is maintained in the FADEC . The need for Distributed Control Systems in
Development and Operation of the Americas ALOS Data Node
NASA Astrophysics Data System (ADS)
Arko, S. A.; Marlin, R. H.; La Belle-Hamer, A. L.
2004-12-01
In the spring of 2005, the Japanese Aerospace Exploration Agency (JAXA) will launch the next generation in advanced, remote sensing satellites. The Advanced Land Observing Satellite (ALOS) includes three sensors, two visible imagers and one L-band polarimetric SAR, providing high-quality remote sensing data to the scientific and commercial communities throughout the world. Focusing on remote sensing and scientific pursuits, ALOS will image nearly the entire Earth using all three instruments during its expected three-year lifetime. These data sets offer the potential for data continuation of older satellite missions as well as new products for the growing user community. One of the unique features of the ALOS mission is the data distribution approach. JAXA has created a worldwide cooperative data distribution network. The data nodes are NOAA /ASF representing the Americas ALOS Data Node (AADN), ESA representing the ALOS European and African Node (ADEN), Geoscience Australia representing Oceania and JAXA representing the Asian continent. The AADN is the sole agency responsible for archival, processing and distribution of L0 and L1 products to users in both North and South America. In support of this mission, AADN is currently developing a processing and distribution infrastructure to provide easy access to these data sets. Utilizing a custom, grid-based process controller and media generation system, the overall infrastructure has been designed to provide maximum throughput while requiring a minimum of operator input and maintenance. This paper will present an overview of the ALOS system, details of each sensor's capabilities and of the processing and distribution system being developed by AADN to provide these valuable data sets to users throughout North and South America.
NASA Astrophysics Data System (ADS)
Plionis, A. A.; Peterson, D. S.; Tandon, L.; LaMont, S. P.
2010-03-01
Uranium particles within the respirable size range pose a significant hazard to the health and safety of workers. Significant differences in the deposition and incorporation patterns of aerosols within the respirable range can be identified and integrated into sophisticated health physics models. Data characterizing the uranium particle size distribution resulting from specific foundry-related processes are needed. Using personal air sampling cascade impactors, particles collected from several foundry processes were sorted by activity median aerodynamic diameter onto various Marple substrates. After an initial gravimetric assessment of each impactor stage, the substrates were analyzed by alpha spectrometry to determine the uranium content of each stage. Alpha spectrometry provides rapid non-distructive isotopic data that can distinguish process uranium from natural sources and the degree of uranium contribution to the total accumulated particle load. In addition, the particle size bins utilized by the impactors provide adequate resolution to determine if a process particle size distribution is: lognormal, bimodal, or trimodal. Data on process uranium particle size values and distributions facilitate the development of more sophisticated and accurate models for internal dosimetry, resulting in an improved understanding of foundry worker health and safety.
Hazan, Lynn; Zugaro, Michaël; Buzsáki, György
2006-09-15
Recent technological advances now allow for simultaneous recording of large populations of anatomically distributed neurons in behaving animals. The free software package described here was designed to help neurophysiologists process and view recorded data in an efficient and user-friendly manner. This package consists of several well-integrated applications, including NeuroScope (http://neuroscope.sourceforce.net), an advanced viewer for electrophysiological and behavioral data with limited editing capabilities, Klusters (http://klusters.sourceforge.net), a graphical cluster cutting application for manual and semi-automatic spike sorting, NDManager (GPL,see http://www.gnu.org/licenses/gpl.html), an experimental parameter and data processing manager. All of these programs are distributed under the GNU General Public License (GPL, see ), which gives its users legal permission to copy, distribute and/or modify the software. Also included are extensive user manuals and sample data, as well as source code and documentation.
Swarming Reconnaissance Using Unmanned Aerial Vehicles in a Parallel Discrete Event Simulation
2004-03-01
60 4.3.1.4 Data Distribution Management . . . . . . . . . 60 4.3.1.5 Breathing Time Warp Algorithm/ Rolling Back . 61...58 BTW Breathing Time Warp . . . . . . . . . . . . . . . . . . . . . . . . . 59 DDM Data Distribution Management . . . . . . . . . . . . . . . . . . . . 60...events based on the 58 process algorithm. Data proxies/ distribution management is the vital portion of the SPEEDES im- plementation that allows objects
A distributed computing model for telemetry data processing
NASA Astrophysics Data System (ADS)
Barry, Matthew R.; Scott, Kevin L.; Weismuller, Steven P.
1994-05-01
We present a new approach to distributing processed telemetry data among spacecraft flight controllers within the control centers at NASA's Johnson Space Center. This approach facilitates the development of application programs which integrate spacecraft-telemetered data and ground-based synthesized data, then distributes this information to flight controllers for analysis and decision-making. The new approach combines various distributed computing models into one hybrid distributed computing model. The model employs both client-server and peer-to-peer distributed computing models cooperating to provide users with information throughout a diverse operations environment. Specifically, it provides an attractive foundation upon which we are building critical real-time monitoring and control applications, while simultaneously lending itself to peripheral applications in playback operations, mission preparations, flight controller training, and program development and verification. We have realized the hybrid distributed computing model through an information sharing protocol. We shall describe the motivations that inspired us to create this protocol, along with a brief conceptual description of the distributed computing models it employs. We describe the protocol design in more detail, discussing many of the program design considerations and techniques we have adopted. Finally, we describe how this model is especially suitable for supporting the implementation of distributed expert system applications.
A distributed computing model for telemetry data processing
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Scott, Kevin L.; Weismuller, Steven P.
1994-01-01
We present a new approach to distributing processed telemetry data among spacecraft flight controllers within the control centers at NASA's Johnson Space Center. This approach facilitates the development of application programs which integrate spacecraft-telemetered data and ground-based synthesized data, then distributes this information to flight controllers for analysis and decision-making. The new approach combines various distributed computing models into one hybrid distributed computing model. The model employs both client-server and peer-to-peer distributed computing models cooperating to provide users with information throughout a diverse operations environment. Specifically, it provides an attractive foundation upon which we are building critical real-time monitoring and control applications, while simultaneously lending itself to peripheral applications in playback operations, mission preparations, flight controller training, and program development and verification. We have realized the hybrid distributed computing model through an information sharing protocol. We shall describe the motivations that inspired us to create this protocol, along with a brief conceptual description of the distributed computing models it employs. We describe the protocol design in more detail, discussing many of the program design considerations and techniques we have adopted. Finally, we describe how this model is especially suitable for supporting the implementation of distributed expert system applications.
AIRSAR Web-Based Data Processing
NASA Technical Reports Server (NTRS)
Chu, Anhua; Van Zyl, Jakob; Kim, Yunjin; Hensley, Scott; Lou, Yunling; Madsen, Soren; Chapman, Bruce; Imel, David; Durden, Stephen; Tung, Wayne
2007-01-01
The AIRSAR automated, Web-based data processing and distribution system is an integrated, end-to-end synthetic aperture radar (SAR) processing system. Designed to function under limited resources and rigorous demands, AIRSAR eliminates operational errors and provides for paperless archiving. Also, it provides a yearly tune-up of the processor on flight missions, as well as quality assurance with new radar modes and anomalous data compensation. The software fully integrates a Web-based SAR data-user request subsystem, a data processing system to automatically generate co-registered multi-frequency images from both polarimetric and interferometric data collection modes in 80/40/20 MHz bandwidth, an automated verification quality assurance subsystem, and an automatic data distribution system for use in the remote-sensor community. Features include Survey Automation Processing in which the software can automatically generate a quick-look image from an entire 90-GB SAR raw data 32-MB/s tape overnight without operator intervention. Also, the software allows product ordering and distribution via a Web-based user request system. To make AIRSAR more user friendly, it has been designed to let users search by entering the desired mission flight line (Missions Searching), or to search for any mission flight line by entering the desired latitude and longitude (Map Searching). For precision image automation processing, the software generates the products according to each data processing request stored in the database via a Queue management system. Users are able to have automatic generation of coregistered multi-frequency images as the software generates polarimetric and/or interferometric SAR data processing in ground and/or slant projection according to user processing requests for one of the 12 radar modes.
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.
schwimmbad: A uniform interface to parallel processing pools in Python
NASA Astrophysics Data System (ADS)
Price-Whelan, Adrian M.; Foreman-Mackey, Daniel
2017-09-01
Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
Efficient LIDAR Point Cloud Data Managing and Processing in a Hadoop-Based Distributed Framework
NASA Astrophysics Data System (ADS)
Wang, C.; Hu, F.; Sha, D.; Han, X.
2017-10-01
Light Detection and Ranging (LiDAR) is one of the most promising technologies in surveying and mapping city management, forestry, object recognition, computer vision engineer and others. However, it is challenging to efficiently storage, query and analyze the high-resolution 3D LiDAR data due to its volume and complexity. In order to improve the productivity of Lidar data processing, this study proposes a Hadoop-based framework to efficiently manage and process LiDAR data in a distributed and parallel manner, which takes advantage of Hadoop's storage and computing ability. At the same time, the Point Cloud Library (PCL), an open-source project for 2D/3D image and point cloud processing, is integrated with HDFS and MapReduce to conduct the Lidar data analysis algorithms provided by PCL in a parallel fashion. The experiment results show that the proposed framework can efficiently manage and process big LiDAR data.
Uncertainty of Polarized Parton Distributions
NASA Astrophysics Data System (ADS)
Hirai, M.; Goto, Y.; Horaguchi, T.; Kobayashi, H.; Kumano, S.; Miyama, M.; Saito, N.; Shibata, T.-A.
Polarized parton distribution functions are determined by a χ2 analysis of polarized deep inelastic experimental data. In this paper, uncertainty of obtained distribution functions is investigated by a Hessian method. We find that the uncertainty of the polarized gluon distribution is fairly large. Then, we estimate the gluon uncertainty by including the fake data which are generated from prompt photon process at RHIC. We observed that the uncertainty could be reduced with these data.
Alternative Fuels Data Center: Propane Production and Distribution
produced from liquid components recovered during natural gas processing. These components include ethane & Incentives Propane Production and Distribution Propane is a by-product of natural gas processing distribution showing propane originating from three sources: 1) gas well and gas plant, 2) oil well and
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.
Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen
2017-09-25
One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query ( DISPAQ ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.
DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data †
Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen
2017-01-01
One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679
NATIONAL WATER INFORMATION SYSTEM OF THE U. S. GEOLOGICAL SURVEY.
Edwards, Melvin D.
1985-01-01
National Water Information System (NWIS) has been designed as an interactive, distributed data system. It will integrate the existing, diverse data-processing systems into a common system. It will also provide easier, more flexible use as well as more convenient access and expanded computing, dissemination, and data-analysis capabilities. The NWIS is being implemented as part of a Distributed Information System (DIS) being developed by the Survey's Water Resources Division. The NWIS will be implemented on each node of the distributed network for the local processing, storage, and dissemination of hydrologic data collected within the node's area of responsibility. The processor at each node will also be used to perform hydrologic modeling, statistical data analysis, text editing, and some administrative work.
Correlation signatures of wet soils and snows. [algorithm development and computer programming
NASA Technical Reports Server (NTRS)
Phillips, M. R.
1972-01-01
Interpretation, analysis, and development of algorithms have provided the necessary computational programming tools for soil data processing, data handling and analysis. Algorithms that have been developed thus far, are adequate and have been proven successful for several preliminary and fundamental applications such as software interfacing capabilities, probability distributions, grey level print plotting, contour plotting, isometric data displays, joint probability distributions, boundary mapping, channel registration and ground scene classification. A description of an Earth Resources Flight Data Processor, (ERFDP), which handles and processes earth resources data under a users control is provided.
Distributed Visualization Project
NASA Technical Reports Server (NTRS)
Craig, Douglas; Conroy, Michael; Kickbusch, Tracey; Mazone, Rebecca
2016-01-01
Distributed Visualization allows anyone, anywhere to see any simulation at any time. Development focuses on algorithms, software, data formats, data systems and processes to enable sharing simulation-based information across temporal and spatial boundaries without requiring stakeholders to possess highly-specialized and very expensive display systems. It also introduces abstraction between the native and shared data, which allows teams to share results without giving away proprietary or sensitive data. The initial implementation of this capability is the Distributed Observer Network (DON) version 3.1. DON 3.1 is available for public release in the NASA Software Store (https://software.nasa.gov/software/KSC-13775) and works with version 3.0 of the Model Process Control specification (an XML Simulation Data Representation and Communication Language) to display complex graphical information and associated Meta-Data.
A distributed pipeline for DIDSON data processing
Li, Liling; Danner, Tyler; Eickholt, Jesse; McCann, Erin L.; Pangle, Kevin; Johnson, Nicholas
2018-01-01
Technological advances in the field of ecology allow data on ecological systems to be collected at high resolution, both temporally and spatially. Devices such as Dual-frequency Identification Sonar (DIDSON) can be deployed in aquatic environments for extended periods and easily generate several terabytes of underwater surveillance data which may need to be processed multiple times. Due to the large amount of data generated and need for flexibility in processing, a distributed pipeline was constructed for DIDSON data making use of the Hadoop ecosystem. The pipeline is capable of ingesting raw DIDSON data, transforming the acoustic data to images, filtering the images, detecting and extracting motion, and generating feature data for machine learning and classification. All of the tasks in the pipeline can be run in parallel and the framework allows for custom processing. Applications of the pipeline include monitoring migration times, determining the presence of a particular species, estimating population size and other fishery management tasks.
Time-independent models of asset returns revisited
NASA Astrophysics Data System (ADS)
Gillemot, L.; Töyli, J.; Kertesz, J.; Kaski, K.
2000-07-01
In this study we investigate various well-known time-independent models of asset returns being simple normal distribution, Student t-distribution, Lévy, truncated Lévy, general stable distribution, mixed diffusion jump, and compound normal distribution. For this we use Standard and Poor's 500 index data of the New York Stock Exchange, Helsinki Stock Exchange index data describing a small volatile market, and artificial data. The results indicate that all models, excluding the simple normal distribution, are, at least, quite reasonable descriptions of the data. Furthermore, the use of differences instead of logarithmic returns tends to make the data looking visually more Lévy-type distributed than it is. This phenomenon is especially evident in the artificial data that has been generated by an inflated random walk process.
Does climate have heavy tails?
NASA Astrophysics Data System (ADS)
Bermejo, Miguel; Mudelsee, Manfred
2013-04-01
When we speak about a distribution with heavy tails, we are referring to the probability of the existence of extreme values will be relatively large. Several heavy-tail models are constructed from Poisson processes, which are the most tractable models. Among such processes, one of the most important are the Lévy processes, which are those process with independent, stationary increments and stochastic continuity. If the random component of a climate process that generates the data exhibits a heavy-tail distribution, and if that fact is ignored by assuming a finite-variance distribution, then there would be serious consequences (in the form, e.g., of bias) for the analysis of extreme values. Yet, it appears that it is an open question to what extent and degree climate data exhibit heavy-tail phenomena. We present a study about the statistical inference in the presence of heavy-tail distribution. In particular, we explore (1) the estimation of tail index of the marginal distribution using several estimation techniques (e.g., Hill estimator, Pickands estimator) and (2) the power of hypothesis tests. The performance of the different methods are compared using artificial time-series by means of Monte Carlo experiments. We systematically apply the heavy tail inference to observed climate data, in particular we focus on time series data. We study several proxy and directly observed climate variables from the instrumental period, the Holocene and the Pleistocene. This work receives financial support from the European Commission (Marie Curie Initial Training Network LINC, No. 289447, within the 7th Framework Programme).
Diffusion of Siderophile Elements in Fe Metal: Application to Zoned Metal Grains in Chondrites
NASA Technical Reports Server (NTRS)
Righter, K.; Campbell, A. J.; Humajun, M.
2003-01-01
The distribution of highly siderophile elements (HSE) in planetary materials is controlled mainly by metal. Diffusion processes can control the distribution or re-distribution of these elements within metals, yet there is little systematic or appropriate diffusion data that can be used to interpret HSE concentrations in such metals. Because our understanding of isotope chronometry, redox processes, kamacite/taenite-based cooling rates, and metal grain zoning would be enhanced with diffusion data, we have measured diffusion coefficients for Ni, Co, Ga, Ge, Ru, Pd, Ir and Au in Fe metal from 1200 to 1400 C and 1 bar and 10 kbar. These new data on refractory and volatile siderophile elements are used to evaluate the role of diffusional processes in controlling zoning patterns in metal-rich chondrites.
Okayama optical polarimetry and spectroscopy system (OOPS) II. Network-transparent control software.
NASA Astrophysics Data System (ADS)
Sasaki, T.; Kurakami, T.; Shimizu, Y.; Yutani, M.
Control system of the OOPS (Okayama Optical Polarimetry and Spectroscopy system) is designed to integrate several instruments whose controllers are distributed over a network; the OOPS instrument, a CCD camera and data acquisition unit, the 91 cm telescope, an autoguider, a weather monitor, and an image display tool SAOimage. With the help of message-based communication, the control processes cooperate with related processes to perform an astronomical observation under supervising control by a scheduler process. A logger process collects status data of all the instruments to distribute them to related processes upon request. Software structure of each process is described.
Design considerations, architecture, and use of the Mini-Sentinel distributed data system.
Curtis, Lesley H; Weiner, Mark G; Boudreau, Denise M; Cooper, William O; Daniel, Gregory W; Nair, Vinit P; Raebel, Marsha A; Beaulieu, Nicolas U; Rosofsky, Robert; Woodworth, Tiffany S; Brown, Jeffrey S
2012-01-01
We describe the design, implementation, and use of a large, multiorganizational distributed database developed to support the Mini-Sentinel Pilot Program of the US Food and Drug Administration (FDA). As envisioned by the US FDA, this implementation will inform and facilitate the development of an active surveillance system for monitoring the safety of medical products (drugs, biologics, and devices) in the USA. A common data model was designed to address the priorities of the Mini-Sentinel Pilot and to leverage the experience and data of participating organizations and data partners. A review of existing common data models informed the process. Each participating organization designed a process to extract, transform, and load its source data, applying the common data model to create the Mini-Sentinel Distributed Database. Transformed data were characterized and evaluated using a series of programs developed centrally and executed locally by participating organizations. A secure communications portal was designed to facilitate queries of the Mini-Sentinel Distributed Database and transfer of confidential data, analytic tools were developed to facilitate rapid response to common questions, and distributed querying software was implemented to facilitate rapid querying of summary data. As of July 2011, information on 99,260,976 health plan members was included in the Mini-Sentinel Distributed Database. The database includes 316,009,067 person-years of observation time, with members contributing, on average, 27.0 months of observation time. All data partners have successfully executed distributed code and returned findings to the Mini-Sentinel Operations Center. This work demonstrates the feasibility of building a large, multiorganizational distributed data system in which organizations retain possession of their data that are used in an active surveillance system. Copyright © 2012 John Wiley & Sons, Ltd.
Forensic Analysis of Window’s(Registered) Virtual Memory Incorporating the System’s Page-File
2008-12-01
Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE December...data in a meaningful way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed...way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed arbitrarily across
Study on parallel and distributed management of RS data based on spatial data base
NASA Astrophysics Data System (ADS)
Chen, Yingbiao; Qian, Qinglan; Liu, Shijin
2006-12-01
With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.
NASA Technical Reports Server (NTRS)
Koda, M.; Seinfeld, J. H.
1982-01-01
The reconstruction of a concentration distribution from spatially averaged and noise-corrupted data is a central problem in processing atmospheric remote sensing data. Distributed parameter observer theory is used to develop reconstructibility conditions for distributed parameter systems having measurements typical of those in remote sensing. The relation of the reconstructibility condition to the stability of the distributed parameter observer is demonstrated. The theory is applied to a variety of remote sensing situations, and it is found that those in which concentrations are measured as a function of altitude satisfy the conditions of distributed state reconstructibility.
DataFed: A Federated Data System for Visualization and Analysis of Spatio-Temporal Air Quality Data
NASA Astrophysics Data System (ADS)
Husar, R. B.; Hoijarvi, K.
2017-12-01
DataFed is a distributed web-services-based computing environment for accessing, processing, and visualizing atmospheric data in support of air quality science and management. The flexible, adaptive environment facilitates the access and flow of atmospheric data from provider to users by enabling the creation of user-driven data processing/visualization applications. DataFed `wrapper' components, non-intrusively wrap heterogeneous, distributed datasets for access by standards-based GIS web services. The mediator components (also web services) map the heterogeneous data into a spatio-temporal data model. Chained web services provide homogeneous data views (e.g., geospatial, time views) using a global multi-dimensional data model. In addition to data access and rendering, the data processing component services can be programmed for filtering, aggregation, and fusion of multidimensional data. A complete application software is written in a custom made data flow language. Currently, the federated data pool consists of over 50 datasets originating from globally distributed data providers delivering surface-based air quality measurements, satellite observations, emissions data as well as regional and global-scale air quality models. The web browser-based user interface allows point and click navigation and browsing the XYZT multi-dimensional data space. The key applications of DataFed are for exploring spatial pattern of pollutants, seasonal, weekly, diurnal cycles and frequency distributions for exploratory air quality research. Since 2008, DataFed has been used to support EPA in the implementation of the Exceptional Event Rule. The data system is also used at universities in the US, Europe and Asia.
Study of data I/O performance on distributed disk system in mask data preparation
NASA Astrophysics Data System (ADS)
Ohara, Shuichiro; Odaira, Hiroyuki; Chikanaga, Tomoyuki; Hamaji, Masakazu; Yoshioka, Yasuharu
2010-09-01
Data volume is getting larger every day in Mask Data Preparation (MDP). In the meantime, faster data handling is always required. MDP flow typically introduces Distributed Processing (DP) system to realize the demand because using hundreds of CPU is a reasonable solution. However, even if the number of CPU were increased, the throughput might be saturated because hard disk I/O and network speeds could be bottlenecks. So, MDP needs to invest a lot of money to not only hundreds of CPU but also storage and a network device which make the throughput faster. NCS would like to introduce new distributed processing system which is called "NDE". NDE could be a distributed disk system which makes the throughput faster without investing a lot of money because it is designed to use multiple conventional hard drives appropriately over network. NCS studies I/O performance with OASIS® data format on NDE which contributes to realize the high throughput in this paper.
Study on parallel and distributed management of RS data based on spatial database
NASA Astrophysics Data System (ADS)
Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin
2009-10-01
With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.
State estimation for distributed systems with sensing delay
NASA Astrophysics Data System (ADS)
Alexander, Harold L.
1991-08-01
Control of complex systems such as remote robotic vehicles requires combining data from many sensors where the data may often be delayed by sensory processing requirements. The number and variety of sensors make it desirable to distribute the computational burden of sensing and estimation among multiple processors. Classic Kalman filters do not lend themselves to distributed implementations or delayed measurement data. The alternative Kalman filter designs presented in this paper are adapted for delays in sensor data generation and for distribution of computation for sensing and estimation over a set of networked processors.
AIRSAR Automated Web-based Data Processing and Distribution System
NASA Technical Reports Server (NTRS)
Chu, Anhua; vanZyl, Jakob; Kim, Yunjin; Lou, Yunling; Imel, David; Tung, Wayne; Chapman, Bruce; Durden, Stephen
2005-01-01
In this paper, we present an integrated, end-to-end synthetic aperture radar (SAR) processing system that accepts data processing requests, submits processing jobs, performs quality analysis, delivers and archives processed data. This fully automated SAR processing system utilizes database and internet/intranet web technologies to allow external users to browse and submit data processing requests and receive processed data. It is a cost-effective way to manage a robust SAR processing and archival system. The integration of these functions has reduced operator errors and increased processor throughput dramatically.
NASA Astrophysics Data System (ADS)
Evans, M. E.; Merow, C.; Record, S.; Menlove, J.; Gray, A.; Cundiff, J.; McMahon, S.; Enquist, B. J.
2013-12-01
Current attempts to forecast how species' distributions will change in response to climate change suffer under a fundamental trade-off: between modeling many species superficially vs. few species in detail (between correlative vs. mechanistic models). The goals of this talk are two-fold: first, we present a Bayesian multilevel modeling framework, dynamic range modeling (DRM), for building process-based forecasts of many species' distributions at a time, designed to address the trade-off between detail and number of distribution forecasts. In contrast to 'species distribution modeling' or 'niche modeling', which uses only species' occurrence data and environmental data, DRMs draw upon demographic data, abundance data, trait data, occurrence data, and GIS layers of climate in a single framework to account for two processes known to influence range dynamics - demography and dispersal. The vision is to use extensive databases on plant demography, distributions, and traits - in the Botanical Information and Ecology Network, the Forest Inventory and Analysis database (FIA), and the International Tree Ring Data Bank - to develop DRMs for North American trees. Second, we present preliminary results from building the core submodel of a DRM - an integral projection model (IPM) - for a sample of dominant tree species in western North America. IPMs are used to infer demographic niches - i.e., the set of environmental conditions under which population growth rate is positive - and project population dynamics through time. Based on >550,000 data points derived from FIA for nine tree species in western North America, we show IPM-based models of their current and future distributions, and discuss how IPMs can be used to forecast future forest productivity, mortality patterns, and inform efforts at assisted migration.
Intelligent Control of Micro Grid: A Big Data-Based Control Center
NASA Astrophysics Data System (ADS)
Liu, Lu; Wang, Yanping; Liu, Li; Wang, Zhiseng
2018-01-01
In this paper, a structure of micro grid system with big data-based control center is introduced. Energy data from distributed generation, storage and load are analized through the control center, and from the results new trends will be predicted and applied as a feedback to optimize the control. Therefore, each step proceeded in micro grid can be adjusted and orgnized in a form of comprehensive management. A framework of real-time data collection, data processing and data analysis will be proposed by employing big data technology. Consequently, a integrated distributed generation and a optimized energy storage and transmission process can be implemented in the micro grid system.
NASA Astrophysics Data System (ADS)
Cincotti, Silvano; Ponta, Linda; Raberto, Marco; Scalas, Enrico
2005-05-01
In this paper, empirical analyses and computational experiments are presented on high-frequency data for a double-auction (book) market. Main objective of the paper is to generalize the order waiting time process in order to properly model such empirical evidences. The empirical study is performed on the best bid and best ask data of 7 U.S. financial markets, for 30-stock time series. In particular, statistical properties of trading waiting times have been analyzed and quality of fits is evaluated by suitable statistical tests, i.e., comparing empirical distributions with theoretical models. Starting from the statistical studies on real data, attention has been focused on the reproducibility of such results in an artificial market. The computational experiments have been performed within the Genoa Artificial Stock Market. In the market model, heterogeneous agents trade one risky asset in exchange for cash. Agents have zero intelligence and issue random limit or market orders depending on their budget constraints. The price is cleared by means of a limit order book. The order generation is modelled with a renewal process. Based on empirical trading estimation, the distribution of waiting times between two consecutive orders is modelled by a mixture of exponential processes. Results show that the empirical waiting-time distribution can be considered as a generalization of a Poisson process. Moreover, the renewal process can approximate real data and implementation on the artificial stocks market can reproduce the trading activity in a realistic way.
System and Method for Monitoring Distributed Asset Data
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2015-01-01
A computer-based monitoring system and monitoring method implemented in computer software for detecting, estimating, and reporting the condition states, their changes, and anomalies for many assets. The assets are of same type, are operated over a period of time, and outfitted with data collection systems. The proposed monitoring method accounts for variability of working conditions for each asset by using regression model that characterizes asset performance. The assets are of the same type but not identical. The proposed monitoring method accounts for asset-to-asset variability; it also accounts for drifts and trends in the asset condition and data. The proposed monitoring system can perform distributed processing of massive amounts of historical data without discarding any useful information where moving all the asset data into one central computing system might be infeasible. The overall processing is includes distributed preprocessing data records from each asset to produce compressed data.
Federated Giovanni: A Distributed Web Service for Analysis and Visualization of Remote Sensing Data
NASA Technical Reports Server (NTRS)
Lynnes, Chris
2014-01-01
The Geospatial Interactive Online Visualization and Analysis Interface (Giovanni) is a popular tool for users of the Goddard Earth Sciences Data and Information Services Center (GES DISC) and has been in use for over a decade. It provides a wide variety of algorithms and visualizations to explore large remote sensing datasets without having to download the data and without having to write readers and visualizers for it. Giovanni is now being extended to enable its capabilities at other data centers within the Earth Observing System Data and Information System (EOSDIS). This Federated Giovanni will allow four other data centers to add and maintain their data within Giovanni on behalf of their user community. Those data centers are the Physical Oceanography Distributed Active Archive Center (PO.DAAC), MODIS Adaptive Processing System (MODAPS), Ocean Biology Processing Group (OBPG), and Land Processes Distributed Active Archive Center (LP DAAC). Three tiers are supported: Tier 1 (GES DISC-hosted) gives the remote data center a data management interface to add and maintain data, which are provided through the Giovanni instance at the GES DISC. Tier 2 packages Giovanni up as a virtual machine for distribution to and deployment by the other data centers. Data variables are shared among data centers by sharing documents from the Solr database that underpins Giovanni's data management capabilities. However, each data center maintains their own instance of Giovanni, exposing the variables of most interest to their user community. Tier 3 is a Shared Source model, in which the data centers cooperate to extend the infrastructure by contributing source code.
The future of PanDA in ATLAS distributed computing
NASA Astrophysics Data System (ADS)
De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.
2015-12-01
Experiments at the Large Hadron Collider (LHC) face unprecedented computing challenges. Heterogeneous resources are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, while data processing requires more than a few billion hours of computing usage per year. The PanDA (Production and Distributed Analysis) system was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. In the process, the old batch job paradigm of locally managed computing in HEP was discarded in favour of a far more automated, flexible and scalable model. The success of PanDA in ATLAS is leading to widespread adoption and testing by other experiments. PanDA is the first exascale workload management system in HEP, already operating at more than a million computing jobs per day, and processing over an exabyte of data in 2013. There are many new challenges that PanDA will face in the near future, in addition to new challenges of scale, heterogeneity and increasing user base. PanDA will need to handle rapidly changing computing infrastructure, will require factorization of code for easier deployment, will need to incorporate additional information sources including network metrics in decision making, be able to control network circuits, handle dynamically sized workload processing, provide improved visualization, and face many other challenges. In this talk we will focus on the new features, planned or recently implemented, that are relevant to the next decade of distributed computing workload management using PanDA.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC
NASA Astrophysics Data System (ADS)
Alruwaili, Manal
With the growing technology, the number of the processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass scale application software development on massive parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining the user-friendliness. Currently, available object oriented languages for massive parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension at for object distribution to exploit PGAS model; 2) the programs lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS based process constructs with these extensions on distributed objects. Object cloning and object migration. Also a new paradigm MIDD (Multiple Invocation Distributed Data) is presented when different copies of the same class can be invoked, and work on different elements of a distributed data concurrently using remote method invocations. I present new constructs, their grammar and their behavior. The new constructs have been explained using simple programs utilizing these constructs.
Optimized distributed computing environment for mask data preparation
NASA Astrophysics Data System (ADS)
Ahn, Byoung-Sup; Bang, Ju-Mi; Ji, Min-Kyu; Kang, Sun; Jang, Sung-Hoon; Choi, Yo-Han; Ki, Won-Tai; Choi, Seong-Woon; Han, Woo-Sung
2005-11-01
As the critical dimension (CD) becomes smaller, various resolution enhancement techniques (RET) are widely adopted. In developing sub-100nm devices, the complexity of optical proximity correction (OPC) is severely increased and applied OPC layers are expanded to non-critical layers. The transformation of designed pattern data by OPC operation causes complexity, which cause runtime overheads to following steps such as mask data preparation (MDP), and collapse of existing design hierarchy. Therefore, many mask shops exploit the distributed computing method in order to reduce the runtime of mask data preparation rather than exploit the design hierarchy. Distributed computing uses a cluster of computers that are connected to local network system. However, there are two things to limit the benefit of the distributing computing method in MDP. First, every sequential MDP job, which uses maximum number of available CPUs, is not efficient compared to parallel MDP job execution due to the input data characteristics. Second, the runtime enhancement over input cost is not sufficient enough since the scalability of fracturing tools is limited. In this paper, we will discuss optimum load balancing environment that is useful in increasing the uptime of distributed computing system by assigning appropriate number of CPUs for each input design data. We will also describe the distributed processing (DP) parameter optimization to obtain maximum throughput in MDP job processing.
Using Cloud Storage for NMR Data Distribution
ERIC Educational Resources Information Center
Soulsby, David
2012-01-01
An approach using Google Groups as method for distributing student-acquired NMR data has been implemented. We describe how to configure NMR spectrometer software so that data is uploaded to a laboratory section specific Google Group, thereby removing bottlenecks associated with printing and processing at the spectrometer workstation. Outside of…
Network-Capable Application Process and Wireless Intelligent Sensors for ISHM
NASA Technical Reports Server (NTRS)
Figueroa, Fernando; Morris, Jon; Turowski, Mark; Wang, Ray
2011-01-01
Intelligent sensor technology and systems are increasingly becoming attractive means to serve as frameworks for intelligent rocket test facilities with embedded intelligent sensor elements, distributed data acquisition elements, and onboard data acquisition elements. Networked intelligent processors enable users and systems integrators to automatically configure their measurement automation systems for analog sensors. NASA and leading sensor vendors are working together to apply the IEEE 1451 standard for adding plug-and-play capabilities for wireless analog transducers through the use of a Transducer Electronic Data Sheet (TEDS) in order to simplify sensor setup, use, and maintenance, to automatically obtain calibration data, and to eliminate manual data entry and error. A TEDS contains the critical information needed by an instrument or measurement system to identify, characterize, interface, and properly use the signal from an analog sensor. A TEDS is deployed for a sensor in one of two ways. First, the TEDS can reside in embedded, nonvolatile memory (typically flash memory) within the intelligent processor. Second, a virtual TEDS can exist as a separate file, downloadable from the Internet. This concept of virtual TEDS extends the benefits of the standardized TEDS to legacy sensors and applications where the embedded memory is not available. An HTML-based user interface provides a visual tool to interface with those distributed sensors that a TEDS is associated with, to automate the sensor management process. Implementing and deploying the IEEE 1451.1-based Network-Capable Application Process (NCAP) can achieve support for intelligent process in Integrated Systems Health Management (ISHM) for the purpose of monitoring, detection of anomalies, diagnosis of causes of anomalies, prediction of future anomalies, mitigation to maintain operability, and integrated awareness of system health by the operator. It can also support local data collection and storage. This invention enables wide-area sensing and employs numerous globally distributed sensing devices that observe the physical world through the existing sensor network. This innovation enables distributed storage, distributed processing, distributed intelligence, and the availability of DiaK (Data, Information, and Knowledge) to any element as needed. It also enables the simultaneous execution of multiple processes, and represents models that contribute to the determination of the condition and health of each element in the system. The NCAP (intelligent process) can configure data-collection and filtering processes in reaction to sensed data, allowing it to decide when and how to adapt collection and processing with regard to sophisticated analysis of data derived from multiple sensors. The user will be able to view the sensing device network as a single unit that supports a high-level query language. Each query would be able to operate over data collected from across the global sensor network just as a search query encompasses millions of Web pages. The sensor web can preserve ubiquitous information access between the querier and the queried data. Pervasive monitoring of the physical world raises significant data and privacy concerns. This innovation enables different authorities to control portions of the sensing infrastructure, and sensor service authors may wish to compose services across authority boundaries.
GOES-R User Data Types and Structure
NASA Astrophysics Data System (ADS)
Royle, A. W.
2012-12-01
GOES-R meteorological data is provided to the operational and science user community through four main distribution mechanisms. The GOES-R Ground Segment (GS) generates a set of Level 1b (L1b) data from each of the six primary satellite instruments and formats the data into a direct broadcast stream known as GOES Rebroadcast (GRB). Terrestrially, cloud and moisture imagery data is provided to forecasters at the National Weather Service (NWS) through a direct interface to the Advanced Weather Interactive Processing System (AWIPS). A secondary pathway for the user community to receive data terrestrially is via NOAA's Environmental Satellite Processing and Distribution System (ESPDS) Product Distribution and Access (PDA) system. The ESPDS PDA will service the NWS and other meteorological users through a data portal, which provides both a subscription service and an ad hoc query capability. Finally, GOES-R data is made available to NOAA's Comprehensive Large Array-Data Stewardship System (CLASS) for long-term archive. CLASS data includes the L1b and L2+ products sent to PDA, along with the Level 0 data used to create these products, and other data used for product generation and processing. This session will provide a summary description of the data types and formats associated with each of the four primary distribution pathways for user data from GOES-R. It will discuss the resources that are being developed by GOES-R to document the data structures and formats. It will also provide a brief introduction to the types of metadata associated with each of the primary data flows.
PILOT: An intelligent distributed operations support system
NASA Technical Reports Server (NTRS)
Rasmussen, Arthur N.
1993-01-01
The Real-Time Data System (RTDS) project is exploring the application of advanced technologies to the real-time flight operations environment of the Mission Control Centers at NASA's Johnson Space Center. The system, based on a network of engineering workstations, provides services such as delivery of real time telemetry data to flight control applications. To automate the operation of this complex distributed environment, a facility called PILOT (Process Integrity Level and Operation Tracker) is being developed. PILOT comprises a set of distributed agents cooperating with a rule-based expert system; together they monitor process operation and data flows throughout the RTDS network. The goal of PILOT is to provide unattended management and automated operation under user control.
Bosse, Stefan
2015-01-01
Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques. PMID:25690550
Bosse, Stefan
2015-02-16
Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques.
ERIC Educational Resources Information Center
Liu, Yan; Bellibas, Mehmet Sukru; Printy, Susan
2018-01-01
Distributed leadership is a dynamic process and reciprocal interaction of the leader, the subordinates and the situation. This research was inspired by the theoretical framework of Spillane in order to contextualize distributed leadership and compare the variations using the Teaching and Learning International Survey 2013 data. The two-level…
NASA Astrophysics Data System (ADS)
Ntarlagiannis, D.; Ustra, A.; Slater, L. D.; Zhang, C.; Mendonça, C. A.
2015-12-01
In this work we present an alternative formulation of the Debye Decomposition (DD) of complex conductivity spectra, with a new set of parameters that are directly related to the continuous Debye relaxation model. The procedure determines the relaxation time distribution (RTD) and two frequency-independent parameters that modulate the induced polarization spectra. The distribution of relaxation times quantifies the contribution of each distinct relaxation process, which can in turn be associated with specific polarization processes and characterized in terms of electrochemical and interfacial parameters as derived from mechanistic models. Synthetic tests show that the procedure can successfully fit spectral induced polarization (SIP) data and accurately recover the RTD. The procedure was applied to different data sets, focusing on environmental applications. We focus on data of sand-clay mixtures artificially contaminated with toluene, and crude oil-contaminated sands experiencing biodegradation. The results identify characteristic relaxation times that can be associated with distinct polarization processes resulting from either the contaminant itself or transformations associated with biodegradation. The inversion results provide information regarding the relative strength and dominant relaxation time of these polarization processes.
A Supervised Learning Process to Validate Online Disease Reports for Use in Predictive Models.
Patching, Helena M M; Hudson, Laurence M; Cooke, Warrick; Garcia, Andres J; Hay, Simon I; Roberts, Mark; Moyes, Catherine L
2015-12-01
Pathogen distribution models that predict spatial variation in disease occurrence require data from a large number of geographic locations to generate disease risk maps. Traditionally, this process has used data from public health reporting systems; however, using online reports of new infections could speed up the process dramatically. Data from both public health systems and online sources must be validated before they can be used, but no mechanisms exist to validate data from online media reports. We have developed a supervised learning process to validate geolocated disease outbreak data in a timely manner. The process uses three input features, the data source and two metrics derived from the location of each disease occurrence. The location of disease occurrence provides information on the probability of disease occurrence at that location based on environmental and socioeconomic factors and the distance within or outside the current known disease extent. The process also uses validation scores, generated by disease experts who review a subset of the data, to build a training data set. The aim of the supervised learning process is to generate validation scores that can be used as weights going into the pathogen distribution model. After analyzing the three input features and testing the performance of alternative processes, we selected a cascade of ensembles comprising logistic regressors. Parameter values for the training data subset size, number of predictors, and number of layers in the cascade were tested before the process was deployed. The final configuration was tested using data for two contrasting diseases (dengue and cholera), and 66%-79% of data points were assigned a validation score. The remaining data points are scored by the experts, and the results inform the training data set for the next set of predictors, as well as going to the pathogen distribution model. The new supervised learning process has been implemented within our live site and is being used to validate the data that our system uses to produce updated predictive disease maps on a weekly basis.
MULTIPROCESSOR AND DISTRIBUTED PROCESSING BIBLIOGRAPHIC DATA BASE SOFTWARE SYSTEM
NASA Technical Reports Server (NTRS)
Miya, E. N.
1994-01-01
Multiprocessors and distributed processing are undergoing increased scientific scrutiny for many reasons. It is more and more difficult to keep track of the existing research in these fields. This package consists of a large machine-readable bibliographic data base which, in addition to the usual keyword searches, can be used for producing citations, indexes, and cross-references. The data base is compiled from smaller existing multiprocessing bibliographies, and tables of contents from journals and significant conferences. There are approximately 4,000 entries covering topics such as parallel and vector processing, networks, supercomputers, fault-tolerant computers, and cellular automata. Each entry is represented by 21 fields including keywords, author, referencing book or journal title, volume and page number, and date and city of publication. The data base contains UNIX 'refer' formatted ASCII data and can be implemented on any computer running under the UNIX operating system. The data base requires approximately one megabyte of secondary storage. The documentation for this program is included with the distribution tape, although it can be purchased for the price below. This bibliography was compiled in 1985 and updated in 1988.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application and sovled the need for concurrent multi-user high-speed access to remotely sensed data. It verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multi-users and analyzing the distributed storage architecture to improve the application efficiency of remote sensing images through building an actual Hadoop service system.
Data Processing Factory for the Sloan Digital Sky Survey
NASA Astrophysics Data System (ADS)
Stoughton, Christopher; Adelman, Jennifer; Annis, James T.; Hendry, John; Inkmann, John; Jester, Sebastian; Kent, Steven M.; Kuropatkin, Nickolai; Lee, Brian; Lin, Huan; Peoples, John, Jr.; Sparks, Robert; Tucker, Douglas; Vanden Berk, Dan; Yanny, Brian; Yocum, Dan
2002-12-01
The Sloan Digital Sky Survey (SDSS) data handling presents two challenges: large data volume and timely production of spectroscopic plates from imaging data. A data processing factory, using technologies both old and new, handles this flow. Distribution to end users is via disk farms, to serve corrected images and calibrated spectra, and a database, to efficiently process catalog queries. For distribution of modest amounts of data from Apache Point Observatory to Fermilab, scripts use rsync to update files, while larger data transfers are accomplished by shipping magnetic tapes commercially. All data processing pipelines are wrapped in scripts to address consecutive phases: preparation, submission, checking, and quality control. We constructed the factory by chaining these pipelines together while using an operational database to hold processed imaging catalogs. The science database catalogs all imaging and spectroscopic object, with pointers to the various external files associated with them. Diverse computing systems address particular processing phases. UNIX computers handle tape reading and writing, as well as calibration steps that require access to a large amount of data with relatively modest computational demands. Commodity CPUs process steps that require access to a limited amount of data with more demanding computations requirements. Disk servers optimized for cost per Gbyte serve terabytes of processed data, while servers optimized for disk read speed run SQLServer software to process queries on the catalogs. This factory produced data for the SDSS Early Data Release in June 2001, and it is currently producing Data Release One, scheduled for January 2003.
The ATLAS PanDA Pilot in Operation
NASA Astrophysics Data System (ADS)
Nilsson, P.; Caballero, J.; De, K.; Maeno, T.; Stradling, A.; Wenaus, T.; ATLAS Collaboration
2011-12-01
The Production and Distributed Analysis system (PanDA) [1-2] was designed to meet ATLAS [3] requirements for a data-driven workload management system capable of operating at LHC data processing scale. Submitted jobs are executed on worker nodes by pilot jobs sent to the grid sites by pilot factories. This paper provides an overview of the PanDA pilot [4] system and presents major features added in light of recent operational experience, including multi-job processing, advanced job recovery for jobs with output storage failures, gLExec [5-6] based identity switching from the generic pilot to the actual user, and other security measures. The PanDA system serves all ATLAS distributed processing and is the primary system for distributed analysis; it is currently used at over 100 sites worldwide. We analyze the performance of the pilot system in processing real LHC data on the OSG [7], EGI [8] and Nordugrid [9-10] infrastructures used by ATLAS, and describe plans for its evolution.
1983-06-01
LOSARDO Project Engineer APPROVED: .MARMCINIhI, Colonel. USAF Chief, Coaud and Control Division FOR THE CCOaIDKR: Acting Chief, Plea Off ice * **711...WORK UNIT NUMBERS General Dynamics Corporation 62702F Data Systems Division P 0 Box 748, Fort Worth TX 76101 55811829 I1. CONTROLLING OFFICE NAME AND...Processing System for 29 the Operation/Direction Center(s) 4-3 Distribution of Processing Control 30 for the Operation/Direction Center(s) 4-4 Generalized
Incorporating Skew into RMS Surface Roughness Probability Distribution
NASA Technical Reports Server (NTRS)
Stahl, Mark T.; Stahl, H. Philip.
2013-01-01
The standard treatment of RMS surface roughness data is the application of a Gaussian probability distribution. This handling of surface roughness ignores the skew present in the surface and overestimates the most probable RMS of the surface, the mode. Using experimental data we confirm the Gaussian distribution overestimates the mode and application of an asymmetric distribution provides a better fit. Implementing the proposed asymmetric distribution into the optical manufacturing process would reduce the polishing time required to meet surface roughness specifications.
Full-chip level MEEF analysis using model based lithography verification
NASA Astrophysics Data System (ADS)
Kim, Juhwan; Wang, Lantian; Zhang, Daniel; Tang, Zongwu
2005-11-01
MEEF (Mask Error Enhancement Factor) has become a critical factor in CD uniformity control since optical lithography process moved to sub-resolution era. A lot of studies have been done by quantifying the impact of the mask CD (Critical Dimension) errors on the wafer CD errors1-2. However, the benefits from those studies were restricted only to small pattern areas of the full-chip data due to long simulation time. As fast turn around time can be achieved for the complicated verifications on very large data by linearly scalable distributed processing technology, model-based lithography verification becomes feasible for various types of applications such as post mask synthesis data sign off for mask tape out in production and lithography process development with full-chip data3,4,5. In this study, we introduced two useful methodologies for the full-chip level verification of mask error impact on wafer lithography patterning process. One methodology is to check MEEF distribution in addition to CD distribution through process window, which can be used for RET/OPC optimization at R&D stage. The other is to check mask error sensitivity on potential pinch and bridge hotspots through lithography process variation, where the outputs can be passed on to Mask CD metrology to add CD measurements on those hotspot locations. Two different OPC data were compared using the two methodologies in this study.
Control structures for high speed processors
NASA Technical Reports Server (NTRS)
Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.
1982-01-01
A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the Mhz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high level language register transfer language. The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.
Hadoop distributed batch processing for Gaia: a success story
NASA Astrophysics Data System (ADS)
Riello, Marco
2015-12-01
The DPAC Cambridge Data Processing Centre (DPCI) is responsible for the photometric calibration of the Gaia data including the low resolution spectra. The large data volume produced by Gaia (~26 billion transits/year), the complexity of its data stream and the self-calibrating approach pose unique challenges for scalability, reliability and robustness of both the software pipelines and the operations infrastructure. DPCI has been the first in DPAC to realise the potential of Hadoop and Map/Reduce and to adopt them as the core technologies for its infrastructure. This has proven a winning choice allowing DPCI unmatched processing throughput and reliability within DPAC to the point that other DPCs have started following our footsteps. In this talk we will present the software infrastructure developed to build the distributed and scalable batch data processing system that is currently used in production at DPCI and the excellent results in terms of performance of the system.
NASA Astrophysics Data System (ADS)
Davies, D.; Murphy, K. J.; Michael, K.
2013-12-01
NASA's Land Atmosphere Near real-time Capability for EOS (Earth Observing System) (LANCE) provides data and imagery from Terra, Aqua and Aura satellites in less than 3 hours from satellite observation, to meet the needs of the near real-time (NRT) applications community. This article describes the architecture of the LANCE and outlines the modifications made to achieve the 3-hour latency requirement with a view to informing future NRT satellite distribution capabilities. It also describes how latency is determined. LANCE is a distributed system that builds on the existing EOS Data and Information System (EOSDIS) capabilities. To achieve the NRT latency requirement, many components of the EOS satellite operations, ground and science processing systems have been made more efficient without compromising the quality of science data processing. The EOS Data and Operations System (EDOS) processes the NRT stream with higher priority than the science data stream in order to minimize latency. In addition to expediting transfer times, the key difference between the NRT Level 0 products and those for standard science processing is the data used to determine the precise location and tilt of the satellite. Standard products use definitive geo-location (attitude and ephemeris) data provided daily, whereas NRT products use predicted geo-location provided by the instrument Global Positioning System (GPS) or approximation of navigational data (depending on platform). Level 0 data are processed in to higher-level products at designated Science Investigator-led Processing Systems (SIPS). The processes used by LANCE have been streamlined and adapted to work with datasets as soon as they are downlinked from satellites or transmitted from ground stations. Level 2 products that require ancillary data have modified production rules to relax the requirements for ancillary data so reducing processing times. Looking to the future, experience gained from LANCE can provide valuable lessons on satellite and ground system architectures and on how the delivery of NRT products from other NASA missions might be achieved.
de Beer, R; Graveron-Demilly, D; Nastase, S; van Ormondt, D
2004-03-01
Recently we have developed a Java-based heterogeneous distributed computing system for the field of magnetic resonance imaging (MRI). It is a software system for embedding the various image reconstruction algorithms that we have created for handling MRI data sets with sparse sampling distributions. Since these data sets may result from multi-dimensional MRI measurements our system has to control the storage and manipulation of large amounts of data. In this paper we describe how we have employed the extensible markup language (XML) to realize this data handling in a highly structured way. To that end we have used Java packages, recently released by Sun Microsystems, to process XML documents and to compile pieces of XML code into Java classes. We have effectuated a flexible storage and manipulation approach for all kinds of data within the MRI system, such as data describing and containing multi-dimensional MRI measurements, data configuring image reconstruction methods and data representing and visualizing the various services of the system. We have found that the object-oriented approach, possible with the Java programming environment, combined with the XML technology is a convenient way of describing and handling various data streams in heterogeneous distributed computing systems.
A gossip based information fusion protocol for distributed frequent itemset mining
NASA Astrophysics Data System (ADS)
Sohrabi, Mohammad Karim
2018-07-01
The computational complexity, huge memory space requirement, and time-consuming nature of frequent pattern mining process are the most important motivations for distribution and parallelization of this mining process. On the other hand, the emergence of distributed computational and operational environments, which causes the production and maintenance of data on different distributed data sources, makes the parallelization and distribution of the knowledge discovery process inevitable. In this paper, a gossip based distributed itemset mining (GDIM) algorithm is proposed to extract frequent itemsets, which are special types of frequent patterns, in a wireless sensor network environment. In this algorithm, local frequent itemsets of each sensor are extracted using a bit-wise horizontal approach (LHPM) from the nodes which are clustered using a leach-based protocol. Heads of clusters exploit a gossip based protocol in order to communicate each other to find the patterns which their global support is equal to or more than the specified support threshold. Experimental results show that the proposed algorithm outperforms the best existing gossip based algorithm in term of execution time.
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines
2011-01-01
Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.
Cieślik, Marcin; Mura, Cameron
2011-02-25
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
NASA Astrophysics Data System (ADS)
Neill, Aaron; Reaney, Sim
2015-04-01
Fully-distributed, physically-based rainfall-runoff models attempt to capture some of the complexity of the runoff processes that operate within a catchment, and have been used to address a variety of issues including water quality and the effect of climate change on flood frequency. Two key issues are prevalent, however, which call into question the predictive capability of such models. The first is the issue of parameter equifinality which can be responsible for large amounts of uncertainty. The second is whether such models make the right predictions for the right reasons - are the processes operating within a catchment correctly represented, or do the predictive abilities of these models result only from the calibration process? The use of additional data sources, such as environmental tracers, has been shown to help address both of these issues, by allowing for multi-criteria model calibration to be undertaken, and by permitting a greater understanding of the processes operating in a catchment and hence a more thorough evaluation of how well catchment processes are represented in a model. Using discharge and oxygen-18 data sets, the ability of the fully-distributed, physically-based CRUM3 model to represent the runoff processes in three sub-catchments in Cumbria, NW England has been evaluated. These catchments (Morland, Dacre and Pow) are part of the of the River Eden demonstration test catchment project. The oxygen-18 data set was firstly used to derive transit-time distributions and mean residence times of water for each of the catchments to gain an integrated overview of the types of processes that were operating. A generalised likelihood uncertainty estimation procedure was then used to calibrate the CRUM3 model for each catchment based on a single discharge data set from each catchment. Transit-time distributions and mean residence times of water obtained from the model using the top 100 behavioural parameter sets for each catchment were then compared to those derived from the oxygen-18 data to see how well the model captured catchment dynamics. The value of incorporating the oxygen-18 data set, as well as discharge data sets from multiple as opposed to single gauging stations in each catchment, in the calibration process to improve the predictive capability of the model was then investigated. This was achieved by assessing by how much the identifiability of the model parameters and the ability of the model to represent the runoff processes operating in each catchment improved with the inclusion of the additional data sets with respect to the likely costs that would be incurred in obtaining the data sets themselves.
The Raid distributed database system
NASA Technical Reports Server (NTRS)
Bhargava, Bharat; Riedl, John
1989-01-01
Raid, a robust and adaptable distributed database system for transaction processing (TP), is described. Raid is a message-passing system, with server processes on each site to manage concurrent processing, consistent replicated copies during site failures, and atomic distributed commitment. A high-level layered communications package provides a clean location-independent interface between servers. The latest design of the package delivers messages via shared memory in a configuration with several servers linked into a single process. Raid provides the infrastructure to investigate various methods for supporting reliable distributed TP. Measurements on TP and server CPU time are presented, along with data from experiments on communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for evaluating the implementation of TP algorithms in an operating-system kernel is proposed.
Measurement of the Drell-Yan angular distribution in the dimuon channel using 2011 CMS data
NASA Astrophysics Data System (ADS)
Silvers, David I.
The angular distributions of muons produced by the Drell-Yan process are measured as a function of dimuon transverse momentum in two ranges of rapidity. Events from pp collisions at sqrt( s) = 7 TeV were collected with the CMS detector using dimuon triggers and selected from data samples corresponding to 4.9 fb-1 of integrated luminosity. The two-dimensional angular distribution dN/dO of the negative muon in the Collins-Soper frame is fitted to determine the coefficients in a parametric form of the angular distribution. The measured coefficients are compared to next-to-leading order calculations. We observe that qq and leading order qg production dominate the Drell-Yan process at pT (mumu) <55 GeV/c, while higher-order qg production dominates the Drell-Yan process for 55< pT (mumu) <120 GeV/c.
[Rank distributions in community ecology from the statistical viewpoint].
Maksimov, V N
2004-01-01
Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.
Manufacturing, Marketing and Distribution, Business and Office Occupations: Grade 8. Cluster III.
ERIC Educational Resources Information Center
Calhoun, Olivia H.
A curriculum guide for grade 8, the document is divided into eleven units: marketing and distribution; food manufacturing; data processing and automation; administration, management, and labor; secretarial and clerical services; office machines; equipment; metal manufacturing and processing; prefabrication and prepackaging; textile and clothing…
Distributed Processing of Projections of Large Datasets: A Preliminary Study
Maddox, Brian G.
2004-01-01
Modern information needs have resulted in very large amounts of data being used in geographic information systems. Problems arise when trying to project these data in a reasonable amount of time and accuracy, however. Current single-threaded methods can suffer from two problems: fast projection with poor accuracy, or accurate projection with long processing time. A possible solution may be to combine accurate interpolation methods and distributed processing algorithms to quickly and accurately convert digital geospatial data between coordinate systems. Modern technology has made it possible to construct systems, such as Beowulf clusters, for a low cost and provide access to supercomputer-class technology. Combining these techniques may result in the ability to use large amounts of geographic data in time-critical situations.
Peek, N; Holmes, J H; Sun, J
2014-08-15
To review technical and methodological challenges for big data research in biomedicine and health. We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data. The life and biomedical sciences are massively contributing to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets. The major challenge for the near future is to transform analytical methods that are used in the biomedical and health domain, to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.
An Investigation of Data Overload in Team-Based Distributed Cognition Systems
ERIC Educational Resources Information Center
Hellar, David Benjamin
2009-01-01
The modern military command center is a hybrid system of computer automated surveillance and human oriented decision making. In these distributed cognition systems, data overload refers simultaneously to the glut of raw data processed by information technology systems and the dearth of actionable knowledge useful to human decision makers.…
PD2P: PanDA Dynamic Data Placement for ATLAS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maeno, T.; De, K.; Panitkin, S.
2012-12-13
The PanDA (Production and Distributed Analysis) system plays a key role in the ATLAS distributed computing infrastructure. PanDA is the ATLAS workload management system for processing all Monte-Carlo (MC) simulation and data reprocessing jobs in addition to user and group analysis jobs. The PanDA Dynamic Data Placement (PD2P) system has been developed to cope with difficulties of data placement for ATLAS. We will describe the design of the new system, its performance during the past year of data taking, dramatic improvements it has brought about in the efficient use of storage and processing resources, and plans for the future.
Huang, Jr-Chuan; Lee, Tsung-Yu; Teng, Tse-Yang; Chen, Yi-Chin; Huang, Cho-Ying; Lee, Cheing-Tung
2014-01-01
The exponent decay in landslide frequency-area distribution is widely used for assessing the consequences of landslides and with some studies arguing that the slope of the exponent decay is universal and independent of mechanisms and environmental settings. However, the documented exponent slopes are diverse and hence data processing is hypothesized for this inconsistency. An elaborated statistical experiment and two actual landslide inventories were used here to demonstrate the influences of the data processing on the determination of the exponent. Seven categories with different landslide numbers were generated from the predefined inverse-gamma distribution and then analyzed by three data processing procedures (logarithmic binning, LB, normalized logarithmic binning, NLB and cumulative distribution function, CDF). Five different bin widths were also considered while applying LB and NLB. Following that, the maximum likelihood estimation was used to estimate the exponent slopes. The results showed that the exponents estimated by CDF were unbiased while LB and NLB performed poorly. Two binning-based methods led to considerable biases that increased with the increase of landslide number and bin width. The standard deviations of the estimated exponents were dependent not just on the landslide number but also on binning method and bin width. Both extremely few and plentiful landslide numbers reduced the confidence of the estimated exponents, which could be attributed to limited landslide numbers and considerable operational bias, respectively. The diverse documented exponents in literature should therefore be adjusted accordingly. Our study strongly suggests that the considerable bias due to data processing and the data quality should be constrained in order to advance the understanding of landslide processes.
NCI's Distributed Geospatial Data Server
NASA Astrophysics Data System (ADS)
Larraondo, P. R.; Evans, B. J. K.; Antony, J.
2016-12-01
Earth systems, environmental and geophysics datasets are an extremely valuable source of information about the state and evolution of the Earth. However, different disciplines and applications require this data to be post-processed in different ways before it can be used. For researchers experimenting with algorithms across large datasets or combining multiple data sets, the traditional approach to batch data processing and storing all the output for later analysis rapidly becomes unfeasible, and often requires additional work to publish for others to use. Recent developments on distributed computing using interactive access to significant cloud infrastructure opens the door for new ways of processing data on demand, hence alleviating the need for storage space for each individual copy of each product. The Australian National Computational Infrastructure (NCI) has developed a highly distributed geospatial data server which supports interactive processing of large geospatial data products, including satellite Earth Observation data and global model data, using flexible user-defined functions. This system dynamically and efficiently distributes the required computations among cloud nodes and thus provides a scalable analysis capability. In many cases this completely alleviates the need to preprocess and store the data as products. This system presents a standards-compliant interface, allowing ready accessibility for users of the data. Typical data wrangling problems such as handling different file formats and data types, or harmonising the coordinate projections or temporal and spatial resolutions, can now be handled automatically by this service. The geospatial data server exposes functionality for specifying how the data should be aggregated and transformed. The resulting products can be served using several standards such as the Open Geospatial Consortium's (OGC) Web Map Service (WMS) or Web Feature Service (WFS), Open Street Map tiles, or raw binary arrays under different conventions. We will show some cases where we have used this new capability to provide a significant improvement over previous approaches.
Automated video-microscopic imaging and data acquisition system for colloid deposition measurements
Abdel-Fattah, Amr I.; Reimus, Paul W.
2004-12-28
A video microscopic visualization system and image processing and data extraction and processing method for in situ detailed quantification of the deposition of sub-micrometer particles onto an arbitrary surface and determination of their concentration across the bulk suspension. The extracted data includes (a) surface concentration and flux of deposited, attached and detached colloids, (b) surface concentration and flux of arriving and departing colloids, (c) distribution of colloids in the bulk suspension in the direction perpendicular to the deposition surface, and (d) spatial and temporal distributions of deposited colloids.
A Scientific Workflow System for Satellite Data Processing with Real-Time Monitoring
NASA Astrophysics Data System (ADS)
Nguyen, Minh Duc
2018-02-01
This paper provides a case study on satellite data processing, storage, and distribution in the space weather domain by introducing the Satellite Data Downloading System (SDDS). The approach proposed in this paper was evaluated through real-world scenarios and addresses the challenges related to the specific field. Although SDDS is used for satellite data processing, it can potentially be adapted to a wide range of data processing scenarios in other fields of physics.
Leveraging AMI data for distribution system model calibration and situational awareness
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peppanen, Jouni; Reno, Matthew J.; Thakkar, Mohini
The many new distributed energy resources being installed at the distribution system level require increased visibility into system operations that will be enabled by distribution system state estimation (DSSE) and situational awareness applications. Reliable and accurate DSSE requires both robust methods for managing the big data provided by smart meters and quality distribution system models. This paper presents intelligent methods for detecting and dealing with missing or inaccurate smart meter data, as well as the ways to process the data for different applications. It also presents an efficient and flexible parameter estimation method based on the voltage drop equation andmore » regression analysis to enhance distribution system model accuracy. Finally, it presents a 3-D graphical user interface for advanced visualization of the system state and events. Moreover, we demonstrate this paper for a university distribution network with the state-of-the-art real-time and historical smart meter data infrastructure.« less
Leveraging AMI data for distribution system model calibration and situational awareness
Peppanen, Jouni; Reno, Matthew J.; Thakkar, Mohini; ...
2015-01-15
The many new distributed energy resources being installed at the distribution system level require increased visibility into system operations that will be enabled by distribution system state estimation (DSSE) and situational awareness applications. Reliable and accurate DSSE requires both robust methods for managing the big data provided by smart meters and quality distribution system models. This paper presents intelligent methods for detecting and dealing with missing or inaccurate smart meter data, as well as the ways to process the data for different applications. It also presents an efficient and flexible parameter estimation method based on the voltage drop equation andmore » regression analysis to enhance distribution system model accuracy. Finally, it presents a 3-D graphical user interface for advanced visualization of the system state and events. Moreover, we demonstrate this paper for a university distribution network with the state-of-the-art real-time and historical smart meter data infrastructure.« less
SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data
NASA Astrophysics Data System (ADS)
Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.
2015-12-01
Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework, that extends ApacheTM Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in ApacheTM Hadoop for parallel computing on a cluster, by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate performance of the various matrix libraries in distributed pipelines, such as Nd4jTM and BreezeTM. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
Preliminary surficial geologic map database of the Amboy 30 x 60 minute quadrangle, California
Bedford, David R.; Miller, David M.; Phelps, Geoffrey A.
2006-01-01
The surficial geologic map database of the Amboy 30x60 minute quadrangle presents characteristics of surficial materials for an area approximately 5,000 km2 in the eastern Mojave Desert of California. This map consists of new surficial mapping conducted between 2000 and 2005, as well as compilations of previous surficial mapping. Surficial geology units are mapped and described based on depositional process and age categories that reflect the mode of deposition, pedogenic effects occurring post-deposition, and, where appropriate, the lithologic nature of the material. The physical properties recorded in the database focus on those that drive hydrologic, biologic, and physical processes such as particle size distribution (PSD) and bulk density. This version of the database is distributed with point data representing locations of samples for both laboratory determined physical properties and semi-quantitative field-based information. Future publications will include the field and laboratory data as well as maps of distributed physical properties across the landscape tied to physical process models where appropriate. The database is distributed in three parts: documentation, spatial map-based data, and printable map graphics of the database. Documentation includes this file, which provides a discussion of the surficial geology and describes the format and content of the map data, a database 'readme' file, which describes the database contents, and FGDC metadata for the spatial map information. Spatial data are distributed as Arc/Info coverage in ESRI interchange (e00) format, or as tabular data in the form of DBF3-file (.DBF) file formats. Map graphics files are distributed as Postscript and Adobe Portable Document Format (PDF) files, and are appropriate for representing a view of the spatial database at the mapped scale.
Evaluating non-relational storage technology for HEP metadata and meta-data catalog
NASA Astrophysics Data System (ADS)
Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.
2016-10-01
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of experiment is managed by specialized software like Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describes scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, to be recorded in databases or published via Web. Processing and analysis of constantly growing volume of auxiliary metadata is a challenging task, not simpler than the management and processing of experimental data itself. Furthermore, metadata sources are often loosely coupled and potentially may lead to an end-user inconsistency in combined information queries. To aggregate and synthesize a range of primary metadata sources, and enhance them with flexible schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture serving as the intelligence behind GUIs and APIs.
NASA Astrophysics Data System (ADS)
Sazuka, Naoya
2007-03-01
We analyze waiting times for price changes in a foreign currency exchange rate. Recent empirical studies of high-frequency financial data support that trades in financial markets do not follow a Poisson process and the waiting times between trades are not exponentially distributed. Here we show that our data is well approximated by a Weibull distribution rather than an exponential distribution in the non-asymptotic regime. Moreover, we quantitatively evaluate how much an empirical data is far from an exponential distribution using a Weibull fit. Finally, we discuss a transition between a Weibull-law and a power-law in the long time asymptotic regime.
NASA Astrophysics Data System (ADS)
Hempelmann, Nils; Ehbrecht, Carsten; Alvarez-Castro, Carmen; Brockmann, Patrick; Falk, Wolfgang; Hoffmann, Jörg; Kindermann, Stephan; Koziol, Ben; Nangini, Cathy; Radanovics, Sabine; Vautard, Robert; Yiou, Pascal
2018-01-01
Analyses of extreme weather events and their impacts often requires big data processing of ensembles of climate model simulations. Researchers generally proceed by downloading the data from the providers and processing the data files ;at home; with their own analysis processes. However, the growing amount of available climate model and observation data makes this procedure quite awkward. In addition, data processing knowledge is kept local, instead of being consolidated into a common resource of reusable code. These drawbacks can be mitigated by using a web processing service (WPS). A WPS hosts services such as data analysis processes that are accessible over the web, and can be installed close to the data archives. We developed a WPS named 'flyingpigeon' that communicates over an HTTP network protocol based on standards defined by the Open Geospatial Consortium (OGC), to be used by climatologists and impact modelers as a tool for analyzing large datasets remotely. Here, we present the current processes we developed in flyingpigeon relating to commonly-used processes (preprocessing steps, spatial subsets at continent, country or region level, and climate indices) as well as methods for specific climate data analysis (weather regimes, analogues of circulation, segetal flora distribution, and species distribution models). We also developed a novel, browser-based interactive data visualization for circulation analogues, illustrating the flexibility of WPS in designing custom outputs. Bringing the software to the data instead of transferring the data to the code is becoming increasingly necessary, especially with the upcoming massive climate datasets.
A Hands-on Activity for Teaching the Poisson Distribution Using the Stock Market
ERIC Educational Resources Information Center
Dunlap, Mickey; Studstill, Sharyn
2014-01-01
The number of increases a particular stock makes over a fixed period follows a Poisson distribution. This article discusses using this easily-found data as an opportunity to let students become involved in the data collection and analysis process.
Extended Poisson process modelling and analysis of grouped binary data.
Faddy, Malcolm J; Smith, David M
2012-05-01
A simple extension of the Poisson process results in binomially distributed counts of events in a time interval. A further extension generalises this to probability distributions under- or over-dispersed relative to the binomial distribution. Substantial levels of under-dispersion are possible with this modelling, but only modest levels of over-dispersion - up to Poisson-like variation. Although simple analytical expressions for the moments of these probability distributions are not available, approximate expressions for the mean and variance are derived, and used to re-parameterise the models. The modelling is applied in the analysis of two published data sets, one showing under-dispersion and the other over-dispersion. More appropriate assessment of the precision of estimated parameters and reliable model checking diagnostics follow from this more general modelling of these data sets. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Application Of Holography In The Distribution Measurement Of Fuel Spraying Field In Diesel Engines
NASA Astrophysics Data System (ADS)
Xiang, He Wan; Xiong, Li Zhi
1988-01-01
The distribution of fuel spraying field in the combustion chamber is an important factor which influences the performance of diesel engines. Precise data for those major parameters of the spraying field distribution are difficult to obtain using conventional ways of measurement, so its effects on the combustion process cannot be controlled. The laser holographic measurement is used and many researches have been made on the injecting nozzles used in diesel engines Series 95, 100 and 130. These researches show that clear spraying field hologram can be taken with an "IC Engine Laser Holography System". By rendition and data processing, droplet size, amount and their space distribution in the spraying; the spraying range, cone angle and other dependable data can be obtained. Therefore, the spraying quality of an injecting nozzle can be precisely determined, which provides reliable basis for the improvement of diesel engines' functions.
Semantics-based distributed I/O with the ParaMEDIC framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaji, P.; Feng, W.; Lin, H.
2008-01-01
Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generatedmore » data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.« less
Synthetic Foveal Imaging Technology
NASA Technical Reports Server (NTRS)
Nikzad, Shouleh (Inventor); Monacos, Steve P. (Inventor); Hoenk, Michael E. (Inventor)
2013-01-01
Apparatuses and methods are disclosed that create a synthetic fovea in order to identify and highlight interesting portions of an image for further processing and rapid response. Synthetic foveal imaging implements a parallel processing architecture that uses reprogrammable logic to implement embedded, distributed, real-time foveal image processing from different sensor types while simultaneously allowing for lossless storage and retrieval of raw image data. Real-time, distributed, adaptive processing of multi-tap image sensors with coordinated processing hardware used for each output tap is enabled. In mosaic focal planes, a parallel-processing network can be implemented that treats the mosaic focal plane as a single ensemble rather than a set of isolated sensors. Various applications are enabled for imaging and robotic vision where processing and responding to enormous amounts of data quickly and efficiently is important.
NASA Astrophysics Data System (ADS)
Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo
2017-08-01
We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
Understanding the distributed cognitive processes of intensive care patient discharge.
Lin, Frances; Chaboyer, Wendy; Wallis, Marianne
2014-03-01
To better understand and identify vulnerabilities and risks in the ICU patient discharge process, which provides evidence for service improvement. Previous studies have identified that 'after hours' discharge and 'premature' discharge from ICU are associated with increased mortality. However, some of these studies have largely been retrospective reviews of various administrative databases, while others have focused on specific aspects of the process, which may miss crucial components of the discharge process. This is an ethnographic exploratory study. Distributed cognition and activity theory were used as theoretical frameworks. Ethnographic data collection techniques including informal interviews, direct observations and collecting existing documents were used. A total of 56 one-to-one interviews were conducted with 46 participants; 28 discharges were observed; and numerous documents were collected during a five-month period. A triangulated technique was used in both data collection and data analysis to ensure the research rigour. Under the guidance of activity theory and distributed cognition theoretical frameworks, five themes emerged: hierarchical power and authority, competing priorities, ineffective communication, failing to enact the organisational processes and working collaboratively to optimise the discharge process. Issues with teamwork, cognitive processes and team members' interaction with cognitive artefacts influenced the discharge process. Strategies to improve shared situational awareness are needed to improve teamwork, patient flow and resource efficiency. Tools need to be evaluated regularly to ensure their continuous usefulness. Health care professionals need to be aware of the impact of their competing priorities and ensure discharges occur in a timely manner. Activity theory and distributed cognition are useful theoretical frameworks to support healthcare organisational research. © 2013 John Wiley & Sons Ltd.
RACORO aerosol data processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elisabeth Andrews
2011-10-31
The RACORO aerosol data (cloud condensation nuclei (CCN), condensation nuclei (CN) and aerosol size distributions) need further processing to be useful for model evaluation (e.g., GCM droplet nucleation parameterizations) and other investigations. These tasks include: (1) Identification and flagging of 'splash' contaminated Twin Otter aerosol data. (2) Calculation of actual supersaturation (SS) values in the two CCN columns flown on the Twin Otter. (3) Interpolation of CCN spectra from SGP and Twin Otter to 0.2% SS. (4) Process data for spatial variability studies. (5) Provide calculated light scattering from measured aerosol size distributions. Below we first briefly describe the measurementsmore » and then describe the results of several data processing tasks that which have been completed, paving the way for the scientific analyses for which the campaign was designed. The end result of this research will be several aerosol data sets which can be used to achieve some of the goals of the RACORO mission including the enhanced understanding of cloud-aerosol interactions and improved cloud simulations in climate models.« less
Data Integration in Computer Distributed Systems
NASA Astrophysics Data System (ADS)
Kwiecień, Błażej
In this article the author analyze a problem of data integration in a computer distributed systems. Exchange of information between different levels in integrated pyramid of enterprise process is fundamental with regard to efficient enterprise work. Communication and data exchange between levels are not always the same cause of necessity of different network protocols usage, communication medium, system response time, etc.
Operating tool for a distributed data and information management system
NASA Astrophysics Data System (ADS)
Reck, C.; Mikusch, E.; Kiemle, S.; Wolfmüller, M.; Böttcher, M.
2002-07-01
The German Remote Sensing Data Center has developed the Data Information and Management System DIMS which provides multi-mission ground system services for earth observation product processing, archiving, ordering and delivery. DIMS successfully uses newest technologies within its services. This paper presents the solution taken to simplify operation tasks for this large and distributed system.
NASA Astrophysics Data System (ADS)
Garov, A. S.; Karachevtseva, I. P.; Matveev, E. V.; Zubarev, A. E.; Florinsky, I. V.
2016-06-01
We are developing a unified distributed communication environment for processing of spatial data which integrates web-, desktop- and mobile platforms and combines volunteer computing model and public cloud possibilities. The main idea is to create a flexible working environment for research groups, which may be scaled according to required data volume and computing power, while keeping infrastructure costs at minimum. It is based upon the "single window" principle, which combines data access via geoportal functionality, processing possibilities and communication between researchers. Using an innovative software environment the recently developed planetary information system (http://cartsrv.mexlab.ru/geoportal) will be updated. The new system will provide spatial data processing, analysis and 3D-visualization and will be tested based on freely available Earth remote sensing data as well as Solar system planetary images from various missions. Based on this approach it will be possible to organize the research and representation of results on a new technology level, which provides more possibilities for immediate and direct reuse of research materials, including data, algorithms, methodology, and components. The new software environment is targeted at remote scientific teams, and will provide access to existing spatial distributed information for which we suggest implementation of a user interface as an advanced front-end, e.g., for virtual globe system.
NASA Astrophysics Data System (ADS)
Musdalifah, N.; Handajani, S. S.; Zukhronah, E.
2017-06-01
Competition between the homoneous companies cause the company have to keep production quality. To cover this problem, the company controls the production with statistical quality control using control chart. Shewhart control chart is used to normal distributed data. The production data is often non-normal distribution and occured small process shift. Grand median control chart is a control chart for non-normal distributed data, while cumulative sum (cusum) control chart is a sensitive control chart to detect small process shift. The purpose of this research is to compare grand median and cusum control charts on shuttlecock weight variable in CV Marjoko Kompas dan Domas by generating data as the actual distribution. The generated data is used to simulate multiplier of standard deviation on grand median and cusum control charts. Simulation is done to get average run lenght (ARL) 370. Grand median control chart detects ten points that out of control, while cusum control chart detects a point out of control. It can be concluded that grand median control chart is better than cusum control chart.
Beowulf Distributed Processing and the United States Geological Survey
Maddox, Brian G.
2002-01-01
Introduction In recent years, the United States Geological Survey's (USGS) National Mapping Discipline (NMD) has expanded its scientific and research activities. Work is being conducted in areas such as emergency response research, scientific visualization, urban prediction, and other simulation activities. Custom-produced digital data have become essential for these types of activities. High-resolution, remotely sensed datasets are also seeing increased use. Unfortunately, the NMD is also finding that it lacks the resources required to perform some of these activities. Many of these projects require large amounts of computer processing resources. Complex urban-prediction simulations, for example, involve large amounts of processor-intensive calculations on large amounts of input data. This project was undertaken to learn and understand the concepts of distributed processing. Experience was needed in developing these types of applications. The idea was that this type of technology could significantly aid the needs of the NMD scientific and research programs. Porting a numerically intensive application currently being used by an NMD science program to run in a distributed fashion would demonstrate the usefulness of this technology. There are several benefits that this type of technology can bring to the USGS's research programs. Projects can be performed that were previously impossible due to a lack of computing resources. Other projects can be performed on a larger scale than previously possible. For example, distributed processing can enable urban dynamics research to perform simulations on larger areas without making huge sacrifices in resolution. The processing can also be done in a more reasonable amount of time than with traditional single-threaded methods (a scaled version of Chester County, Pennsylvania, took about fifty days to finish its first calibration phase with a single-threaded program). This paper has several goals regarding distributed processing technology. It will describe the benefits of the technology. Real data about a distributed application will be presented as an example of the benefits that this technology can bring to USGS scientific programs. Finally, some of the issues with distributed processing that relate to USGS work will be discussed.
Evaluation of Apache Hadoop for parallel data analysis with ROOT
NASA Astrophysics Data System (ADS)
Lehrack, S.; Duckeck, G.; Ebke, J.
2014-06-01
The Apache Hadoop software is a Java based framework for distributed processing of large data sets across clusters of computers, using the Hadoop file system (HDFS) for data storage and backup and MapReduce as a processing platform. Hadoop is primarily designed for processing large textual data sets which can be processed in arbitrary chunks, and must be adapted to the use case of processing binary data files which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality and job management. For this reason, we evaluated Apache Hadoop as an alternative approach to PROOF for ROOT data analysis. Two alternatives in distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as execution back-end. The focus in the measurements were on the one hand to safely store analysis data on HDFS with reasonable data rates and on the other hand to process data fast and reliably with MapReduce. In the evaluation of the HDFS, read/write data rates from local Hadoop cluster have been measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses have been used and event rates have been compared to PROOF.
NASA Science Data Processing for SNPP
NASA Astrophysics Data System (ADS)
Hall, A.; Behnke, J.; Lowe, D. R.; Ho, E. L.
2014-12-01
NASA's ESDIS Project has been operating the Suomi National Polar-Orbiting Partnership (SNPP) Science Data Segment (SDS) since the launch in October 2011. The science data processing system includes a Science Data Depository and Distribution Element (SD3E) and five Product Evaluation and Analysis Tool Elements (PEATEs): Land, Ocean, Atmosphere, Ozone, and Sounder. The SDS has been responsible for assessing Environmental Data Records (EDRs) for climate quality, providing and demonstrating algorithm improvements/enhancements and supporting the calibration/validation activities as well as instrument calibration and sensor table uploads for mission planning. The SNPP also flies two NASA instruments: OMPS Limb and CERES. The SNPP SDS has been responsible for producing, archiving and distributing the standard products for those instruments in close association with their NASA science teams. The PEATEs leveraged existing science data processing techniques developed under the EOSDIS Program. This enabled he PEATEs to do an excellent job in supporting Science Team analysis for SNPP. The SDS acquires data from three sources: NESDIS IDPS (Raw Data Records (RDRs)), GRAVITE (Retained Intermediate Products (RIPs)), and the NOAA/CLASS (higher level products). The SD3E component aggregates the RDRs, and distributes them to each of the PEATEs for further analysis and processing. It provides a ~32 day rolling storage of data, available for pickup by the PEATEs. The current system used by NASA will be presented along with plans for streamlining the system in support of continuing the NASA's EOS measurements.
Carneggie, David M.; Metz, Gary G.; Draeger, William C.; Thompson, Ralph J.
1991-01-01
The U.S. Geological Survey's Earth Resources Observation Systems (EROS) Data Center, the national archive for Landsat data, has 20 years of experience in acquiring, archiving, processing, and distributing Landsat and earth science data. The Center is expanding its satellite and earth science data management activities to support the U.S. Global Change Research Program and the National Aeronautics and Space Administration (NASA) Earth Observing System Program. The Center's current and future data management activities focus on land data and include: satellite and earth science data set acquisition, development and archiving; data set preservation, maintenance and conversion to more durable and accessible archive medium; development of an advanced Land Data Information System; development of enhanced data packaging and distribution mechanisms; and data processing, reprocessing, and product generation systems.
Signal processing in urodynamics: towards high definition urethral pressure profilometry.
Klünder, Mario; Sawodny, Oliver; Amend, Bastian; Ederer, Michael; Kelp, Alexandra; Sievert, Karl-Dietrich; Stenzl, Arnulf; Feuer, Ronny
2016-03-22
Urethral pressure profilometry (UPP) is used in the diagnosis of stress urinary incontinence (SUI) which is a significant medical, social, and economic problem. Low spatial pressure resolution, common occurrence of artifacts, and uncertainties in data location limit the diagnostic value of UPP. To overcome these limitations, high definition urethral pressure profilometry (HD-UPP) combining enhanced UPP hardware and signal processing algorithms has been developed. In this work, we present the different signal processing steps in HD-UPP and show experimental results from female minipigs. We use a special microtip catheter with high angular pressure resolution and an integrated inclination sensor. Signals from the catheter are filtered and time-correlated artifacts removed. A signal reconstruction algorithm processes pressure data into a detailed pressure image on the urethra's inside. Finally, the pressure distribution on the urethra's outside is calculated through deconvolution. A mathematical model of the urethra is contained in a point-spread-function (PSF) which is identified depending on geometric and material properties of the urethra. We additionally investigate the PSF's frequency response to determine the relevant frequency band for pressure information on the urinary sphincter. Experimental pressure data are spatially located and processed into high resolution pressure images. Artifacts are successfully removed from data without blurring other details. The pressure distribution on the urethra's outside is reconstructed and compared to the one on the inside. Finally, the pressure images are mapped onto the urethral geometry calculated from inclination and position data to provide an integrated image of pressure distribution, anatomical shape, and location. With its advanced sensing capabilities, the novel microtip catheter collects an unprecedented amount of urethral pressure data. Through sequential signal processing steps, physicians are provided with detailed information on the pressure distribution in and around the urethra. Therefore, HD-UPP overcomes many current limitations of conventional UPP and offers the opportunity to evaluate urethral structures, especially the sphincter, in context of the correct anatomical location. This could enable the development of focal therapy approaches in the treatment of SUI.
Distributed data mining on grids: services, tools, and applications.
Cannataro, Mario; Congiusta, Antonio; Pugliese, Andrea; Talia, Domenico; Trunfio, Paolo
2004-12-01
Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
NASA Astrophysics Data System (ADS)
Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.
2006-12-01
In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. It is necessary to solve the following three problems to achieve large-scale statistical analyses of plenty of data. (i) More CPU power and larger memory and disk size are required. However, total powers of personal computers are not enough to analyze such amount of data. Super-computers provide a high performance CPU and rich memory area, but they are usually separated from the Internet or connected only for the purpose of programming or data file transfer. (ii) Most of the observation data files are managed at distributed data sites over the Internet. Users have to know where the data files are located. (iii) Since no common data format in the STP field is available now, users have to prepare reading program for each data by themselves. To overcome the problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. The Gfarm shares both computational resources and perform parallel distributed processings. In addition, the Gfarm provides the Gfarm filesystem which can be as virtual directory tree among nodes. The Gfarm environment is composed of three parts; a metadata server to manage distributed files information, filesystem nodes to provide computational resources and a client to throw a job into metadata server and manages data processing schedulings. In the present study, both data files and data processes are parallelized on the Gfarm with 6 file system nodes: CPU clock frequency of each node is Pentium V 1GHz, 256MB memory and40GB disk. To evaluate performances of the present Gfarm system, we scanned plenty of data files, the size of which is about 300MB for each, in three processing methods: sequential processing in one node, sequential processing by each node and parallel processing by each node. As a result, in comparison between the number of files and the elapsed time, parallel and distributed processing shorten the elapsed time to 1/5 than sequential processing. On the other hand, sequential processing times were shortened in another experiment, whose file size is smaller than 100KB. In this case, the elapsed time to scan one file is within one second. It implies that disk swap took place in case of parallel processing by each node. We note that the operation became unstable when the number of the files exceeded 1000. To overcome the problem (iii), we developed an original data class. This class supports our reading of data files with various data formats since it converts them into an original data format since it defines schemata for every type of data and encapsulates the structure of data files. In addition, since this class provides a function of time re-sampling, users can easily convert multiple data (array) with different time resolution into the same time resolution array. Finally, using the Gfarm, we achieved a high performance environment for large-scale statistical data analyses. It should be noted that the present method is effective only when one data file size is large enough. At present, we are restructuring the new Gfarm environment with 8 nodes: CPU is Athlon 64 x2 Dual Core 2GHz, 2GB memory and 1.2TB disk (using RAID0) for each node. Our original class is to be implemented on the new Gfarm environment. In the present talk, we show the latest results with applying the present system for data analyses with huge number of satellite observation data files.
Quantum cryptographic system with reduced data loss
Lo, H.K.; Chau, H.F.
1998-03-24
A secure method for distributing a random cryptographic key with reduced data loss is disclosed. Traditional quantum key distribution systems employ similar probabilities for the different communication modes and thus reject at least half of the transmitted data. The invention substantially reduces the amount of discarded data (those that are encoded and decoded in different communication modes e.g. using different operators) in quantum key distribution without compromising security by using significantly different probabilities for the different communication modes. Data is separated into various sets according to the actual operators used in the encoding and decoding process and the error rate for each set is determined individually. The invention increases the key distribution rate of the BB84 key distribution scheme proposed by Bennett and Brassard in 1984. Using the invention, the key distribution rate increases with the number of quantum signals transmitted and can be doubled asymptotically. 23 figs.
Quantum cryptographic system with reduced data loss
Lo, Hoi-Kwong; Chau, Hoi Fung
1998-01-01
A secure method for distributing a random cryptographic key with reduced data loss. Traditional quantum key distribution systems employ similar probabilities for the different communication modes and thus reject at least half of the transmitted data. The invention substantially reduces the amount of discarded data (those that are encoded and decoded in different communication modes e.g. using different operators) in quantum key distribution without compromising security by using significantly different probabilities for the different communication modes. Data is separated into various sets according to the actual operators used in the encoding and decoding process and the error rate for each set is determined individually. The invention increases the key distribution rate of the BB84 key distribution scheme proposed by Bennett and Brassard in 1984. Using the invention, the key distribution rate increases with the number of quantum signals transmitted and can be doubled asymptotically.
Universal Distribution of Litter Decay Rates
NASA Astrophysics Data System (ADS)
Forney, D. C.; Rothman, D. H.
2008-12-01
Degradation of litter is the result of many physical, chemical and biological processes. The high variability of these processes likely accounts for the progressive slowdown of decay with litter age. This age dependence is commonly thought to result from the superposition of processes with different decay rates k. Here we assume an underlying continuous yet unknown distribution p(k) of decay rates [1]. To seek its form, we analyze the mass-time history of 70 LIDET [2] litter data sets obtained under widely varying conditions. We construct a regularized inversion procedure to find the best fitting distribution p(k) with the least degrees of freedom. We find that the resulting p(k) is universally consistent with a lognormal distribution, i.e.~a Gaussian distribution of log k, characterized by a dataset-dependent mean and variance of log k. This result is supported by a recurring observation that microbial populations on leaves are log-normally distributed [3]. Simple biological processes cause the frequent appearance of the log-normal distribution in ecology [4]. Environmental factors, such as soil nitrate, soil aggregate size, soil hydraulic conductivity, total soil nitrogen, soil denitrification, soil respiration have been all observed to be log-normally distributed [5]. Litter degradation rates depend on many coupled, multiplicative factors, which provides a fundamental basis for the lognormal distribution. Using this insight, we systematically estimated the mean and variance of log k for 512 data sets from the LIDET study. We find the mean strongly correlates with temperature and precipitation, while the variance appears to be uncorrelated with main environmental factors and is thus likely more correlated with chemical composition and/or ecology. Results indicate the possibility that the distribution in rates reflects, at least in part, the distribution of microbial niches. [1] B. P. Boudreau, B.~R. Ruddick, American Journal of Science,291, 507, (1991). [2] M. Harmon, Forest Science Data Bank: TD023 [Database]. LTER Intersite Fine Litter Decomposition Experiment (LIDET): Long-Term Ecological Research, (2007). [3] G.~A. Beattie, S.~E. Lindow, Phytopathology 89, 353 (1999). [4] R.~A. May, Ecology and Evolution of Communities/, A pattern of Species Abundance and Diversity, 81 (1975). [5] T.~B. Parkin, J.~A. Robinson, Advances in Soil Science 20, Analysis of Lognormal Data, 194 (1992).
Work Distribution in a Fully Distributed Processing System.
1982-01-01
Institute of Technology Atlanta, Georgia 30332 THE VIEW, OPINIONS, AND/OR FINDINGS CONTAINED IN THIS REPORT ARE THOSE OF THE AUTHOR AND SHOULD NOT BE...opinions, and/or findings contained in this report are those of the author and should not be construed as an official Department of the Navy position...SECTIONO1 Distributed data processing systems are currently being studied by researchers and prospective users because of their potential for improvements
WaveJava: Wavelet-based network computing
NASA Astrophysics Data System (ADS)
Ma, Kun; Jiao, Licheng; Shi, Zhuoer
1997-04-01
Wavelet is a powerful theory, but its successful application still needs suitable programming tools. Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi- threaded, dynamic language. This paper addresses the design and development of a cross-platform software environment for experimenting and applying wavelet theory. WaveJava, a wavelet class library designed by the object-orient programming, is developed to take advantage of the wavelets features, such as multi-resolution analysis and parallel processing in the networking computing. A new application architecture is designed for the net-wide distributed client-server environment. The data are transmitted with multi-resolution packets. At the distributed sites around the net, these data packets are done the matching or recognition processing in parallel. The results are fed back to determine the next operation. So, the more robust results can be arrived quickly. The WaveJava is easy to use and expand for special application. This paper gives a solution for the distributed fingerprint information processing system. It also fits for some other net-base multimedia information processing, such as network library, remote teaching and filmless picture archiving and communications.
NASA Astrophysics Data System (ADS)
Eschenbächer, Jens; Seifert, Marcus; Thoben, Klaus-Dieter
Distributed innovation processes are considered as a new option to handle both the complexity and the speed in which new products and services need to be prepared. Indeed most research on innovation processes was focused on multinational companies with an intra-organisational perspective. The phenomena of innovation processes in networks - with an inter-organisational perspective - have been almost neglected. Collaborative networks present a perfect playground for such distributed innovation processes whereas the authors highlight in specific Virtual Organisation because of their dynamic behaviour. Research activities supporting distributed innovation processes in VO are rather new so that little knowledge about the management of such research is available. With the presentation of the collaborative network relationship analysis this gap will be addressed. It will be shown that a qualitative planning of collaboration intensities can support real business cases by proving knowledge and planning data.
CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises
NASA Astrophysics Data System (ADS)
Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.
2011-12-01
JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel, research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel° spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after end of cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is that duplication efforts and asynchronous metadata across multiple distribution websites due to manual metadata entry into individual websites by administrators. The other is that differential data types or representation of metadata in each website. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, an Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility for any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web-based interface by a metadata editor in CMO as needed. Then daily differential uptake of metadata from the XML database to databases in several distribution websites is automatically processed using a convertor defined by the EAI software. Currently, CMO is available for three distribution websites: "Deep Sea Floor Rock Sample Database GANSEKI", "Marine Biological Sample Database", and "JAMSTEC E-library of Deep-sea Images". CMO is planned to provide "JAMSTEC Data Site for Research Cruises" with metadata in the future.
NASA Astrophysics Data System (ADS)
Xie, Jibo; Li, Guoqing
2015-04-01
Earth observation (EO) data obtained by air-borne or space-borne sensors has the characteristics of heterogeneity and geographical distribution of storage. These data sources belong to different organizations or agencies whose data management and storage methods are quite different and geographically distributed. Different data sources provide different data publish platforms or portals. With more Remote sensing sensors used for Earth Observation (EO) missions, different space agencies have distributed archived massive EO data. The distribution of EO data archives and system heterogeneity makes it difficult to efficiently use geospatial data for many EO applications, such as hazard mitigation. To solve the interoperable problems of different EO data systems, an advanced architecture of distributed geospatial data infrastructure is introduced to solve the complexity of distributed and heterogeneous EO data integration and on-demand processing in this paper. The concept and architecture of geospatial data service gateway (GDSG) is proposed to build connection with heterogeneous EO data sources by which EO data can be retrieved and accessed with unified interfaces. The GDSG consists of a set of tools and service to encapsulate heterogeneous geospatial data sources into homogenous service modules. The GDSG modules includes EO metadata harvesters and translators, adaptors to different type of data system, unified data query and access interfaces, EO data cache management, and gateway GUI, etc. The GDSG framework is used to implement interoperability and synchronization between distributed EO data sources with heterogeneous architecture. An on-demand distributed EO data platform is developed to validate the GDSG architecture and implementation techniques. Several distributed EO data achieves are used for test. Flood and earthquake serves as two scenarios for the use cases of distributed EO data integration and interoperability.
Huang, Jr-Chuan; Lee, Tsung-Yu; Teng, Tse-Yang; Chen, Yi-Chin; Huang, Cho-Ying; Lee, Cheing-Tung
2014-01-01
The exponent decay in landslide frequency-area distribution is widely used for assessing the consequences of landslides and with some studies arguing that the slope of the exponent decay is universal and independent of mechanisms and environmental settings. However, the documented exponent slopes are diverse and hence data processing is hypothesized for this inconsistency. An elaborated statistical experiment and two actual landslide inventories were used here to demonstrate the influences of the data processing on the determination of the exponent. Seven categories with different landslide numbers were generated from the predefined inverse-gamma distribution and then analyzed by three data processing procedures (logarithmic binning, LB, normalized logarithmic binning, NLB and cumulative distribution function, CDF). Five different bin widths were also considered while applying LB and NLB. Following that, the maximum likelihood estimation was used to estimate the exponent slopes. The results showed that the exponents estimated by CDF were unbiased while LB and NLB performed poorly. Two binning-based methods led to considerable biases that increased with the increase of landslide number and bin width. The standard deviations of the estimated exponents were dependent not just on the landslide number but also on binning method and bin width. Both extremely few and plentiful landslide numbers reduced the confidence of the estimated exponents, which could be attributed to limited landslide numbers and considerable operational bias, respectively. The diverse documented exponents in literature should therefore be adjusted accordingly. Our study strongly suggests that the considerable bias due to data processing and the data quality should be constrained in order to advance the understanding of landslide processes. PMID:24852019
The strange sea density and charm production in deep inelastic charged current processes
NASA Astrophysics Data System (ADS)
Glück, M.; Kretzer, S.; Reya, E.
1996-02-01
Charm production as related to the determination of the strange sea density in deep inelastic charged current processes is studied predominantly in the framework of the overlineMS fixed flavor factorization scheme. Perturbative stability within this formalism is demonstrated. The compatibility of recent next-to-leading order strange quark distributions with the available dimuon and F2νN data is investigated. It is shown that final conclusions concerning these distributions afford further analyses of presently available and/or forthcoming neutrino data.
A simple enrichment correction factor for improving erosion estimation by rare earth oxide tracers
USDA-ARS?s Scientific Manuscript database
Spatially distributed soil erosion data are needed to better understanding soil erosion processes and validating distributed erosion models. Rare earth element (REE) oxides were used to generate spatial erosion data. However, a general concern on the accuracy of the technique arose due to selective ...
Future electro-optical sensors and processing in urban operations
NASA Astrophysics Data System (ADS)
Grönwall, Christina; Schwering, Piet B.; Rantakokko, Jouni; Benoist, Koen W.; Kemp, Rob A. W.; Steinvall, Ove; Letalick, Dietmar; Björkert, Stefan
2013-10-01
In the electro-optical sensors and processing in urban operations (ESUO) study we pave the way for the European Defence Agency (EDA) group of Electro-Optics experts (IAP03) for a common understanding of the optimal distribution of processing functions between the different platforms. Combinations of local, distributed and centralized processing are proposed. In this way one can match processing functionality to the required power, and available communication systems data rates, to obtain the desired reaction times. In the study, three priority scenarios were defined. For these scenarios, present-day and future sensors and signal processing technologies were studied. The priority scenarios were camp protection, patrol and house search. A method for analyzing information quality in single and multi-sensor systems has been applied. A method for estimating reaction times for transmission of data through the chain of command has been proposed and used. These methods are documented and can be used to modify scenarios, or be applied to other scenarios. Present day data processing is organized mainly locally. Very limited exchange of information with other platforms is present; this is performed mainly at a high information level. Main issues that arose from the analysis of present-day systems and methodology are the slow reaction time due to the limited field of view of present-day sensors and the lack of robust automated processing. Efficient handover schemes between wide and narrow field of view sensors may however reduce the delay times. The main effort in the study was in forecasting the signal processing of EO-sensors in the next ten to twenty years. Distributed processing is proposed between hand-held and vehicle based sensors. This can be accompanied by cloud processing on board several vehicles. Additionally, to perform sensor fusion on sensor data originating from different platforms, and making full use of UAV imagery, a combination of distributed and centralized processing is essential. There is a central role for sensor fusion of heterogeneous sensors in future processing. The changes that occur in the urban operations of the future due to the application of these new technologies will be the improved quality of information, with shorter reaction time, and with lower operator load.
Nuclear parton distributions and the Drell-Yan process
NASA Astrophysics Data System (ADS)
Kulagin, S. A.; Petti, R.
2014-10-01
We study the nuclear parton distribution functions on the basis of our recently developed semimicroscopic model, which takes into account a number of nuclear effects including nuclear shadowing, Fermi motion and nuclear binding, nuclear meson-exchange currents, and off-shell corrections to bound nucleon distributions. We discuss in detail the dependencies of nuclear effects on the type of parton distribution (nuclear sea vs valence), as well as on the parton flavor (isospin). We apply the resulting nuclear parton distributions to calculate ratios of cross sections for proton-induced Drell-Yan production off different nuclear targets. We obtain a good agreement on the magnitude, target and projectile x, and the dimuon mass dependence of proton-nucleus Drell-Yan process data from the E772 and E866 experiments at Fermilab. We also provide nuclear corrections for the Drell-Yan data from the E605 experiment.
Managing distribution changes in time series prediction
NASA Astrophysics Data System (ADS)
Matias, J. M.; Gonzalez-Manteiga, W.; Taboada, J.; Ordonez, C.
2006-07-01
When a problem is modeled statistically, a single distribution model is usually postulated that is assumed to be valid for the entire space. Nonetheless, this practice may be somewhat unrealistic in certain application areas, in which the conditions of the process that generates the data may change; as far as we are aware, however, no techniques have been developed to tackle this problem.This article proposes a technique for modeling and predicting this change in time series with a view to improving estimates and predictions. The technique is applied, among other models, to the hypernormal distribution recently proposed. When tested on real data from a range of stock market indices the technique produces better results that when a single distribution model is assumed to be valid for the entire period of time studied.Moreover, when a global model is postulated, it is highly recommended to select the hypernormal distribution parameter in the same likelihood maximization process.
Distributed fault detection over sensor networks with Markovian switching topologies
NASA Astrophysics Data System (ADS)
Ge, Xiaohua; Han, Qing-Long
2014-05-01
This paper deals with the distributed fault detection for discrete-time Markov jump linear systems over sensor networks with Markovian switching topologies. The sensors are scatteredly deployed in the sensor field and the fault detectors are physically distributed via a communication network. The system dynamics changes and sensing topology variations are modeled by a discrete-time Markov chain with incomplete mode transition probabilities. Each of these sensor nodes firstly collects measurement outputs from its all underlying neighboring nodes, processes these data in accordance with the Markovian switching topologies, and then transmits the processed data to the remote fault detector node. Network-induced delays and accumulated data packet dropouts are incorporated in the data transmission between the sensor nodes and the distributed fault detector nodes through the communication network. To generate localized residual signals, mode-independent distributed fault detection filters are proposed. By means of the stochastic Lyapunov functional approach, the residual system performance analysis is carried out such that the overall residual system is stochastically stable and the error between each residual signal and the fault signal is made as small as possible. Furthermore, a sufficient condition on the existence of the mode-independent distributed fault detection filters is derived in the simultaneous presence of incomplete mode transition probabilities, Markovian switching topologies, network-induced delays, and accumulated data packed dropouts. Finally, a stirred-tank reactor system is given to show the effectiveness of the developed theoretical results.
Distribution and interplay of geologic processes on Titan from Cassini radar data
Lopes, R.M.C.; Stofan, E.R.; Peckyno, R.; Radebaugh, J.; Mitchell, K.L.; Mitri, Giuseppe; Wood, C.A.; Kirk, R.L.; Wall, S.D.; Lunine, J.I.; Hayes, A.; Lorenz, R.; Farr, Tom; Wye, L.; Craig, J.; Ollerenshaw, R.J.; Janssen, M.; LeGall, A.; Paganelli, F.; West, R.; Stiles, B.; Callahan, P.; Anderson, Y.; Valora, P.; Soderblom, L.
2010-01-01
The Cassini Titan Radar Mapper is providing an unprecedented view of Titan's surface geology. Here we use Synthetic Aperture Radar (SAR) image swaths (Ta-T30) obtained from October 2004 to December 2007 to infer the geologic processes that have shaped Titan's surface. These SAR swaths cover about 20% of the surface, at a spatial resolution ranging from ???350 m to ???2 km. The SAR data are distributed over a wide latitudinal and longitudinal range, enabling some conclusions to be drawn about the global distribution of processes. They reveal a geologically complex surface that has been modified by all the major geologic processes seen on Earth - volcanism, tectonism, impact cratering, and erosion and deposition by fluvial and aeolian activity. In this paper, we map geomorphological units from SAR data and analyze their areal distribution and relative ages of modification in order to infer the geologic evolution of Titan's surface. We find that dunes and hummocky and mountainous terrains are more widespread than lakes, putative cryovolcanic features, mottled plains, and craters and crateriform structures that may be due to impact. Undifferentiated plains are the largest areal unit; their origin is uncertain. In terms of latitudinal distribution, dunes and hummocky and mountainous terrains are located mostly at low latitudes (less than 30??), with no dunes being present above 60??. Channels formed by fluvial activity are present at all latitudes, but lakes are at high latitudes only. Crateriform structures that may have been formed by impact appear to be uniformly distributed with latitude, but the well-preserved impact craters are all located at low latitudes, possibly indicating that more resurfacing has occurred at higher latitudes. Cryovolcanic features are not ubiquitous, and are mostly located between 30?? and 60?? north. We examine temporal relationships between units wherever possible, and conclude that aeolian and fluvial/pluvial/lacustrine processes are the most recent, while tectonic processes that led to the formation of mountains and Xanadu are likely the most ancient. ?? 2009 Elsevier Inc.
Improved Seismic Acquisition System and Data Processing for the Italian National Seismic Network
NASA Astrophysics Data System (ADS)
Badiali, L.; Marcocci, C.; Mele, F.; Piscini, A.
2001-12-01
A new system for acquiring and processing digital signals has been developed in the last few years at the Istituto Nazionale di Geofisica e Vulcanologia (INGV). The system makes extensive use of the internet communication protocol standards such as TCP and UDP which are used as the transport highway inside the Italian network, and possibly in a near future outside, to share or redirect data among processes. The Italian National Seismic Network has been working for about 18 years equipped with vertical short period seismometers and transmitting through analog lines, to the computer center in Rome. We are now concentrating our efforts on speeding the migration towards a fully digital network based on about 150 stations equipped with either broad band or 5 seconds sensors connected to the data center partly through wired digital communication and partly through satellite digital communication. The overall process is layered through intranet and/or internet. Every layer gathers data in a simple format and provides data in a processed format, ready to be distributed towards the next layer. The lowest level acquires seismic data (raw waveforms) coming from the remote stations. It handshakes, checks and sends data in LAN or WAN according to a distribution list where other machines with their programs are waiting for. At the next level there are the picking procedures, or "pickers", on a per instrument basis, looking for phases. A picker spreads phases, again through the LAN or WAN and according to a distribution list, to one or more waiting locating machines tuned to generate a seismic event. The event locating procedure itself, the higher level in this stack, can exchange information with other similar procedures. Such a layered and distributed structure with nearby targets allows other seismic networks to join the processing and data collection of the same ongoing event, creating a virtual network larger than the original one. At present we plan to cooperate with other Italian regional and local networks, and with the VBB Mediterranean Network (MedNet) to share waveforms and events detected in real time. The seismic acquisition system at INGV uses a relational database built on standard SQL, for every activity involving the seismic network.
Mitra, Rajib; Jordan, Michael I.; Dunbrack, Roland L.
2010-01-01
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp. PMID:20442867
Cloud-based distributed control of unmanned systems
NASA Astrophysics Data System (ADS)
Nguyen, Kim B.; Powell, Darren N.; Yetman, Charles; August, Michael; Alderson, Susan L.; Raney, Christopher J.
2015-05-01
Enabling warfighters to efficiently and safely execute dangerous missions, unmanned systems have been an increasingly valuable component in modern warfare. The evolving use of unmanned systems leads to vast amounts of data collected from sensors placed on the remote vehicles. As a result, many command and control (C2) systems have been developed to provide the necessary tools to perform one of the following functions: controlling the unmanned vehicle or analyzing and processing the sensory data from unmanned vehicles. These C2 systems are often disparate from one another, limiting the ability to optimally distribute data among different users. The Space and Naval Warfare Systems Center Pacific (SSC Pacific) seeks to address this technology gap through the UxV to the Cloud via Widgets project. The overarching intent of this three year effort is to provide three major capabilities: 1) unmanned vehicle control using an open service oriented architecture; 2) data distribution utilizing cloud technologies; 3) a collection of web-based tools enabling analysts to better view and process data. This paper focuses on how the UxV to the Cloud via Widgets system is designed and implemented by leveraging the following technologies: Data Distribution Service (DDS), Accumulo, Hadoop, and Ozone Widget Framework (OWF).
Validation results of the IAG Dancer project for distributed GPS analysis
NASA Astrophysics Data System (ADS)
Boomkamp, H.
2012-12-01
The number of permanent GPS stations in the world has grown far too large to allow processing of all this data at analysis centers. The majority of these GPS sites do not even make their observation data available to the analysis centers, for various valid reasons. The current ITRF solution is still based on centralized analysis by the IGS, and subsequent densification of the reference frame via regional network solutions. Minor inconsistencies in analysis methods, software systems and data quality imply that this centralized approach is unlikely to ever reach the ambitious accuracy objectives of GGOS. The dependence on published data also makes it clear that a centralized approach will never provide a true global ITRF solution for all GNSS receivers in the world. If the data does not come to the analysis, the only alternative is to bring the analysis to the data. The IAG Dancer project has implemented a distributed GNSS analysis system on the internet in which each receiver can have its own analysis center in the form of a freely distributed JAVA peer-to-peer application. Global parameters for satellite orbits, clocks and polar motion are solved via a distributed least squares solution among all participating receivers. A Dancer instance can run on any computer that has simultaneous access to the receiver data and to the public internet. In the future, such a process may be embedded in the receiver firmware directly. GPS network operators can join the Dancer ITRF realization without having to publish their observation data or estimation products. GPS users can run a Dancer process without contributing to the global solution, to have direct access to the ITRF in near real-time. The Dancer software has been tested on-line since late 2011. A global network of processes has gradually evolved to allow stabilization and tuning of the software in order to reach a fully operational system. This presentation reports on the current performance of the Dancer system, and demonstrates the obvious benefits of distributed analysis of geodetic data in general. IAG Dancer screenshot
Negative Binomial Process Count and Mixture Modeling.
Zhou, Mingyuan; Carin, Lawrence
2015-02-01
The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for mixture modeling and whose marginalization leads to an NB process for count modeling. A draw from the NB process consists of a Poisson distributed finite number of distinct atoms, each of which is associated with a logarithmic distributed number of data samples. We reveal relationships between various count- and mixture-modeling distributions and construct a Poisson-logarithmic bivariate distribution that connects the NB and Chinese restaurant table distributions. Fundamental properties of the models are developed, and we derive efficient Bayesian inference. It is shown that with augmentation and normalization, the NB process and gamma-NB process can be reduced to the Dirichlet process and hierarchical Dirichlet process, respectively. These relationships highlight theoretical, structural, and computational advantages of the NB process. A variety of NB processes, including the beta-geometric, beta-NB, marked-beta-NB, marked-gamma-NB and zero-inflated-NB processes, with distinct sharing mechanisms, are also constructed. These models are applied to topic modeling, with connections made to existing algorithms under Poisson factor analysis. Example results show the importance of inferring both the NB dispersion and probability parameters.
Mistaking geography for biology: inferring processes from species distributions.
Warren, Dan L; Cardillo, Marcel; Rosauer, Dan F; Bolnick, Daniel I
2014-10-01
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity). Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Gene tree rooting methods give distributions that mimic the coalescent process.
Tian, Yuan; Kubatko, Laura S
2014-01-01
Multi-locus phylogenetic inference is commonly carried out via models that incorporate the coalescent process to model the possibility that incomplete lineage sorting leads to incongruence between gene trees and the species tree. An interesting question that arises in this context is whether data "fit" the coalescent model. Previous work (Rosenfeld et al., 2012) has suggested that rooting of gene trees may account for variation in empirical data that has been previously attributed to the coalescent process. We examine this possibility using simulated data. We show that, in the case of four taxa, the distribution of gene trees observed from rooting estimated gene trees with either the molecular clock or with outgroup rooting can be closely matched by the distribution predicted by the coalescent model with specific choices of species tree branch lengths. We apply commonly-used coalescent-based methods of species tree inference to assess their performance in these situations. Copyright © 2013 Elsevier Inc. All rights reserved.
Archiving, processing, and disseminating ASTER products at the USGS EROS Data Center
Jones, B.; Tolk, B.; ,
2002-01-01
The U.S. Geological Survey EROS Data Center archives, processes, and disseminates Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data products. The ASTER instrument is one of five sensors onboard the Earth Observing System's Terra satellite launched December 18, 1999. ASTER collects broad spectral coverage with high spatial resolution at near infrared, shortwave infrared, and thermal infrared wavelengths with ground resolutions of 15, 30, and 90 meters, respectively. The ASTER data are used in many ways to understand local and regional earth-surface processes. Applications include land-surface climatology, volcanology, hazards monitoring, geology, agronomy, land cover change, and hydrology. The ASTER data are available for purchase from the ASTER Ground Data System in Japan and from the Land Processes Distributed Active Archive Center in the United States, which receives level 1A and level 1B data from Japan on a routine basis. These products are archived and made available to the public within 48 hours of receipt. The level 1A and level 1B data are used to generate higher level products that include routine and on-demand decorrelation stretch, brightness temperature at the sensor, emissivity, surface reflectance, surface kinetic temperature, surface radiance, polar surface and cloud classification, and digital elevation models. This paper describes the processes and procedures used to archive, process, and disseminate standard and on-demand higher level ASTER products at the Land Processes Distributed Active Archive Center.
Towards a Cloud Based Smart Traffic Management Framework
NASA Astrophysics Data System (ADS)
Rahimi, M. M.; Hakimpour, F.
2017-09-01
Traffic big data has brought many opportunities for traffic management applications. However several challenges like heterogeneity, storage, management, processing and analysis of traffic big data may hinder their efficient and real-time applications. All these challenges call for well-adapted distributed framework for smart traffic management that can efficiently handle big traffic data integration, indexing, query processing, mining and analysis. In this paper, we present a novel, distributed, scalable and efficient framework for traffic management applications. The proposed cloud computing based framework can answer technical challenges for efficient and real-time storage, management, process and analyse of traffic big data. For evaluation of the framework, we have used OpenStreetMap (OSM) real trajectories and road network on a distributed environment. Our evaluation results indicate that speed of data importing to this framework exceeds 8000 records per second when the size of datasets is near to 5 million. We also evaluate performance of data retrieval in our proposed framework. The data retrieval speed exceeds 15000 records per second when the size of datasets is near to 5 million. We have also evaluated scalability and performance of our proposed framework using parallelisation of a critical pre-analysis in transportation applications. The results show that proposed framework achieves considerable performance and efficiency in traffic management applications.
Bringing the CMS distributed computing system into scalable operations
NASA Astrophysics Data System (ADS)
Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.
2010-04-01
Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF
NASA Astrophysics Data System (ADS)
Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.
2014-12-01
Climate observations and model simulations are collecting and generating vast amounts of climate data, and these data are ever-increasing and being accumulated in a rapid speed. Effectively managing and analyzing these data are essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attentions in dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) of cloud computing further accelerates the adoption of Hadoop in solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents and web pages, and cannot effectively handle the scientific data format such as array-based NetCDF files and other binary data format. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on POSIX-enabled parallel file system to enable parallel big data processing with MapReduce, as well as support data access by other systems; 2) modifying the Hadoop framework to transparently processing NetCDF data in parallel without sequencing or converting the data into other file formats, or loading them to HDFS; and 3) seamlessly integrating Hadoop, cloud computing and climate data in a highly scalable and fault-tolerance framework.
Secure distribution for high resolution remote sensing images
NASA Astrophysics Data System (ADS)
Liu, Jin; Sun, Jing; Xu, Zheng Q.
2010-09-01
The use of remote sensing images collected by space platforms is becoming more and more widespread. The increasing value of space data and its use in critical scenarios call for adoption of proper security measures to protect these data against unauthorized access and fraudulent use. In this paper, based on the characteristics of remote sensing image data and application requirements on secure distribution, a secure distribution method is proposed, including users and regions classification, hierarchical control and keys generation, and multi-level encryption based on regions. The combination of the three parts can make that the same remote sensing images after multi-level encryption processing are distributed to different permission users through multicast, but different permission users can obtain different degree information after decryption through their own decryption keys. It well meets user access control and security needs in the process of high resolution remote sensing image distribution. The experimental results prove the effectiveness of the proposed method which is suitable for practical use in the secure transmission of remote sensing images including confidential information over internet.
Distributed GPU Computing in GIScience
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.
2013-12-01
Geoscientists strived to discover potential principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To better achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by increasing amount of datasets from different domains, such as social media, earth observation, environmental sensing (Li et al., 2013). Meanwhile CPU-based computing resources structured as cluster or supercomputer is costly. In the past several years with GPU-based technology matured in both the capability and performance, GPU-based computing has emerged as a new computing paradigm. Compare to traditional computing microprocessor, the modern GPU, as a compelling alternative microprocessor, has outstanding high parallel processing capability with cost-effectiveness and efficiency(Owens et al., 2008), although it is initially designed for graphical rendering in visualization pipe. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within distributed environment. Within this framework, 1) for each single computer, computing resources of both GPU-based and CPU-based can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build up a virtual super computer to support CPU-based and GPU-based computing in distributed computing environment; 3) GPUs, as a specific graphic targeted device, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies Reference : 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. Visualization and Computer Graphics, IEEE Transactions on, 9(3), 378-394. 2. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D Environmental Data Using Many-core Graphics Processing Units (GPUs) and Multi-core Central Processing Units (CPUs). Computers & Geosciences, 59(9), 78-89. 3. Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96(5), 879-899.
The design and implementation of multi-source application middleware based on service bus
NASA Astrophysics Data System (ADS)
Li, Yichun; Jiang, Ningkang
2017-06-01
With the rapid development of the Internet of Things(IoT), the real-time monitoring data are increasing with different types and large amounts. Aiming at taking full advantages of the data, we designed and implemented an application middleware, which not only supports the three-layer architecture of IoT information system but also enables the flexible configuration of multiple resources access and other accessional modules. The middleware platform shows the characteristics of lightness, security, AoP (aspect-oriented programming), distribution and real-time, which can let application developers construct the information processing systems on related areas in a short period. It focuses not limited to these functions: pre-processing of data format, the definition of data entity, the callings and handlings of distributed service and massive data process. The result of experiment shows that the performance of middleware is more excellent than some message queue construction to some degree and its throughput grows better as the number of distributed nodes increases while the code is not complex. Currently, the middleware is applied to the system of Shanghai Pudong environmental protection agency and achieved a great success.
Integrating Data Distribution and Data Assimilation Between the OOI CI and the NOAA DIF
NASA Astrophysics Data System (ADS)
Meisinger, M.; Arrott, M.; Clemesha, A.; Farcas, C.; Farcas, E.; Im, T.; Schofield, O.; Krueger, I.; Klacansky, I.; Orcutt, J.; Peach, C.; Chave, A.; Raymer, D.; Vernon, F.
2008-12-01
The Ocean Observatories Initiative (OOI) is an NSF funded program to establish the ocean observing infrastructure of the 21st century benefiting research and education. It is currently approaching final design and promises to deliver cyber and physical observatory infrastructure components as well as substantial core instrumentation to study environmental processes of the ocean at various scales, from coastal shelf-slope exchange processes to the deep ocean. The OOI's data distribution network lies at the heart of its cyber- infrastructure, which enables a multitude of science and education applications, ranging from data analysis, to processing, visualization and ontology supported query and mediation. In addition, it fundamentally supports a class of applications exploiting the knowledge gained from analyzing observational data for objective-driven ocean observing applications, such as automatically triggered response to episodic environmental events and interactive instrument tasking and control. The U.S. Department of Commerce through NOAA operates the Integrated Ocean Observing System (IOOS) providing continuous data in various formats, rates and scales on open oceans and coastal waters to scientists, managers, businesses, governments, and the public to support research and inform decision-making. The NOAA IOOS program initiated development of the Data Integration Framework (DIF) to improve management and delivery of an initial subset of ocean observations with the expectation of achieving improvements in a select set of NOAA's decision-support tools. Both OOI and NOAA through DIF collaborate on an effort to integrate the data distribution, access and analysis needs of both programs. We present details and early findings from this collaboration; one part of it is the development of a demonstrator combining web-based user access to oceanographic data through ERDDAP, efficient science data distribution, and scalable, self-healing deployment in a cloud computing environment. ERDDAP is a web-based front-end application integrating oceanographic data sources of various formats, for instance CDF data files as aggregated through NcML or presented using a THREDDS server. The OOI-designed data distribution network provides global traffic management and computational load balancing for observatory resources; it makes use of the OpenDAP Data Access Protocol (DAP) for efficient canonical science data distribution over the network. A cloud computing strategy is the basis for scalable, self-healing organization of an observatory's computing and storage resources, independent of the physical location and technical implementation of these resources.
1993-12-01
0~0 S* NAVAL POSTGRADUATE SCHOOL Monterey, California DTIC ELECTE THESIS S APR 11 1994DU A SIMPLE, LOW OVERHEAD DATA COMPRESSION ALGORITHM FOR...A SIMPLE. LOW OVERHEAD DATA COMPRESSION ALGORITHM FOR CONVERTING LOSSY COMPRESSION PROCESSES TO LOSSLESS. 6. AUTHOR(S) Abbott, Walter D., III 7...Approved for public release; distribution is unlimited. A Simple, Low Overhead Data Compression Algorithm for Converting Lossy Processes to Lossless by
Research on distributed heterogeneous data PCA algorithm based on cloud platform
NASA Astrophysics Data System (ADS)
Zhang, Jin; Huang, Gang
2018-05-01
Principal component analysis (PCA) of heterogeneous data sets can solve the problem that centralized data scalability is limited. In order to reduce the generation of intermediate data and error components of distributed heterogeneous data sets, a principal component analysis algorithm based on heterogeneous data sets under cloud platform is proposed. The algorithm performs eigenvalue processing by using Householder tridiagonalization and QR factorization to calculate the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the model method has the feasibility and reliability in terms of execution time and accuracy.
Discrete mathematics for spatial data classification and understanding
NASA Astrophysics Data System (ADS)
Mussio, Luigi; Nocera, Rossella; Poli, Daniela
1998-12-01
Data processing, in the field of information technology, requires new tools, involving discrete mathematics, like data compression, signal enhancement, data classification and understanding, hypertexts and multimedia (considering educational aspects too), because the mass of data implies automatic data management and doesn't permit any a priori knowledge. The methodologies and procedures used in this class of problems concern different kinds of segmentation techniques and relational strategies, like clustering, parsing, vectorization, formalization, fitting and matching. On the other hand, the complexity of this approach imposes to perform optimal sampling and outlier detection just at the beginning, in order to define the set of data to be processed: rough data supply very poor information. For these reasons, no hypotheses about the distribution behavior of the data can be generally done and a judgment should be acquired by distribution-free inference only.
A multiprocessing architecture for real-time monitoring
NASA Technical Reports Server (NTRS)
Schmidt, James L.; Kao, Simon M.; Read, Jackson Y.; Weitzenkamp, Scott M.; Laffey, Thomas J.
1988-01-01
A multitasking architecture for performing real-time monitoring and analysis using knowledge-based problem solving techniques is described. To handle asynchronous inputs and perform in real time, the system consists of three or more distributed processes which run concurrently and communicate via a message passing scheme. The Data Management Process acquires, compresses, and routes the incoming sensor data to other processes. The Inference Process consists of a high performance inference engine that performs a real-time analysis on the state and health of the physical system. The I/O Process receives sensor data from the Data Management Process and status messages and recommendations from the Inference Process, updates its graphical displays in real time, and acts as the interface to the console operator. The distributed architecture has been interfaced to an actual spacecraft (NASA's Hubble Space Telescope) and is able to process the incoming telemetry in real-time (i.e., several hundred data changes per second). The system is being used in two locations for different purposes: (1) in Sunnyville, California at the Space Telescope Test Control Center it is used in the preflight testing of the vehicle; and (2) in Greenbelt, Maryland at NASA/Goddard it is being used on an experimental basis in flight operations for health and safety monitoring.
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five features selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Electronic holography using binary phase modulation
NASA Astrophysics Data System (ADS)
Matoba, Osamu
2014-06-01
A 3D display system by using a phase-only distribution is presented. Especially, binary phase distribution is used to reconstruct a 3D object for wide viewing zone angle. To obtain the phase distribution to be displayed on a phase-mode spatial light modulator, both of experimental and numerical processes are available. In this paper, we present a numerical process by using a computer graphics data. A random phase distribution is attached to all polygons of an input 3D object to reconstruct a 3D object well from the binary phase distribution. Numerical and experimental results are presented to show the effectiveness of the proposed system.
Computational Unification: a Vision for Connecting Researchers
NASA Astrophysics Data System (ADS)
Troy, R. M.; Kingrey, O. J.
2002-12-01
Computational Unification of science, once only a vision, is becoming a reality. This technology is based upon a scientifically defensible, general solution for Earth Science data management and processing. The computational unification of science offers a real opportunity to foster inter and intra-discipline cooperation, and the end of 're-inventing the wheel'. As we move forward using computers as tools, it is past time to move from computationally isolating, "one-off" or discipline-specific solutions into a unified framework where research can be more easily shared, especially with researchers in other disciplines. The author will discuss how distributed meta-data, distributed processing and distributed data objects are structured to constitute a working interdisciplinary system, including how these resources lead to scientific defensibility through known lineage of all data products. Illustration of how scientific processes are encapsulated and executed illuminates how previously written processes and functions are integrated into the system efficiently and with minimal effort. Meta-data basics will illustrate how intricate relationships may easily be represented and used to good advantage. Retrieval techniques will be discussed including trade-offs of using meta-data versus embedded data, how the two may be integrated, and how simplifying assumptions may or may not help. This system is based upon the experience of the Sequoia 2000 and BigSur research projects at the University of California, Berkeley, whose goals were to find an alternative to the Hughes EOS-DIS system and is presently offered by Science Tools corporation, of which the author is a principal.
NASA's Earth Science Data Systems
NASA Technical Reports Server (NTRS)
Ramapriyan, H. K.
2015-01-01
NASA's Earth Science Data Systems (ESDS) Program has evolved over the last two decades, and currently has several core and community components. Core components provide the basic operational capabilities to process, archive, manage and distribute data from NASA missions. Community components provide a path for peer-reviewed research in Earth Science Informatics to feed into the evolution of the core components. The Earth Observing System Data and Information System (EOSDIS) is a core component consisting of twelve Distributed Active Archive Centers (DAACs) and eight Science Investigator-led Processing Systems spread across the U.S. The presentation covers how the ESDS Program continues to evolve and benefits from as well as contributes to advances in Earth Science Informatics.
NASA Technical Reports Server (NTRS)
1985-01-01
Topics addressed include: assessment models; model predictions of ozone changes; ozone and temperature trends; trace gas effects on climate; kinetics and photchemical data base; spectroscopic data base (infrared to microwave); instrument intercomparisons and assessments; and monthly mean distribution of ozone and temperature.
NASA Technical Reports Server (NTRS)
Pordes, Ruth (Editor)
1989-01-01
Papers on real-time computer applications in nuclear, particle, and plasma physics are presented, covering topics such as expert systems tactics in testing FASTBUS segment interconnect modules, trigger control in a high energy physcis experiment, the FASTBUS read-out system for the Aleph time projection chamber, a multiprocessor data acquisition systems, DAQ software architecture for Aleph, a VME multiprocessor system for plasma control at the JT-60 upgrade, and a multiasking, multisinked, multiprocessor data acquisition front end. Other topics include real-time data reduction using a microVAX processor, a transputer based coprocessor for VEDAS, simulation of a macropipelined multi-CPU event processor for use in FASTBUS, a distributed VME control system for the LISA superconducting Linac, a distributed system for laboratory process automation, and a distributed system for laboratory process automation. Additional topics include a structure macro assembler for the event handler, a data acquisition and control system for Thomson scattering on ATF, remote procedure execution software for distributed systems, and a PC-based graphic display real-time particle beam uniformity.
NASA Astrophysics Data System (ADS)
Carbone, F.; Bruno, A. G.; Naccarato, A.; De Simone, F.; Gencarelli, C. N.; Sprovieri, F.; Hedgecock, I. M.; Landis, M. S.; Skov, H.; Pfaffhuber, K. A.; Read, K. A.; Martin, L.; Angot, H.; Dommergue, A.; Magand, O.; Pirrone, N.
2018-01-01
The probability density function (PDF) of the time intervals between subsequent extreme events in atmospheric Hg0 concentration data series from different latitudes has been investigated. The Hg0 dynamic possesses a long-term memory autocorrelation function. Above a fixed threshold Q in the data, the PDFs of the interoccurrence time of the Hg0 data are well described by a Tsallis q-exponential function. This PDF behavior has been explained in the framework of superstatistics, where the competition between multiple mesoscopic processes affects the macroscopic dynamics. An extensive parameter μ, encompassing all possible fluctuations related to mesoscopic phenomena, has been identified. It follows a χ2 distribution, indicative of the superstatistical nature of the overall process. Shuffling the data series destroys the long-term memory, the distributions become independent of Q, and the PDFs collapse on to the same exponential distribution. The possible central role of atmospheric turbulence on extreme events in the Hg0 data is highlighted.
Analysis Using Bi-Spectral Related Technique
1993-11-17
filtering is employed as the data is processed (equation 1). Earlier results have shown that in contrast to the Wigner - Ville Distribution ( WVD ) no spectral...Technique by-o -~ Ralph Hippenstiel November 17, 1993 94 2 22 1 0 Approved for public reslease; distribution unlimited. Prepared for: Naval Command Control...Government. 12a. DISTRIBUTION /AVAILABILITY STATEMENT 12b. DISTRIBUTION ’.ODE Approved for public relkase; distribution unlimited. 13. ABSTRACT (Maximum
Research on distributed optical fiber sensing data processing method based on LabVIEW
NASA Astrophysics Data System (ADS)
Li, Zhonghu; Yang, Meifang; Wang, Luling; Wang, Jinming; Yan, Junhong; Zuo, Jing
2018-01-01
The pipeline leak detection and leak location problem have gotten extensive attention in the industry. In this paper, the distributed optical fiber sensing system is designed based on the heat supply pipeline. The data processing method of distributed optical fiber sensing based on LabVIEW is studied emphatically. The hardware system includes laser, sensing optical fiber, wavelength division multiplexer, photoelectric detector, data acquisition card and computer etc. The software system is developed using LabVIEW. The software system adopts wavelet denoising method to deal with the temperature information, which improved the SNR. By extracting the characteristic value of the fiber temperature information, the system can realize the functions of temperature measurement, leak location and measurement signal storage and inquiry etc. Compared with traditional negative pressure wave method or acoustic signal method, the distributed optical fiber temperature measuring system can measure several temperatures in one measurement and locate the leak point accurately. It has a broad application prospect.
NASA Astrophysics Data System (ADS)
WANG, Qingrong; ZHU, Changfeng
2017-06-01
Integration of distributed heterogeneous data sources is the key issues under the big data applications. In this paper the strategy of variable precision is introduced to the concept lattice, and the one-to-one mapping mode of variable precision concept lattice and ontology concept lattice is constructed to produce the local ontology by constructing the variable precision concept lattice for each subsystem, and the distributed generation algorithm of variable precision concept lattice based on ontology heterogeneous database is proposed to draw support from the special relationship between concept lattice and ontology construction. Finally, based on the standard of main concept lattice of the existing heterogeneous database generated, a case study has been carried out in order to testify the feasibility and validity of this algorithm, and the differences between the main concept lattice and the standard concept lattice are compared. Analysis results show that this algorithm above-mentioned can automatically process the construction process of distributed concept lattice under the heterogeneous data sources.
Enabling the Continuous EOS-SNPP Satellite Data Record thru EOSDIS Services
NASA Astrophysics Data System (ADS)
Hall, A.; Behnke, J.; Ho, E. L.
2015-12-01
Following Suomi National Polar-Orbiting Partnership (SNPP) launch of October 2011, the role of the NASA Science Data Segment (SDS) focused primarily on evaluation of the sensor data records (SDRs) and environmental data records (EDRs) produced by the Joint Polar Satellite System (JPSS), a National Oceanic and Atmosphere Administration (NOAA) Program as to their suitability for Earth system science. The evaluation has been completed for Visible Infrared Imager Radiometer Suite (VIIRS), Advanced Technology Microwave Sounder (ATMS), Cross-track Infrared Sounder (CrIS), and Ozone Mapper/Profiler Suite (OMPS) Nadir instruments. Since launch, the SDS has also been processing, archiving and distributing data from the Clouds and the Earth's Radiant Energy System (CERES) and Ozone Mapper/Profiler Suite (OMPS) Limb instruments and this work is planned to continue through the life of the mission. As NASA transitions to the production of standard, Earth Observing System (EOS)-like science products for all instruments aboard Suomi NPP, the Suomi NPP Science Team (ST) will need data processing and production facilities to produce the new science products they develop. The five Science Investigator-led Processing Systems (SIPS): Land, Ocean. Atmosphere, Ozone, and Sounder will produce the NASA SNPP standard Level 1, Level 2, and global Level 3 products and provide the products to the NASA's Distributed Active Archive Centers (DAACs) for distribution to the user community. The SIPS will ingest EOS compatible Level 0 data from EOS Data Operations System (EDOS) for their data processing. A key feature is the use of Earth Observing System Data and Information System (EOSDIS) services for the continuous EOS-SNPP satellite data record. This allows users to use the same tools and interfaces on SNPP as they would on the entire NASA Earth Science data collection in EOSDIS.
Rain-rate data base development and rain-rate climate analysis
NASA Technical Reports Server (NTRS)
Crane, Robert K.
1993-01-01
The single-year rain-rate distribution data available within the archives of Consultative Committee for International Radio (CCIR) Study Group 5 were compiled into a data base for use in rain-rate climate modeling and for the preparation of predictions of attenuation statistics. The four year set of tip-time sequences provided by J. Goldhirsh for locations near Wallops Island were processed to compile monthly and annual distributions of rain rate and of event durations for intervals above and below preset thresholds. A four-year data set of tropical rain-rate tip-time sequences were acquired from the NASA TRMM program for 30 gauges near Darwin, Australia. They were also processed for inclusion in the CCIR data base and the expanded data base for monthly observations at the University of Oklahoma. The empirical rain-rate distributions (edfs) accepted for inclusion in the CCIR data base were used to estimate parameters for several rain-rate distribution models: the lognormal model, the Crane two-component model, and the three parameter model proposed by Moupfuma. The intent of this segment of the study is to obtain a limited set of parameters that can be mapped globally for use in rain attenuation predictions. If the form of the distribution can be established, then perhaps available climatological data can be used to estimate the parameters rather than requiring years of rain-rate observations to set the parameters. The two-component model provided the best fit to the Wallops Island data but the Moupfuma model provided the best fit to the Darwin data.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.
2000-01-01
Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
Fission meter and neutron detection using poisson distribution comparison
Rowland, Mark S; Snyderman, Neal J
2014-11-18
A neutron detector system and method for discriminating fissile material from non-fissile material wherein a digital data acquisition unit collects data at high rate, and in real-time processes large volumes of data directly into information that a first responder can use to discriminate materials. The system comprises counting neutrons from the unknown source and detecting excess grouped neutrons to identify fission in the unknown source. Comparison of the observed neutron count distribution with a Poisson distribution is performed to distinguish fissile material from non-fissile material.
NASA SNPP SIPS - Following in the Path of EOS
NASA Technical Reports Server (NTRS)
Behnke, Jeanne; Hall, Alfreda; Ho, Evelyn
2016-01-01
NASA's Earth Science Data Information System (ESDIS) Project has been operating NASA's Suomi National Polar-Orbiting Partnership (SNPP) Science Data Segment (SDS) since the launch in October 2011. At launch, the SDS focused primarily on the evaluation of Sensor Data Records (SDRs) and Environmental Data Records (EDRs) produced by the Joint Polar Satellite System (JPSS), a National Oceanic and Atmosphere Administration (NOAA) Program, as to their suitability for Earth system science. During the summer of 2014, NASA transitioned to the production of standard Earth Observing System (EOS)-like science products for all instruments aboard Suomi NPP. The five Science Investigator-led Processing Systems (SIPS): Land, Ocean, Atmosphere, Ozone, and Sounder were established to produce the NASA SNPP standard Level 1, Level 2, and global Level 3 products developed by the SNPP Science Teams and to provide the products to NASA's Distributed Active Archive Centers (DAACs) for archive and distribution to the user community. The processing, archiving and distribution of data from NASA's Clouds and the Earth's Radiant Energy System (CERES) and Ozone Mapper/Profiler Suite (OMPS) Limb instruments will continue. With the implementation of the JPSS Block 2 architecture and the launch of JPSS-1, the SDS will receive SNPP data in near real-time via the JPSS Stored Mission Data Hub (JSH), as well as JPSS-1 and future JPSS-2 data. The SNPP SIPS will ingest EOS compatible Level 0 data from the EOS Data Operations System (EDOS) element for their data processing, enabling the continuous EOS-SNPP-JPSS Satellite Data Record.
A preliminary analysis of quantifying computer security vulnerability data in "the wild"
NASA Astrophysics Data System (ADS)
Farris, Katheryn A.; McNamara, Sean R.; Goldstein, Adam; Cybenko, George
2016-05-01
A system of computers, networks and software has some level of vulnerability exposure that puts it at risk to criminal hackers. Presently, most vulnerability research uses data from software vendors, and the National Vulnerability Database (NVD). We propose an alternative path forward through grounding our analysis in data from the operational information security community, i.e. vulnerability data from "the wild". In this paper, we propose a vulnerability data parsing algorithm and an in-depth univariate and multivariate analysis of the vulnerability arrival and deletion process (also referred to as the vulnerability birth-death process). We find that vulnerability arrivals are best characterized by the log-normal distribution and vulnerability deletions are best characterized by the exponential distribution. These distributions can serve as prior probabilities for future Bayesian analysis. We also find that over 22% of the deleted vulnerability data have a rate of zero, and that the arrival vulnerability data is always greater than zero. Finally, we quantify and visualize the dependencies between vulnerability arrivals and deletions through a bivariate scatterplot and statistical observations.
Distributed and parallel approach for handle and perform huge datasets
NASA Astrophysics Data System (ADS)
Konopko, Joanna
2015-12-01
Big Data refers to the dynamic, large and disparate volumes of data comes from many different sources (tools, machines, sensors, mobile devices) uncorrelated with each others. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. Proper architecture of the system that perform huge data sets is needed. In this paper, the comparison of distributed and parallel system architecture is presented on the example of MapReduce (MR) Hadoop platform and parallel database platform (DBMS). This paper also analyzes the problem of performing and handling valuable information from petabytes of data. The both paradigms: MapReduce and parallel DBMS are described and compared. The hybrid architecture approach is also proposed and could be used to solve the analyzed problem of storing and processing Big Data.
Parallel task processing of very large datasets
NASA Astrophysics Data System (ADS)
Romig, Phillip Richardson, III
This research concerns the use of distributed computer technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all are increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides reduction of overall computation time as a result of the task distribution even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.
Distributed visualization of gridded geophysical data: the Carbon Data Explorer, version 0.2.3
NASA Astrophysics Data System (ADS)
Endsley, K. A.; Billmire, M. G.
2016-01-01
Due to the proliferation of geophysical models, particularly climate models, the increasing resolution of their spatiotemporal estimates of Earth system processes, and the desire to easily share results with collaborators, there is a genuine need for tools to manage, aggregate, visualize, and share data sets. We present a new, web-based software tool - the Carbon Data Explorer - that provides these capabilities for gridded geophysical data sets. While originally developed for visualizing carbon flux, this tool can accommodate any time-varying, spatially explicit scientific data set, particularly NASA Earth system science level III products. In addition, the tool's open-source licensing and web presence facilitate distributed scientific visualization, comparison with other data sets and uncertainty estimates, and data publishing and distribution.
Some considerations on the use of ecological models to predict species' geographic distributions
Peterjohn, B.G.
2001-01-01
Peterson (2001) used Genetic Algorithm for Rule-set Prediction (GARP) models to predict distribution patterns from Breeding Bird Survey (BBS) data. Evaluations of these models should consider inherent limitations of BBS data: (1) BBS methods may not sample species and habitats equally; (2) using BBS data for both model development and testing may overlook poor fit of some models; and (3) BBS data may not provide the desired spatial resolution or capture temporal changes in species distributions. The predictive value of GARP models requires additional study, especially comparisons with distribution patterns from independent data sets. When employed at appropriate temporal and geographic scales, GARP models show considerable promise for conservation biology applications but provide limited inferences concerning processes responsible for the observed patterns.
Nonstationary Dynamics Data Analysis with Wavelet-SVD Filtering
NASA Technical Reports Server (NTRS)
Brenner, Marty; Groutage, Dale; Bessette, Denis (Technical Monitor)
2001-01-01
Nonstationary time-frequency analysis is used for identification and classification of aeroelastic and aeroservoelastic dynamics. Time-frequency multiscale wavelet processing generates discrete energy density distributions. The distributions are processed using the singular value decomposition (SVD). Discrete density functions derived from the SVD generate moments that detect the principal features in the data. The SVD standard basis vectors are applied and then compared with a transformed-SVD, or TSVD, which reduces the number of features into more compact energy density concentrations. Finally, from the feature extraction, wavelet-based modal parameter estimation is applied.
ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics
NASA Astrophysics Data System (ADS)
Hu, F.; Yang, C. P.; Duffy, D.; Schnase, J. L.; Li, Z.
2016-12-01
Massive array-based climate data is being generated from global surveillance systems and model simulations. They are widely used to analyze the environment problems, such as climate changes, natural hazards, and public health. However, knowing the underlying information from these big climate datasets is challenging due to both data- and computing- intensive issues in data processing and analyzing. To tackle the challenges, this paper proposes ClimateSpark, an in-memory distributed computing framework to support big climate data processing. In ClimateSpark, the spatiotemporal index is developed to enable Apache Spark to treat the array-based climate data (e.g. netCDF4, HDF4) as native formats, which are stored in Hadoop Distributed File System (HDFS) without any preprocessing. Based on the index, the spatiotemporal query services are provided to retrieve dataset according to a defined geospatial and temporal bounding box. The data subsets will be read out, and a data partition strategy will be applied to equally split the queried data to each computing node, and store them in memory as climateRDDs for processing. By leveraging Spark SQL and User Defined Function (UDFs), the climate data analysis operations can be conducted by the intuitive SQL language. ClimateSpark is evaluated by two use cases using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. One use case is to conduct the spatiotemporal query and visualize the subset results in animation; the other one is to compare different climate model outputs using Taylor-diagram service. Experimental results show that ClimateSpark can significantly accelerate data query and processing, and enable the complex analysis services served in the SQL-style fashion.
Hazlehurst, Brian L; Kurtz, Stephen E; Masica, Andrew; Stevens, Victor J; McBurnie, Mary Ann; Puro, Jon E; Vijayadeva, Vinutha; Au, David H; Brannon, Elissa D; Sittig, Dean F
2015-10-01
Comparative effectiveness research (CER) requires the capture and analysis of data from disparate sources, often from a variety of institutions with diverse electronic health record (EHR) implementations. In this paper we describe the CER Hub, a web-based informatics platform for developing and conducting research studies that combine comprehensive electronic clinical data from multiple health care organizations. The CER Hub platform implements a data processing pipeline that employs informatics standards for data representation and web-based tools for developing study-specific data processing applications, providing standardized access to the patient-centric electronic health record (EHR) across organizations. The CER Hub is being used to conduct two CER studies utilizing data from six geographically distributed and demographically diverse health systems. These foundational studies address the effectiveness of medications for controlling asthma and the effectiveness of smoking cessation services delivered in primary care. The CER Hub includes four key capabilities: the ability to process and analyze both free-text and coded clinical data in the EHR; a data processing environment supported by distributed data and study governance processes; a clinical data-interchange format for facilitating standardized extraction of clinical data from EHRs; and a library of shareable clinical data processing applications. CER requires coordinated and scalable methods for extracting, aggregating, and analyzing complex, multi-institutional clinical data. By offering a range of informatics tools integrated into a framework for conducting studies using EHR data, the CER Hub provides a solution to the challenges of multi-institutional research using electronic medical record data. Copyright © 2015. Published by Elsevier Ireland Ltd.
Transverse parton momenta in single inclusive hadron production in e+ e- annihilation processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boglione, M.; Gonzalez-Hernandez, J. O.; Taghavi, R.
Here, we study the transverse momentum distributions of single inclusive hadron production in e+e- annihilation processes. Although the only available experimental data are scarce and quite old, we find that the fundamental features of transverse momentum dependent (TMD) evolution, historically addressed in Drell–Yan processes and, more recently, in Semi-inclusive deep inelastic scattering processes, are visible in e+e- annihilations as well. Interesting effects related to its non-perturbative regime can be observed. We test two different parameterizations for the p more » $$\\perp$$ dependence of the cross section: the usual Gaussian distribution and a power-law model. We find the latter to be more appropriate in describing this particular set of experimental data, over a relatively large range of p $$\\perp$$ values. We use this model to map some of the features of the data within the framework of TMD evolution, and discuss the caveats of this and other possible interpretations, related to the one-dimensional nature of the available experimental data.« less
Transverse parton momenta in single inclusive hadron production in e+ e- annihilation processes
Boglione, M.; Gonzalez-Hernandez, J. O.; Taghavi, R.
2017-06-17
Here, we study the transverse momentum distributions of single inclusive hadron production in e+e- annihilation processes. Although the only available experimental data are scarce and quite old, we find that the fundamental features of transverse momentum dependent (TMD) evolution, historically addressed in Drell–Yan processes and, more recently, in Semi-inclusive deep inelastic scattering processes, are visible in e+e- annihilations as well. Interesting effects related to its non-perturbative regime can be observed. We test two different parameterizations for the p more » $$\\perp$$ dependence of the cross section: the usual Gaussian distribution and a power-law model. We find the latter to be more appropriate in describing this particular set of experimental data, over a relatively large range of p $$\\perp$$ values. We use this model to map some of the features of the data within the framework of TMD evolution, and discuss the caveats of this and other possible interpretations, related to the one-dimensional nature of the available experimental data.« less
Currie, L A
2001-07-01
Three general classes of skewed data distributions have been encountered in research on background radiation, chemical and radiochemical blanks, and low levels of 85Kr and 14C in the atmosphere and the cryosphere. The first class of skewed data can be considered to be theoretically, or fundamentally skewed. It is typified by the exponential distribution of inter-arrival times for nuclear counting events for a Poisson process. As part of a study of the nature of low-level (anti-coincidence) Geiger-Muller counter background radiation, tests were performed on the Poisson distribution of counts, the uniform distribution of arrival times, and the exponential distribution of inter-arrival times. The real laboratory system, of course, failed the (inter-arrival time) test--for very interesting reasons, linked to the physics of the measurement process. The second, computationally skewed, class relates to skewness induced by non-linear transformations. It is illustrated by non-linear concentration estimates from inverse calibration, and bivariate blank corrections for low-level 14C-12C aerosol data that led to highly asymmetric uncertainty intervals for the biomass carbon contribution to urban "soot". The third, environmentally, skewed, data class relates to a universal problem for the detection of excursions above blank or baseline levels: namely, the widespread occurrence of ab-normal distributions of environmental and laboratory blanks. This is illustrated by the search for fundamental factors that lurk behind skewed frequency distributions of sulfur laboratory blanks and 85Kr environmental baselines, and the application of robust statistical procedures for reliable detection decisions in the face of skewed isotopic carbon procedural blanks with few degrees of freedom.
Lu, Tao; Lu, Minggen; Wang, Min; Zhang, Jun; Dong, Guang-Hui; Xu, Yong
2017-12-18
Longitudinal competing risks data frequently arise in clinical studies. Skewness and missingness are commonly observed for these data in practice. However, most joint models do not account for these data features. In this article, we propose partially linear mixed-effects joint models to analyze skew longitudinal competing risks data with missingness. In particular, to account for skewness, we replace the commonly assumed symmetric distributions by asymmetric distribution for model errors. To deal with missingness, we employ an informative missing data model. The joint models that couple the partially linear mixed-effects model for the longitudinal process, the cause-specific proportional hazard model for competing risks process and missing data process are developed. To estimate the parameters in the joint models, we propose a fully Bayesian approach based on the joint likelihood. To illustrate the proposed model and method, we implement them to an AIDS clinical study. Some interesting findings are reported. We also conduct simulation studies to validate the proposed method.
Probabilistic measures of persistence and extinction in measles (meta)populations.
Gunning, Christian E; Wearing, Helen J
2013-08-01
Persistence and extinction are fundamental processes in ecological systems that are difficult to accurately measure due to stochasticity and incomplete observation. Moreover, these processes operate on multiple scales, from individual populations to metapopulations. Here, we examine an extensive new data set of measles case reports and associated demographics in pre-vaccine era US cities, alongside a classic England & Wales data set. We first infer the per-population quasi-continuous distribution of log incidence. We then use stochastic, spatially implicit metapopulation models to explore the frequency of rescue events and apparent extinctions. We show that, unlike critical community size, the inferred distributions account for observational processes, allowing direct comparisons between metapopulations. The inferred distributions scale with population size. We use these scalings to estimate extinction boundary probabilities. We compare these predictions with measurements in individual populations and random aggregates of populations, highlighting the importance of medium-sized populations in metapopulation persistence. © 2013 John Wiley & Sons Ltd/CNRS.
NASA Technical Reports Server (NTRS)
Behnke, Jeanne; Doescher, Chris
2015-01-01
This presentation discusses 25 years of interactions between NASA and the USGS to manage a Land Processes Distributed Active Archive Center (LPDAAC) for the purpose of providing users access to NASA's rich collection of Earth Science data. The presentation addresses challenges, efforts and metrics on the performance.
Building a Trustworthy Environmental Science Data Repository: Lessons Learned from the ORNL DAAC
NASA Astrophysics Data System (ADS)
Wei, Y.; Santhana Vannan, S. K.; Boyer, A.; Beaty, T.; Deb, D.; Hook, L.
2017-12-01
The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC, https://daac.ornl.gov) for biogeochemical dynamics is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers. The mission of the ORNL DAAC is to assemble, distribute, and provide data services for a comprehensive archive of terrestrial biogeochemistry and ecological dynamics observations and models to facilitate research, education, and decision-making in support of NASA's Earth Science. Since its establishment in 1994, ORNL DAAC has been continuously building itself into a trustworthy environmental science data repository by not only ensuring the quality and usability of its data holdings, but also optimizing its data publication and management process. This paper describes the lessons learned from ORNL DAAC's effort toward this goal. ORNL DAAC has been proactively implementing international community standards throughout its data management life cycle, including data publication, preservation, discovery, visualization, and distribution. Data files in standard formats, detailed documentation, and metadata following standard models are prepared to improve the usability and longevity of data products. Assignment of a Digital Object Identifier (DOI) ensures the identifiability and accessibility of every data product, including the different versions and revisions of its life cycle. ORNL DAAC's data citation policy assures data producers receive appropriate recognition of use of their products. Web service standards, such as OpenSearch and Open Geospatial Consortium (OGC), promotes the discovery, visualization, distribution, and integration of ORNL DAAC's data holdings. Recently, ORNL DAAC began efforts to optimize and standardize its data archival and data publication workflows, to improve the efficiency and transparency of its data archival and management processes.
Sentinel-1 Interferometry from the Cloud to the Scientist
NASA Astrophysics Data System (ADS)
Garron, J.; Stoner, C.; Johnston, A.; Arko, S. A.
2017-12-01
Big data problems and solutions are growing in the technological and scientific sectors daily. Cloud computing is a vertically and horizontally scalable solution available now for archiving and processing large volumes of data quickly, without significant on-site computing hardware costs. Be that as it may, the conversion of scientific data processors to these powerful platforms requires not only the proof of concept, but the demonstration of credibility in an operational setting. The Alaska Satellite Facility (ASF) Distributed Active Archive Center (DAAC), in partnership with NASA's Jet Propulsion Laboratory, is exploring the functional architecture of Amazon Web Services cloud computing environment for the processing, distribution and archival of Synthetic Aperture Radar data in preparation for the NASA-ISRO Synthetic Aperture Radar (NISAR) Mission. Leveraging built-in AWS services for logging, monitoring and dashboarding, the GRFN (Getting Ready for NISAR) team has built a scalable processing, distribution and archival system of Sentinel-1 L2 interferograms produced using the ISCE algorithm. This cloud-based functional prototype provides interferograms over selected global land deformation features (volcanoes, land subsidence, seismic zones) and are accessible to scientists via NASA's EarthData Search client and the ASF DAACs primary SAR interface, Vertex, for direct download. The interferograms are produced using nearest-neighbor logic for identifying pairs of granules for interferometric processing, creating deep stacks of BETA products from almost every satellite orbit for scientists to explore. This presentation highlights the functional lessons learned to date from this exercise, including the cost analysis of various data lifecycle policies as implemented through AWS. While demonstrating the architecture choices in support of efficient big science data management, we invite feedback and questions about the process and products from the InSAR community.
NASA Technical Reports Server (NTRS)
Sasmal, G. P.; Hochstein, J. I.; Wendl, M. C.; Hardy, T. L.
1991-01-01
A multidimensional computational model of the pressurization process in a slush hydrogen propellant storage tank was developed and its accuracy evaluated by comparison to experimental data measured for a 5 ft diameter spherical tank. The fluid mechanic, thermodynamic, and heat transfer processes within the ullage are represented by a finite-volume model. The model was shown to be in reasonable agreement with the experiment data. A parameter study was undertaken to examine the dependence of the pressurization process on initial ullage temperature distribution and pressurant mass flow rate. It is shown that for a given heat flux rate at the ullage boundary, the pressurization process is nearly independent of initial temperature distribution. Significant differences were identified between the ullage temperature and velocity fields predicted for pressurization of slush and those predicted for pressurization of liquid hydrogen. A simplified model of the pressurization process was constructed in search of a dimensionless characterization of the pressurization process. It is shown that the relationship derived from this simplified model collapses all of the pressure history data generated during this study into a single curve.
NASA Astrophysics Data System (ADS)
Shiklomanov, A. I.; Okladnikov, I.; Gordov, E. P.; Proussevitch, A. A.; Titov, A. G.
2016-12-01
Presented is a collaborative project carrying out by joint team of researchers from the Institute of Monitoring of Climatic and Ecological Systems, Russia and Earth Systems Research Center, University of New Hampshire, USA. Its main objective is development of a hardware and software prototype of Distributed Research Center (DRC) for monitoring and projecting of regional climatic and and their impacts on the environment over the Northern extratropical areas. In the framework of the project new approaches to "cloud" processing and analysis of large geospatial datasets (big geospatial data) are being developed. It will be deployed on technical platforms of both institutions and applied in research of climate change and its consequences. Datasets available at NCEI and IMCES include multidimensional arrays of climatic, environmental, demographic, and socio-economic characteristics. The project is aimed at solving several major research and engineering tasks: 1) structure analysis of huge heterogeneous climate and environmental geospatial datasets used in the project, their preprocessing and unification; 2) development of a new distributed storage and processing model based on a "shared nothing" paradigm; 3) development of a dedicated database of metadata describing geospatial datasets used in the project; 4) development of a dedicated geoportal and a high-end graphical frontend providing intuitive user interface, internet-accessible online tools for analysis of geospatial data and web services for interoperability with other geoprocessing software packages. DRC will operate as a single access point to distributed archives of spatial data and online tools for their processing. Flexible modular computational engine running verified data processing routines will provide solid results of geospatial data analysis. "Cloud" data analysis and visualization approach will guarantee access to the DRC online tools and data from all over the world. Additionally, exporting of data processing results through WMS and WFS services will be used to provide their interoperability. Financial support of this activity by the RF Ministry of Education and Science under Agreement 14.613.21.0037 (RFMEFI61315X0037) and by the Iola Hubbard Climate Change Endowment is acknowledged.
Software techniques for a distributed real-time processing system. [for spacecraft
NASA Technical Reports Server (NTRS)
Lesh, F.; Lecoq, P.
1976-01-01
The paper describes software techniques developed for the Unified Data System (UDS), a distributed processor network for control and data handling onboard a planetary spacecraft. These techniques include a structured language for specifying the programs contained in each module, and a small executive program in each module which performs scheduling and implements the module task.
Rapid Inversion of Angular Deflection Data for Certain Axisymmetric Refractive Index Distributions
NASA Technical Reports Server (NTRS)
Rubinstein, R.; Greenberg, P. S.
1994-01-01
Certain functions useful for representing axisymmetric refractive-index distributions are shown to have exact solutions for Abel transformation of the resulting angular deflection data. An advantage of this procedure over direct numerical Abel inversion is that least-squares curve fitting is a smoothing process that reduces the noise sensitivity of the computation
Extending Marine Species Distribution Maps Using Non-Traditional Sources
Moretzsohn, Fabio; Gibeaut, James
2015-01-01
Abstract Background Traditional sources of species occurrence data such as peer-reviewed journal articles and museum-curated collections are included in species databases after rigorous review by species experts and evaluators. The distribution maps created in this process are an important component of species survival evaluations, and are used to adapt, extend and sometimes contract polygons used in the distribution mapping process. New information During an IUCN Red List Gulf of Mexico Fishes Assessment Workshop held at The Harte Research Institute for Gulf of Mexico Studies, a session included an open discussion on the topic of including other sources of species occurrence data. During the last decade, advances in portable electronic devices and applications enable 'citizen scientists' to record images, location and data about species sightings, and submit that data to larger species databases. These applications typically generate point data. Attendees of the workshop expressed an interest in how that data could be incorporated into existing datasets, how best to ascertain the quality and value of that data, and what other alternate data sources are available. This paper addresses those issues, and provides recommendations to ensure quality data use. PMID:25941453
NASA Astrophysics Data System (ADS)
Wright, D. J.; Raad, M.; Hoel, E.; Park, M.; Mollenkopf, A.; Trujillo, R.
2016-12-01
Introduced is a new approach for processing spatiotemporal big data by leveraging distributed analytics and storage. A suite of temporally-aware analysis tools summarizes data nearby or within variable windows, aggregates points (e.g., for various sensor observations or vessel positions), reconstructs time-enabled points into tracks (e.g., for mapping and visualizing storm tracks), joins features (e.g., to find associations between features based on attributes, spatial relationships, temporal relationships or all three simultaneously), calculates point densities, finds hot spots (e.g., in species distributions), and creates space-time slices and cubes (e.g., in microweather applications with temperature, humidity, and pressure, or within human mobility studies). These "feature geo analytics" tools run in both batch and streaming spatial analysis mode as distributed computations across a cluster of servers on typical "big" data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, attached as static spatiotemporal big data stores, or streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Server, then distributes analysis across a cluster of machines for parallel processing. Several brief use cases will be highlighted based on a 16-node server cluster at 14 Gb RAM per node, allowing, for example, the buffering of over 8 million points or thousands of polygons in 1 minute. The approach is "hybrid" in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics. In addition, the user may devise and connect custom open-source interfaces and tools developed in Python or Python Notebooks; the common denominator being the familiar REST API.
Matthew P. Peters; Louis R. Iverson; Anantha M. Prasad; Steve N. Matthews
2013-01-01
Fine-scale soil (SSURGO) data were processed at the county level for 37 states within the eastern United States, initially for use as predictor variables in a species distribution model called DISTRIB II. Values from county polygon files converted into a continuous 30-m raster grid were aggregated to 4-km cells and integrated with other environmental and site condition...
On the application of Rice's exceedance statistics to atmospheric turbulence.
NASA Technical Reports Server (NTRS)
Chen, W. Y.
1972-01-01
Discrepancies produced by the application of Rice's exceedance statistics to atmospheric turbulence are examined. First- and second-order densities from several data sources have been measured for this purpose. Particular care was paid to each selection of turbulence that provides stationary mean and variance over the entire segment. Results show that even for a stationary segment of turbulence, the process is still highly non-Gaussian, in spite of a Gaussian appearance for its first-order distribution. Data also indicate strongly non-Gaussian second-order distributions. It is therefore concluded that even stationary atmospheric turbulence with a normal first-order distribution cannot be considered a Gaussian process, and consequently the application of Rice's exceedance statistics should be approached with caution.
Distributions of Autocorrelated First-Order Kinetic Outcomes: Illness Severity
Englehardt, James D.
2015-01-01
Many complex systems produce outcomes having recurring, power law-like distributions over wide ranges. However, the form necessarily breaks down at extremes, whereas the Weibull distribution has been demonstrated over the full observed range. Here the Weibull distribution is derived as the asymptotic distribution of generalized first-order kinetic processes, with convergence driven by autocorrelation, and entropy maximization subject to finite positive mean, of the incremental compounding rates. Process increments represent multiplicative causes. In particular, illness severities are modeled as such, occurring in proportion to products of, e.g., chronic toxicant fractions passed by organs along a pathway, or rates of interacting oncogenic mutations. The Weibull form is also argued theoretically and by simulation to be robust to the onset of saturation kinetics. The Weibull exponential parameter is shown to indicate the number and widths of the first-order compounding increments, the extent of rate autocorrelation, and the degree to which process increments are distributed exponential. In contrast with the Gaussian result in linear independent systems, the form is driven not by independence and multiplicity of process increments, but by increment autocorrelation and entropy. In some physical systems the form may be attracting, due to multiplicative evolution of outcome magnitudes towards extreme values potentially much larger and smaller than control mechanisms can contain. The Weibull distribution is demonstrated in preference to the lognormal and Pareto I for illness severities versus (a) toxicokinetic models, (b) biologically-based network models, (c) scholastic and psychological test score data for children with prenatal mercury exposure, and (d) time-to-tumor data of the ED01 study. PMID:26061263
Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service.
Bao, Shunxing; Plassard, Andrew J; Landman, Bennett A; Gokhale, Aniruddha
2017-04-01
Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The resulting performance from these approaches is, however, impeded by standard network switches since they can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. To that end, a cloud-based "medical image processing-as-a-service" offers promise in utilizing the ecosystem of Apache Hadoop, which is a flexible framework providing distributed, scalable, fault tolerant storage and parallel computational modules, and HBase, which is a NoSQL database built atop Hadoop's distributed file system. Despite this promise, HBase's load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, which is a necessary step that is driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to the traditional scan, subject, and project-level analysis procedures, and is compatible with standard command line/scriptable image processing software. Experimental results for an illustrative sample of imaging data reveals that our new HBase policy results in a three-fold time improvement in conversion of classic DICOM to NiFTI file formats when compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach even for relatively small file sets. Moreover, file access latency is lower than network attached storage.
Method for localizing and isolating an errant process step
Tobin, Jr., Kenneth W.; Karnowski, Thomas P.; Ferrell, Regina K.
2003-01-01
A method for localizing and isolating an errant process includes the steps of retrieving from a defect image database a selection of images each image having image content similar to image content extracted from a query image depicting a defect, each image in the selection having corresponding defect characterization data. A conditional probability distribution of the defect having occurred in a particular process step is derived from the defect characterization data. A process step as a highest probable source of the defect according to the derived conditional probability distribution is then identified. A method for process step defect identification includes the steps of characterizing anomalies in a product, the anomalies detected by an imaging system. A query image of a product defect is then acquired. A particular characterized anomaly is then correlated with the query image. An errant process step is then associated with the correlated image.
Lindley frailty model for a class of compound Poisson processes
NASA Astrophysics Data System (ADS)
Kadilar, Gamze Özel; Ata, Nihal
2013-10-01
The Lindley distribution gain importance in survival analysis for the similarity of exponential distribution and allowance for the different shapes of hazard function. Frailty models provide an alternative to proportional hazards model where misspecified or omitted covariates are described by an unobservable random variable. Despite of the distribution of the frailty is generally assumed to be continuous, it is appropriate to consider discrete frailty distributions In some circumstances. In this paper, frailty models with discrete compound Poisson process for the Lindley distributed failure time are introduced. Survival functions are derived and maximum likelihood estimation procedures for the parameters are studied. Then, the fit of the models to the earthquake data set of Turkey are examined.
Building Big Flares: Constraining Generating Processes of Solar Flare Distributions
NASA Astrophysics Data System (ADS)
Wyse Jackson, T.; Kashyap, V.; McKillop, S.
2015-12-01
We address mechanisms which seek to explain the observed solar flare distribution, dN/dE ~ E1.8. We have compiled a comprehensive database, from GOES, NOAA, XRT, and AIA data, of solar flares and their characteristics, covering the year 2013. These datasets allow us to probe how stored magnetic energy is released over the course of an active region's evolution. We fit power-laws to flare distributions over various attribute groupings. For instance, we compare flares that occur before and after an active region reaches its maximum area, and show that the corresponding flare distributions are indistinguishable; thus, the processes that lead to magnetic reconnection are similar in both cases. A turnover in the distribution is not detectable at the energies accessible to our study, suggesting that a self-organized critical (SOC) process is a valid mechanism. However, we find changes in the distributions that suggest that the simple picture of an SOC where flares draw energy from an inexhaustible reservoir of stored magnetic energy is incomplete. Following the evolution of the flare distribution over the lifetimes of active regions, we find that the distribution flattens with time, and for larger active regions, and that a single power-law model is insufficient. This implies that flares that occur later in the lifetime of the active region tend towards higher energies. We conclude that the SOC process must have an upper bound. Increasing the scope of the study to include data from other years and more instruments will increase the robustness of these results. This work was supported by the NSF-REU Solar Physics Program at SAO, grant number AGS 1263241, NASA Contract NAS8-03060 to the Chandra X-ray Center and by NASA Hinode/XRT contract NNM07AB07C to SAO
Serious games for elderly continuous monitoring.
Lemus-Zúñiga, Lenin-G; Navarro-Pardo, Esperanza; Moret-Tatay, Carmen; Pocinho, Ricardo
2015-01-01
Information technology (IT) and serious games allow older population to remain independent for longer. Hence, when designing technology for this population, developmental changes, such as attention and/or perception, should be considered. For instance, a crucial developmental change has been related to cognitive speed in terms of reaction time (RT). However, this variable presents a skewed distribution that difficult data analysis. An alternative strategy is to characterize the data to an ex-Gaussian function. Furthermore, this procedure provides different parameters that have been related to underlying cognitive processes in the literature. Another issue to be considered is the optimal data recording, storing and processing. For that purpose mobile devices (smart phones and tablets) are a good option for targeting serious games where valuable information can be stored (time spent in the application, reaction time, frequency of use, and a long etcetera). The data stored inside the smartphones and tablets can be sent to a central computer (cloud storage) in order to store the data collected to not only fill the distribution of reaction times to mathematical functions, but also to estimate parameters which may reflect cognitive processes underlying language, aging, and decisional process.
NASA Astrophysics Data System (ADS)
Hung, Nguyen Trong; Thuan, Le Ba; Thanh, Tran Chi; Nhuan, Hoang; Khoai, Do Van; Tung, Nguyen Van; Lee, Jin-Young; Jyothi, Rajesh Kumar
2018-06-01
Modeling uranium dioxide pellet process from ammonium uranyl carbonate - derived uranium dioxide powder (UO2 ex-AUC powder) and predicting fuel rod temperature distribution were reported in the paper. Response surface methodology (RSM) and FRAPCON-4.0 code were used to model the process and to predict the fuel rod temperature under steady-state operating condition. Fuel rod design of AP-1000 designed by Westinghouse Electric Corporation, in these the pellet fabrication parameters are from the study, were input data for the code. The predictive data were suggested the relationship between the fabrication parameters of UO2 pellets and their temperature image in nuclear reactor.
A Multiagent System for Dynamic Data Aggregation in Medical Research
Urovi, Visara; Barba, Imanol; Aberer, Karl; Schumacher, Michael Ignaz
2016-01-01
The collection of medical data for research purposes is a challenging and long-lasting process. In an effort to accelerate and facilitate this process we propose a new framework for dynamic aggregation of medical data from distributed sources. We use agent-based coordination between medical and research institutions. Our system employs principles of peer-to-peer network organization and coordination models to search over already constructed distributed databases and to identify the potential contributors when a new database has to be built. Our framework takes into account both the requirements of a research study and current data availability. This leads to better definition of database characteristics such as schema, content, and privacy parameters. We show that this approach enables a more efficient way to collect data for medical research. PMID:27975063
GSKY: A scalable distributed geospatial data server on the cloud
NASA Astrophysics Data System (ADS)
Rozas Larraondo, Pablo; Pringle, Sean; Antony, Joseph; Evans, Ben
2017-04-01
Earth systems, environmental and geophysical datasets are an extremely valuable sources of information about the state and evolution of the Earth. Being able to combine information coming from different geospatial collections is in increasing demand by the scientific community, and requires managing and manipulating data with different formats and performing operations such as map reprojections, resampling and other transformations. Due to the large data volume inherent in these collections, storing multiple copies of them is unfeasible and so such data manipulation must be performed on-the-fly using efficient, high performance techniques. Ideally this should be performed using a trusted data service and common system libraries to ensure wide use and reproducibility. Recent developments in distributed computing based on dynamic access to significant cloud infrastructure opens the door for such new ways of processing geospatial data on demand. The National Computational Infrastructure (NCI), hosted at the Australian National University (ANU), has over 10 Petabytes of nationally significant research data collections. Some of these collections, which comprise a variety of observed and modelled geospatial data, are now made available via a highly distributed geospatial data server, called GSKY (pronounced [jee-skee]). GSKY supports on demand processing of large geospatial data products such as satellite earth observation data as well as numerical weather products, allowing interactive exploration and analysis of the data. It dynamically and efficiently distributes the required computations among cloud nodes providing a scalable analysis framework that can adapt to serve large number of concurrent users. Typical geospatial workflows handling different file formats and data types, or blending data in different coordinate projections and spatio-temporal resolutions, is handled transparently by GSKY. This is achieved by decoupling the data ingestion and indexing process as an independent service. An indexing service crawls data collections either locally or remotely by extracting, storing and indexing all spatio-temporal metadata associated with each individual record. GSKY provides the user with the ability of specifying how ingested data should be aggregated, transformed and presented. It presents an OGC standards-compliant interface, allowing ready accessibility for users of the data via Web Map Services (WMS), Web Processing Services (WPS) or raw data arrays using Web Coverage Services (WCS). The presentation will show some cases where we have used this new capability to provide a significant improvement over previous approaches.
How Bright is the Proton? A Precise Determination of the Photon Parton Distribution Function
DOE Office of Scientific and Technical Information (OSTI.GOV)
Manohar, Aneesh; Nason, Paolo; Salam, Gavin P.
2016-12-09
It has become apparent in recent years that it is important, notably for a range of physics studies at the Large Hadron Collider, to have accurate knowledge on the distribution of photons in the proton. We show how the photon parton distribution function (PDF) can be determined in a model-independent manner, using electron-proton (ep) scattering data, in effect viewing the ep → e + X process as an electron scattering off the photon field of the proton. To this end, we consider an imaginary, beyond the Standard Model process with a flavor changing photon-lepton vertex. We write its cross sectionmore » in two ways: one in terms of proton structure functions, the other in terms of a photon distribution. Requiring their equivalence yields the photon distribution as an integral over proton structure functions. As a result of the good precision of ep data, we constrain the photon PDF at the level of 1%–2% over a wide range of momentum fractions.« less
How Bright is the Proton? A Precise Determination of the Photon Parton Distribution Function.
Manohar, Aneesh; Nason, Paolo; Salam, Gavin P; Zanderighi, Giulia
2016-12-09
It has become apparent in recent years that it is important, notably for a range of physics studies at the Large Hadron Collider, to have accurate knowledge on the distribution of photons in the proton. We show how the photon parton distribution function (PDF) can be determined in a model-independent manner, using electron-proton (ep) scattering data, in effect viewing the ep→e+X process as an electron scattering off the photon field of the proton. To this end, we consider an imaginary, beyond the Standard Model process with a flavor changing photon-lepton vertex. We write its cross section in two ways: one in terms of proton structure functions, the other in terms of a photon distribution. Requiring their equivalence yields the photon distribution as an integral over proton structure functions. As a result of the good precision of ep data, we constrain the photon PDF at the level of 1%-2% over a wide range of momentum fractions.
NASA Technical Reports Server (NTRS)
Danilova, T. V.; Dubovy, A. G.; Erlykin, A. D.; Nesterova, N. M.; Chubenko, A. P.
1985-01-01
The lateral distributions of extensive air showers (EAS) hadrons obtained at Tien-Shan array are compared with the simulations. The simulation data have been treated in the same way as experimental data, including the recording method. The comparison shows that the experimental hadron lateral distributions are wider than simulated ones. On the base of this result the conclusion is drawn that the fraction of processes with large p (perpendicular) increases in hadron-air interactions at energies 5 x 10 to the 14 to 10 to the 16 eV compared with accelerator data in p-p interactions at lower energies.
Moore, Reagan W.; Rajasekar, Arcot; Wan, Michael Y.
2004-01-13
A system of and method for maintaining data objects in containers across a network of distributed heterogeneous resources in a manner which is transparent to a client. A client request pertaining to containers is resolved by querying meta data for the container, processing the request through one or more copies of the container maintained on the system, updating the meta data for the container to reflect any changes made to the container as a result processing the request, and, if a copy of the container has changed, changing the status of the copy to indicate dirty status or synchronizing the copy to one or more other copies that may be present on the system.
Moore, Reagan W [San Diego, CA; Rajasekar, Arcot [Del Mar, CA; Wan, Michael Y [San Diego, CA
2007-09-11
A system of and method for maintaining data objects in containers across a network of distributed heterogeneous resources in a manner which is transparent to a client. A client request pertaining to containers is resolved by querying meta data for the container, processing the request through one or more copies of the container maintained on the system, updating the meta data for the container to reflect any changes made to the container as a result processing the re quest, and, if a copy of the container has changed, changing the status of the copy to indicate dirty status or synchronizing the copy to one or more other copies that may be present on the system.
Moore, Reagan W.; Rajasekar, Arcot; Wan, Michael Y.
2010-09-21
A system of and method for maintaining data objects in containers across a network of distributed heterogeneous resources in a manner which is transparent to a client. A client request pertaining to containers is resolved by querying meta data for the container, processing the request through one or more copies of the container maintained on the system, updating the meta data for the container to reflect any changes made to the container as a result processing the request, and, if a copy of the container has changed, changing the status of the copy to indicate dirty status or synchronizing the copy to one or more other copies that may be present on the system.
Estimating the Propagation of Interdependent Cascading Outages with Multi-Type Branching Processes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qi, Junjian; Ju, Wenyun; Sun, Kai
In this paper, the multi-type branching process is applied to describe the statistics and interdependencies of line outages, the load shed, and isolated buses. The offspring mean matrix of the multi-type branching process is estimated by the Expectation Maximization (EM) algorithm and can quantify the extent of outage propagation. The joint distribution of two types of outages is estimated by the multi-type branching process via the Lagrange-Good inversion. The proposed model is tested with data generated by the AC OPA cascading simulations on the IEEE 118-bus system. The largest eigenvalues of the offspring mean matrix indicate that the system ismore » closer to criticality when considering the interdependence of different types of outages. Compared with empirically estimating the joint distribution of the total outages, good estimate is obtained by using the multitype branching process with a much smaller number of cascades, thus greatly improving the efficiency. It is shown that the multitype branching process can effectively predict the distribution of the load shed and isolated buses and their conditional largest possible total outages even when there are no data of them.« less
Distributed systems status and control
NASA Technical Reports Server (NTRS)
Kreidler, David; Vickers, David
1990-01-01
Concepts are investigated for an automated status and control system for a distributed processing environment. System characteristics, data requirements for health assessment, data acquisition methods, system diagnosis methods and control methods were investigated in an attempt to determine the high-level requirements for a system which can be used to assess the health of a distributed processing system and implement control procedures to maintain an accepted level of health for the system. A potential concept for automated status and control includes the use of expert system techniques to assess the health of the system, detect and diagnose faults, and initiate or recommend actions to correct the faults. Therefore, this research included the investigation of methods by which expert systems were developed for real-time environments and distributed systems. The focus is on the features required by real-time expert systems and the tools available to develop real-time expert systems.
NASA Astrophysics Data System (ADS)
Beach, A. L., III; Northup, E. A.; Early, A. B.; Chen, G.
2016-12-01
Airborne field studies are an effective way to gain a detailed understanding of atmospheric processes for scientific research on climate change and air quality relevant issues. One major function of airborne project data management is to maintain seamless data access within the science team. This allows individual instrument principal investigators (PIs) to process and validate their own data, which requires analysis of data sets from other PIs (or instruments). The project's web platform streamlines data ingest, distribution processes, and data format validation. In May 2016, the NASA Langley Research Center (LaRC) Atmospheric Science Data Center (ASDC) developed a new data management capability to help support the Korea U.S.-Air Quality (KORUS-AQ) science team. This effort is aimed at providing direct NASA Distributed Active Archive Center (DAAC) support to an airborne field study. Working closely with the science team, the ASDC developed a scalable architecture that allows investigators to easily upload and distribute their data and documentation within a secure collaborative environment. The user interface leverages modern design elements to intuitively guide the PI through each step of the data management process. In addition, the new framework creates an abstraction layer between how the data files are stored and how the data itself is organized(i.e. grouping files by PI). This approach makes it easy for PIs to simply transfer their data to one directory, while the system itself can automatically group/sort data as needed. Moreover, the platform is "server agnostic" to a certain degree, making deployment and customization more straightforward as hardware needs change. This flexible design will improve development efficiency and can be leveraged for future field campaigns. This presentation will examine the KORUS-AQ data portal as a scalable solution that applies consistent and intuitive usability design practices to support ingest and management of airborne data.
Disribution and interplay of geologic processes on Titan from Cassini radar data
Lopes, R.M.C.; Stofan, E.R.; Peckyno, R.; Radebaugh, J.; Mitchell, K.L.; Mitri, Giuseppe; Wood, C.A.; Kirk, R.L.; Wall, S.D.; Lunine, J.I.; Hayes, A.; Lorenz, R.; Farr, Tom; Wye, L.; Craig, J.; Ollerenshaw, R.J.; Janssen, M.; LeGall, A.; Paganelli, F.; West, R.; Stiles, B.; Callahan, P.; Anderson, Y.; Valora, P.; Soderblom, L.
2010-01-01
The Cassini Titan Radar Mapper is providing an unprecedented view of Titan's surface geology. Here we use Synthetic Aperture Radar (SAR) image swaths (Ta-T30) obtained from October 2004 to December 2007 to infer the geologic processes that have shaped Titan's surface. These SAR swaths cover about 20% of the surface, at a spatial resolution ranging from ~350 m to ~2 km. The SAR data are distributed over a wide latitudinal and longitudinal range, enabling some conclusions to be drawn about the global distribution of processes. They reveal a geologically complex surface that has been modified by all the major geologic processes seen on Earth - volcanism, tectonism, impact cratering, and erosion and deposition by fluvial and aeolian activity. In this paper, we map geomorphological units from SAR data and analyze their areal distribution and relative ages of modification in order to infer the geologic evolution of Titan's surface. We find that dunes and hummocky and mountainous terrains are more widespread than lakes, putative cryovolcanic features, mottled plains, and craters and crateriform structures that may be due to impact. Undifferentiated plains are the largest areal unit; their origin is uncertain. In terms of latitudinal distribution, dunes and hummocky and mountainous terrains are located mostly at low latitudes (less than 30 degrees), with no dunes being present above 60 degrees. Channels formed by fluvial activity are present at all latitudes, but lakes are at high latitudes only. Crateriform structures that may have been formed by impact appear to be uniformly distributed with latitude, but the well-preserved impact craters are all located at low latitudes, possibly indicating that more resurfacing has occurred at higher latitudes. Cryovolcanic features are not ubiquitous, and are mostly located between 30 degrees and 60 degrees north. We examine temporal relationships between units wherever possible, and conclude that aeolian and fluvial/pluvial/lacustrine processes are the most recent, while tectonic processes that led to the formation of mountains and Xanadu are likely the most ancient.
Data collection system: Earth Resources Technology Satellite-1
NASA Technical Reports Server (NTRS)
Cooper, S. (Editor); Ryan, P. T. (Editor)
1975-01-01
Subjects covered at the meeting concerned results on the overall data collection system including sensors, interface hardware, power supplies, environmental enclosures, data transmission, processing and distribution, maintenance and integration in resources management systems.
Design distributed simulation platform for vehicle management system
NASA Astrophysics Data System (ADS)
Wen, Zhaodong; Wang, Zhanlin; Qiu, Lihua
2006-11-01
Next generation military aircraft requires the airborne management system high performance. General modules, data integration, high speed data bus and so on are needed to share and manage information of the subsystems efficiently. The subsystems include flight control system, propulsion system, hydraulic power system, environmental control system, fuel management system, electrical power system and so on. The unattached or mixed architecture is changed to integrated architecture. That means the whole airborne system is regarded into one system to manage. So the physical devices are distributed but the system information is integrated and shared. The process function of each subsystem are integrated (including general process modules, dynamic reconfiguration), furthermore, the sensors and the signal processing functions are shared. On the other hand, it is a foundation for power shared. Establish a distributed vehicle management system using 1553B bus and distributed processors which can provide a validation platform for the research of airborne system integrated management. This paper establishes the Vehicle Management System (VMS) simulation platform. Discuss the software and hardware configuration and analyze the communication and fault-tolerant method.
Probability distribution functions for intermittent scrape-off layer plasma fluctuations
NASA Astrophysics Data System (ADS)
Theodorsen, A.; Garcia, O. E.
2018-03-01
A stochastic model for intermittent fluctuations in the scrape-off layer of magnetically confined plasmas has been constructed based on a super-position of uncorrelated pulses arriving according to a Poisson process. In the most common applications of the model, the pulse amplitudes are assumed exponentially distributed, supported by conditional averaging of large-amplitude fluctuations in experimental measurement data. This basic assumption has two potential limitations. First, statistical analysis of measurement data using conditional averaging only reveals the tail of the amplitude distribution to be exponentially distributed. Second, exponentially distributed amplitudes leads to a positive definite signal which cannot capture fluctuations in for example electric potential and radial velocity. Assuming pulse amplitudes which are not positive definite often make finding a closed form for the probability density function (PDF) difficult, even if the characteristic function remains relatively simple. Thus estimating model parameters requires an approach based on the characteristic function, not the PDF. In this contribution, the effect of changing the amplitude distribution on the moments, PDF and characteristic function of the process is investigated and a parameter estimation method using the empirical characteristic function is presented and tested on synthetically generated data. This proves valuable for describing intermittent fluctuations of all plasma parameters in the boundary region of magnetized plasmas.
Predicting durations of online collective actions based on Peaks' heights
NASA Astrophysics Data System (ADS)
Lu, Peng; Nie, Shizhao; Wang, Zheng; Jing, Ziwei; Yang, Jianwu; Qi, Zhongxiang; Pujia, Wangmo
2018-02-01
Capturing the whole process of collective actions, the peak model contains four stages, including Prepare, Outbreak, Peak, and Vanish. Based on the peak model, one of the key variables, factors and parameters are further investigated in this paper, which is the rate between peaks and spans. Although the durations or spans and peaks' heights are highly diversified, it seems that the ratio between them is quite stable. If the rate's regularity is discovered, we can predict how long the collective action lasts and when it ends based on the peak's height. In this work, we combined mathematical simulations and empirical big data of 148 cases to explore the regularity of ratio's distribution. It is indicated by results of simulations that the rate has some regularities of distribution, which is not normal distribution. The big data has been collected from the 148 online collective actions and the whole processes of participation are recorded. The outcomes of empirical big data indicate that the rate seems to be closer to being log-normally distributed. This rule holds true for both the total cases and subgroups of 148 online collective actions. The Q-Q plot is applied to check the normal distribution of the rate's logarithm, and the rate's logarithm does follow the normal distribution.
NASA Astrophysics Data System (ADS)
Ott, Stephan; Herschel Science Ground Segment Consortium
2010-05-01
The Herschel Space Observatory, the fourth cornerstone mission in the ESA science program, was launched 14th of May 2009. With a 3.5 m telescope, it is the largest space telescope ever launched. Herschel's three instruments (HIFI, PACS, and SPIRE) perform photometry and spectroscopy in the 55 - 672 micron range and will deliver exciting science for the astronomical community during at least three years of routine observations. Since 2nd of December 2009 Herschel has been performing and processing observations in routine science mode. The development of the Herschel Data Processing System started eight years ago to support the data analysis for Instrument Level Tests. To fulfil the expectations of the astronomical community, additional resources were made available to implement a freely distributable Data Processing System capable of interactively and automatically reducing Herschel data at different processing levels. The system combines data retrieval, pipeline execution and scientific analysis in one single environment. The Herschel Interactive Processing Environment (HIPE) is the user-friendly face of Herschel Data Processing. The software is coded in Java and Jython to be platform independent and to avoid the need for commercial licenses. It is distributed under the GNU Lesser General Public License (LGPL), permitting everyone to access and to re-use its code. We will summarise the current capabilities of the Herschel Data Processing System and give an overview about future development milestones and plans, and how the astronomical community can contribute to HIPE. The Herschel Data Processing System is a joint development by the Herschel Science Ground Segment Consortium, consisting of ESA, the NASA Herschel Science Center, and the HIFI, PACS and SPIRE consortium members.
First-Passage-Time Distribution for Variable-Diffusion Processes
NASA Astrophysics Data System (ADS)
Barney, Liberty; Gunaratne, Gemunu H.
2017-05-01
First-passage-time distribution, which presents the likelihood of a stock reaching a pre-specified price at a given time, is useful in establishing the value of financial instruments and in designing trading strategies. First-passage-time distribution for Wiener processes has a single peak, while that for stocks exhibits a notable second peak within a trading day. This feature has only been discussed sporadically—often dismissed as due to insufficient/incorrect data or circumvented by conversion to tick time—and to the best of our knowledge has not been explained in terms of the underlying stochastic process. It was shown previously that intra-day variations in the market can be modeled by a stochastic process containing two variable-diffusion processes (Hua et al. in, Physica A 419:221-233, 2015). We show here that the first-passage-time distribution of this two-stage variable-diffusion model does exhibit a behavior similar to the empirical observation. In addition, we find that an extended model incorporating overnight price fluctuations exhibits intra- and inter-day behavior similar to those of empirical first-passage-time distributions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jinsong
2013-05-01
Development of a hierarchical Bayesian model to estimate the spatiotemporal distribution of aqueous geochemical parameters associated with in-situ bioremediation using surface spectral induced polarization (SIP) data and borehole geochemical measurements collected during a bioremediation experiment at a uranium-contaminated site near Rifle, Colorado. The SIP data are first inverted for Cole-Cole parameters including chargeability, time constant, resistivity at the DC frequency and dependence factor, at each pixel of two-dimensional grids using a previously developed stochastic method. Correlations between the inverted Cole-Cole parameters and the wellbore-based groundwater chemistry measurements indicative of key metabolic processes within the aquifer (e.g. ferrous iron, sulfate, uranium)more » were established and used as a basis for petrophysical model development. The developed Bayesian model consists of three levels of statistical sub-models: 1) data model, providing links between geochemical and geophysical attributes, 2) process model, describing the spatial and temporal variability of geochemical properties in the subsurface system, and 3) parameter model, describing prior distributions of various parameters and initial conditions. The unknown parameters are estimated using Markov chain Monte Carlo methods. By combining the temporally distributed geochemical data with the spatially distributed geophysical data, we obtain the spatio-temporal distribution of ferrous iron, sulfate and sulfide, and their associated uncertainity information. The obtained results can be used to assess the efficacy of the bioremediation treatment over space and time and to constrain reactive transport models.« less
Barista: A Framework for Concurrent Speech Processing by USC-SAIL
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.
2016-01-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
Barista: A Framework for Concurrent Speech Processing by USC-SAIL.
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G; Narayanan, Shrikanth S
2014-05-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.
Semiparametric Bayesian classification with longitudinal markers
De la Cruz-Mesía, Rolando; Quintana, Fernando A.; Müller, Peter
2013-01-01
Summary We analyse data from a study involving 173 pregnant women. The data are observed values of the β human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods. PMID:24368871
NASA Technical Reports Server (NTRS)
Berrios, Daniel C.; Thompson, Terri G.
2015-01-01
NASA GeneLab is expected to capture and distribute omics data and experimental and process conditions most relevant to research community in their statistical and theoretical analysis of NASAs omics data.
Online catalog access and distribution of remotely sensed information
NASA Astrophysics Data System (ADS)
Lutton, Stephen M.
1997-09-01
Remote sensing is providing voluminous data and value added information products. Electronic sensors, communication electronics, computer software, hardware, and network communications technology have matured to the point where a distributed infrastructure for remotely sensed information is a reality. The amount of remotely sensed data and information is making distributed infrastructure almost a necessity. This infrastructure provides data collection, archiving, cataloging, browsing, processing, and viewing for applications from scientific research to economic, legal, and national security decision making. The remote sensing field is entering a new exciting stage of commercial growth and expansion into the mainstream of government and business decision making. This paper overviews this new distributed infrastructure and then focuses on describing a software system for on-line catalog access and distribution of remotely sensed information.
NASA Technical Reports Server (NTRS)
Beach, Aubrey; Northup, Emily; Early, Amanda; Wang, Dali; Kusterer, John; Quam, Brandi; Chen, Gao
2015-01-01
The current data management practices for NASA airborne field projects have successfully served science team data needs over the past 30 years to achieve project science objectives, however, users have discovered a number of issues in terms of data reporting and format. The ICARTT format, a NASA standard since 2010, is currently the most popular among the airborne measurement community. Although easy for humans to use, the format standard is not sufficiently rigorous to be machine-readable. This makes data use and management tedious and resource intensive, and also create problems in Distributed Active Archive Center (DAAC) data ingest procedures and distribution. Further, most DAACs use metadata models that concentrate on satellite data observations, making them less prepared to deal with airborne data.
Probabilistic reasoning in data analysis.
Sirovich, Lawrence
2011-09-20
This Teaching Resource provides lecture notes, slides, and a student assignment for a lecture on probabilistic reasoning in the analysis of biological data. General probabilistic frameworks are introduced, and a number of standard probability distributions are described using simple intuitive ideas. Particular attention is focused on random arrivals that are independent of prior history (Markovian events), with an emphasis on waiting times, Poisson processes, and Poisson probability distributions. The use of these various probability distributions is applied to biomedical problems, including several classic experimental studies.
NASA Technical Reports Server (NTRS)
Spinhirne, J. D.; Welton, E. J.; Campbell, J. R.; Berkoff, T. A.; Starr, David OC. (Technical Monitor)
2002-01-01
The NASA MPL-net project goal is consistent data products of the vertical distribution of clouds and aerosol from globally distributed lidar observation sites. The four ARM micro pulse lidars are a basis of the network to consist of over twelve sites. The science objective is ground truth for global satellite retrievals and accurate vertical distribution information in combination with surface radiation measurements for aerosol and cloud models. The project involves improvement in instruments and data processing and cooperation with ARM and other partners.
An online detection system for aggregate sizes and shapes based on digital image processing
NASA Astrophysics Data System (ADS)
Yang, Jianhong; Chen, Sijia
2017-02-01
Traditional aggregate size measuring methods are time-consuming, taxing, and do not deliver online measurements. A new online detection system for determining aggregate size and shape based on a digital camera with a charge-coupled device, and subsequent digital image processing, have been developed to overcome these problems. The system captures images of aggregates while falling and flat lying. Using these data, the particle size and shape distribution can be obtained in real time. Here, we calibrate this method using standard globules. Our experiments show that the maximum particle size distribution error was only 3 wt%, while the maximum particle shape distribution error was only 2 wt% for data derived from falling aggregates, having good dispersion. In contrast, the data for flat-lying aggregates had a maximum particle size distribution error of 12 wt%, and a maximum particle shape distribution error of 10 wt%; their accuracy was clearly lower than for falling aggregates. However, they performed well for single-graded aggregates, and did not require a dispersion device. Our system is low-cost and easy to install. It can successfully achieve online detection of aggregate size and shape with good reliability, and it has great potential for aggregate quality assurance.
Foo, Brian; van der Schaar, Mihaela
2010-11-01
In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.
NASA Technical Reports Server (NTRS)
2002-01-01
TRMM has acquired more than four years of data since its launch in November 1997. All TRMM standard products are processed by the TRMM Science Data and Information System (TSDIS) and archived and distributed to general users by the GES DAAC. Table 1 shows the total archive and distribution as of February 28, 2002. The Utilization Ratio (UR), defined as the ratio of the number of distributed files to the number of archived files, of the TRMM standard products has been steadily increasing since 1998 and is currently at 6.98.
Jian Yang; Peter J. Weisberg; Thomas E. Dilts; E. Louise Loudermilk; Robert M. Scheller; Alison Stanton; Carl Skinner
2015-01-01
Strategic fire and fuel management planning benefits from detailed understanding of how wildfire occurrences are distributed spatially under current climate, and from predictive models of future wildfire occurrence given climate change scenarios. In this study, we fitted historical wildfire occurrence data from 1986 to 2009 to a suite of spatial point process (SPP)...
MaxEnt alternatives to pearson family distributions
NASA Astrophysics Data System (ADS)
Stokes, Barrie J.
2012-05-01
In a previous MaxEnt conference [11] a method of obtaining MaxEnt univariate distributions under a variety of constraints was presented. The Mathematica function Interpolation[], normally used with numerical data, can also process "semi-symbolic" data, and Lagrange Multiplier equations were solved for a set of symbolic ordinates describing the required MaxEnt probability density function. We apply a more developed version of this approach to finding MaxEnt distributions having prescribed β1 and β2 values, and compare the entropy of the MaxEnt distribution to that of the Pearson family distribution having the same β1 and β2. These MaxEnt distributions do have, in general, greater entropy than the related Pearson distribution. In accordance with Jaynes' Maximum Entropy Principle, these MaxEnt distributions are thus to be preferred to the corresponding Pearson distributions as priors in Bayes' Theorem.
Design and development of a medical big data processing system based on Hadoop.
Yao, Qin; Tian, Yu; Li, Peng-Fei; Tian, Li-Li; Qian, Yang-Ming; Li, Jing-Song
2015-03-01
Secondary use of medical big data is increasingly popular in healthcare services and clinical research. Understanding the logic behind medical big data demonstrates tendencies in hospital information technology and shows great significance for hospital information systems that are designing and expanding services. Big data has four characteristics--Volume, Variety, Velocity and Value (the 4 Vs)--that make traditional systems incapable of processing these data using standalones. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel with large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and MapReduce application program interface (API), we can more easily develop our own MapReduce applications to run on a Hadoop framework that can scale up from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover some features of hospital information system user behaviors. This paper studies user behaviors regarding various data produced by different hospital information systems for daily work. In this paper, we also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Our distributed algorithms show promise in facilitating efficient data processing with medical big data in healthcare services and clinical research compared with single nodes. Additionally, with medical big data analytics, we can design our hospital information systems to be much more intelligent and easier to use by making personalized recommendations.
Distributed Computing Framework for Synthetic Radar Application
NASA Technical Reports Server (NTRS)
Gurrola, Eric M.; Rosen, Paul A.; Aivazis, Michael
2006-01-01
We are developing an extensible software framework, in response to Air Force and NASA needs for distributed computing facilities for a variety of radar applications. The objective of this work is to develop a Python based software framework, that is the framework elements of the middleware that allows developers to control processing flow on a grid in a distributed computing environment. Framework architectures to date allow developers to connect processing functions together as interchangeable objects, thereby allowing a data flow graph to be devised for a specific problem to be solved. The Pyre framework, developed at the California Institute of Technology (Caltech), and now being used as the basis for next-generation radar processing at JPL, is a Python-based software framework. We have extended the Pyre framework to include new facilities to deploy processing components as services, including components that monitor and assess the state of the distributed network for eventual real-time control of grid resources.
Comparative analysis through probability distributions of a data set
NASA Astrophysics Data System (ADS)
Cristea, Gabriel; Constantinescu, Dan Mihai
2018-02-01
In practice, probability distributions are applied in such diverse fields as risk analysis, reliability engineering, chemical engineering, hydrology, image processing, physics, market research, business and economic research, customer support, medicine, sociology, demography etc. This article highlights important aspects of fitting probability distributions to data and applying the analysis results to make informed decisions. There are a number of statistical methods available which can help us to select the best fitting model. Some of the graphs display both input data and fitted distributions at the same time, as probability density and cumulative distribution. The goodness of fit tests can be used to determine whether a certain distribution is a good fit. The main used idea is to measure the "distance" between the data and the tested distribution, and compare that distance to some threshold values. Calculating the goodness of fit statistics also enables us to order the fitted distributions accordingly to how good they fit to data. This particular feature is very helpful for comparing the fitted models. The paper presents a comparison of most commonly used goodness of fit tests as: Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared. A large set of data is analyzed and conclusions are drawn by visualizing the data, comparing multiple fitted distributions and selecting the best model. These graphs should be viewed as an addition to the goodness of fit tests.
Gabard-Durnam, Laurel J; Mendez Leal, Adriana S; Wilkinson, Carol L; Levin, April R
2018-01-01
Electroenchephalography (EEG) recordings collected with developmental populations present particular challenges from a data processing perspective. These EEGs have a high degree of artifact contamination and often short recording lengths. As both sample sizes and EEG channel densities increase, traditional processing approaches like manual data rejection are becoming unsustainable. Moreover, such subjective approaches preclude standardized metrics of data quality, despite the heightened importance of such measures for EEGs with high rates of initial artifact contamination. There is presently a paucity of automated resources for processing these EEG data and no consistent reporting of data quality measures. To address these challenges, we propose the Harvard Automated Processing Pipeline for EEG (HAPPE) as a standardized, automated pipeline compatible with EEG recordings of variable lengths and artifact contamination levels, including high-artifact and short EEG recordings from young children or those with neurodevelopmental disorders. HAPPE processes event-related and resting-state EEG data from raw files through a series of filtering, artifact rejection, and re-referencing steps to processed EEG suitable for time-frequency-domain analyses. HAPPE also includes a post-processing report of data quality metrics to facilitate the evaluation and reporting of data quality in a standardized manner. Here, we describe each processing step in HAPPE, perform an example analysis with EEG files we have made freely available, and show that HAPPE outperforms seven alternative, widely-used processing approaches. HAPPE removes more artifact than all alternative approaches while simultaneously preserving greater or equivalent amounts of EEG signal in almost all instances. We also provide distributions of HAPPE's data quality metrics in an 867 file dataset as a reference distribution and in support of HAPPE's performance across EEG data with variable artifact contamination and recording lengths. HAPPE software is freely available under the terms of the GNU General Public License at https://github.com/lcnhappe/happe.
Gabard-Durnam, Laurel J.; Mendez Leal, Adriana S.; Wilkinson, Carol L.; Levin, April R.
2018-01-01
Electroenchephalography (EEG) recordings collected with developmental populations present particular challenges from a data processing perspective. These EEGs have a high degree of artifact contamination and often short recording lengths. As both sample sizes and EEG channel densities increase, traditional processing approaches like manual data rejection are becoming unsustainable. Moreover, such subjective approaches preclude standardized metrics of data quality, despite the heightened importance of such measures for EEGs with high rates of initial artifact contamination. There is presently a paucity of automated resources for processing these EEG data and no consistent reporting of data quality measures. To address these challenges, we propose the Harvard Automated Processing Pipeline for EEG (HAPPE) as a standardized, automated pipeline compatible with EEG recordings of variable lengths and artifact contamination levels, including high-artifact and short EEG recordings from young children or those with neurodevelopmental disorders. HAPPE processes event-related and resting-state EEG data from raw files through a series of filtering, artifact rejection, and re-referencing steps to processed EEG suitable for time-frequency-domain analyses. HAPPE also includes a post-processing report of data quality metrics to facilitate the evaluation and reporting of data quality in a standardized manner. Here, we describe each processing step in HAPPE, perform an example analysis with EEG files we have made freely available, and show that HAPPE outperforms seven alternative, widely-used processing approaches. HAPPE removes more artifact than all alternative approaches while simultaneously preserving greater or equivalent amounts of EEG signal in almost all instances. We also provide distributions of HAPPE's data quality metrics in an 867 file dataset as a reference distribution and in support of HAPPE's performance across EEG data with variable artifact contamination and recording lengths. HAPPE software is freely available under the terms of the GNU General Public License at https://github.com/lcnhappe/happe. PMID:29535597
Random-Walk Type Model with Fat Tails for Financial Markets
NASA Astrophysics Data System (ADS)
Matuttis, Hans-Geors
Starting from the random-walk model, practices of financial markets are included into the random-walk so that fat tail distributions like those in the high frequency data of the SP500 index are reproduced, though the individual mechanisms are modeled by normally distributed data. The incorporation of local correlation narrows the distribution for "frequent" events, whereas global correlations due to technical analysis leads to fat tails. Delay of market transactions in the trading process shifts the fat tail probabilities downwards. Such an inclusion of reactions to market fluctuations leads to mini-trends which are distributed with unit variance.
NASA Technical Reports Server (NTRS)
Graves, Sara J.
1994-01-01
Work on this project was focused on information management techniques for Marshall Space Flight Center's EOSDIS Version 0 Distributed Active Archive Center (DAAC). The centerpiece of this effort has been participation in EOSDIS catalog interoperability research, the result of which is a distributed Information Management System (IMS) allowing the user to query the inventories of all the DAAC's from a single user interface. UAH has provided the MSFC DAAC database server for the distributed IMS, and has contributed to definition and development of the browse image display capabilities in the system's user interface. Another important area of research has been in generating value-based metadata through data mining. In addition, information management applications for local inventory and archive management, and for tracking data orders were provided.
Final Report for DOE Award ER25756
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kesselman, Carl
2014-11-17
The SciDAC-funded Center for Enabling Distributed Petascale Science (CEDPS) was established to address technical challenges that arise due to the frequent geographic distribution of data producers (in particular, supercomputers and scientific instruments) and data consumers (people and computers) within the DOE laboratory system. Its goal is to produce technical innovations that meet DOE end-user needs for (a) rapid and dependable placement of large quantities of data within a distributed high-performance environment, and (b) the convenient construction of scalable science services that provide for the reliable and high-performance processing of computation and data analysis requests from many remote clients. The Centermore » is also addressing (c) the important problem of troubleshooting these and other related ultra-high-performance distributed activities from the perspective of both performance and functionality« less
NASA Astrophysics Data System (ADS)
Ott, S.
2011-07-01
(On behalf of all contributors to the Herschel mission) The Herschel Space Observatory, the fourth cornerstone mission in the ESA science program, was launched 14th of May 2009. With a 3.5 m telescope, it is the largest space telescope ever launched. Herschel's three instruments (HIFI, PACS, and SPIRE) perform photometry and spectroscopy in the 55-671 micron range and will deliver exciting science for the astronomical community during at least three years of routine observations. Starting October 2009 Herschel has been performing and processing observations in routine science mode. The development of the Herschel Data Processing System (HIPE) started nine years ago to support the data analysis for Instrument Level Tests. To fulfil the expectations of the astronomical community, additional resources were made available to implement a freely distributable Data Processing System capable of interactively and automatically reducing Herschel data at different processing levels. The system combines data retrieval, pipeline execution, data quality checking and scientific analysis in one single environment. HIPE is the user-friendly face of Herschel interactive Data Processing. The software is coded in Java and Jython to be platform independent and to avoid the need for commercial licenses. It is distributed under the GNU Lesser General Public License (LGPL), permitting everyone to access and to re-use its code. We will summarise the current capabilities of the Herschel Data Processing system, highlight how the Herschel Data Processing system supported the Herschel observatory to meet the challenges of this large project, give an overview about future development milestones and plans, and how the astronomical community can contribute to HIPE.
Development of climate data storage and processing model
NASA Astrophysics Data System (ADS)
Okladnikov, I. G.; Gordov, E. P.; Titov, A. G.
2016-11-01
We present a storage and processing model for climate datasets elaborated in the framework of a virtual research environment (VRE) for climate and environmental monitoring and analysis of the impact of climate change on the socio-economic processes on local and regional scales. The model is based on a «shared nothings» distributed computing architecture and assumes using a computing network where each computing node is independent and selfsufficient. Each node holds a dedicated software for the processing and visualization of geospatial data providing programming interfaces to communicate with the other nodes. The nodes are interconnected by a local network or the Internet and exchange data and control instructions via SSH connections and web services. Geospatial data is represented by collections of netCDF files stored in a hierarchy of directories in the framework of a file system. To speed up data reading and processing, three approaches are proposed: a precalculation of intermediate products, a distribution of data across multiple storage systems (with or without redundancy), and caching and reuse of the previously obtained products. For a fast search and retrieval of the required data, according to the data storage and processing model, a metadata database is developed. It contains descriptions of the space-time features of the datasets available for processing, their locations, as well as descriptions and run options of the software components for data analysis and visualization. The model and the metadata database together will provide a reliable technological basis for development of a high- performance virtual research environment for climatic and environmental monitoring.
75 FR 56522 - Notice of Proposed Information Collection Requests
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-16
... process improvement, the Department routinely conducts a review of the application data elements to... amended (HEA), mandates that the Secretary of Education ``* * * shall produce, distribute, and process... 56523
ERIC Educational Resources Information Center
Popyk, Marilyn K.
1986-01-01
Discusses the new automated office and its six major technologies (data processing, word processing, graphics, image, voice, and networking), the information processing cycle (input, processing, output, distribution/communication, and storage and retrieval), ergonomics, and ways to expand office education classes (versus class instruction). (CT)
The ‘hit’ phenomenon: a mathematical model of human dynamics interactions as a stochastic process
NASA Astrophysics Data System (ADS)
Ishii, Akira; Arakaki, Hisashi; Matsuda, Naoya; Umemura, Sanae; Urushidani, Tamiko; Yamagata, Naoya; Yoshida, Narihiko
2012-06-01
A mathematical model for the ‘hit’ phenomenon in entertainment within a society is presented as a stochastic process of human dynamics interactions. The model uses only the advertisement budget time distribution as an input, and word-of-mouth (WOM), represented by posts on social network systems, is used as data to make a comparison with the calculated results. The unit of time is days. The WOM distribution in time is found to be very close to the revenue distribution in time. Calculations for the Japanese motion picture market based on the mathematical model agree well with the actual revenue distribution in time.
Dinov, Ivo D
2016-01-01
Managing, processing and understanding big healthcare data is challenging, costly and demanding. Without a robust fundamental theory for representation, analysis and inference, a roadmap for uniform handling and analyzing of such complex data remains elusive. In this article, we outline various big data challenges, opportunities, modeling methods and software techniques for blending complex healthcare data, advanced analytic tools, and distributed scientific computing. Using imaging, genetic and healthcare data we provide examples of processing heterogeneous datasets using distributed cloud services, automated and semi-automated classification techniques, and open-science protocols. Despite substantial advances, new innovative technologies need to be developed that enhance, scale and optimize the management and processing of large, complex and heterogeneous data. Stakeholder investments in data acquisition, research and development, computational infrastructure and education will be critical to realize the huge potential of big data, to reap the expected information benefits and to build lasting knowledge assets. Multi-faceted proprietary, open-source, and community developments will be essential to enable broad, reliable, sustainable and efficient data-driven discovery and analytics. Big data will affect every sector of the economy and their hallmark will be 'team science'.
Lee, Heung-Rae
1997-01-01
A three-dimensional image reconstruction method comprises treating the object of interest as a group of elements with a size that is determined by the resolution of the projection data, e.g., as determined by the size of each pixel. One of the projections is used as a reference projection. A fictitious object is arbitrarily defined that is constrained by such reference projection. The method modifies the known structure of the fictitious object by comparing and optimizing its four projections to those of the unknown structure of the real object and continues to iterate until the optimization is limited by the residual sum of background noise. The method is composed of several sub-processes that acquire four projections from the real data and the fictitious object: generate an arbitrary distribution to define the fictitious object, optimize the four projections, generate a new distribution for the fictitious object, and enhance the reconstructed image. The sub-process for the acquisition of the four projections from the input real data is simply the function of acquiring the four projections from the data of the transmitted intensity. The transmitted intensity represents the density distribution, that is, the distribution of absorption coefficients through the object.
Design of a Data Distribution Core Model for Seafloor Observatories in East China Sea
NASA Astrophysics Data System (ADS)
Chen, H.; Qin, R.; Xu, H.
2017-12-01
High loadings of nutrients and pollutants from agriculture, industries and city waste waters are carried by Changjiang (Yangtze) River and transformed into the foodweb in the river freshwater plume. Understanding these transport and transformation processes is essential for the ecosystem protection, fisheries resources management, seafood safety and human health. As Xiaoqushan Seafloor Observatory and Zhujiajian Seafloor Observatory built in East China Sea, it is an opportunity and a new way for the research of Changjiang River plume. Data collected by seafloor observatory should be accessed conveniently by end users in real time or near real time, which can make it play a better role. Therefore, data distribution is one of major issues for seafloor observatory characterized by long term, real time, high resolution and continuous observation. This study describes a Data Distribution core Model for Seafloor Observatories in East China Sea (ESDDM) containing Data Acquisition Module (DAM), Data Interpretation Module (DIM), Data Transmission Module (DTM) and Data Storage Module (DTM), which enables acquiring, interpreting, transmitting and storing various types of data in real time. A Data Distribution Model Makeup Language (DDML) based on XML is designed to enhance the expansibility and flexibility of the system implemented by ESDDM. Network sniffer is used to acquire data by IP address and port number in DAM promising to release the operating pressure of junction boxes. Data interface, core data processing plugins and common libraries consist of DIM helping it interpret data in a hot swapping way. DTM is an external module in ESDDM transmitting designated raw data packets to Secondary Receiver Terminal. The technology of database connection pool used in DSM facilitates the efficiency of large volumes of continuous data storage. Given a successful scenario in Zhujiajian Seafloor Observatory, the protosystem based on ESDDM running up to 1500h provides a reference for other seafloor observatories in data distribution service.
Software system for data management and distributed processing of multichannel biomedical signals.
Franaszczuk, P J; Jouny, C C
2004-01-01
The presented software is designed for efficient utilization of cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. The description of data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of a new signal processing procedures is possible with a minimal programming overhead for the input/output processing and user interface. The number of nodes in cluster used for computations and amount of storage can be changed with no major modification to software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.
Zipkin, Elise F.; Leirness, Jeffery B.; Kinlan, Brian P.; O'Connell, Allan F.; Silverman, Emily D.
2014-01-01
Determining appropriate statistical distributions for modeling animal count data is important for accurate estimation of abundance, distribution, and trends. In the case of sea ducks along the U.S. Atlantic coast, managers want to estimate local and regional abundance to detect and track population declines, to define areas of high and low use, and to predict the impact of future habitat change on populations. In this paper, we used a modified marked point process to model survey data that recorded flock sizes of Common eiders, Long-tailed ducks, and Black, Surf, and White-winged scoters. The data come from an experimental aerial survey, conducted by the United States Fish & Wildlife Service (USFWS) Division of Migratory Bird Management, during which east-west transects were flown along the Atlantic Coast from Maine to Florida during the winters of 2009–2011. To model the number of flocks per transect (the points), we compared the fit of four statistical distributions (zero-inflated Poisson, zero-inflated geometric, zero-inflated negative binomial and negative binomial) to data on the number of species-specific sea duck flocks that were recorded for each transect flown. To model the flock sizes (the marks), we compared the fit of flock size data for each species to seven statistical distributions: positive Poisson, positive negative binomial, positive geometric, logarithmic, discretized lognormal, zeta and Yule–Simon. Akaike’s Information Criterion and Vuong’s closeness tests indicated that the negative binomial and discretized lognormal were the best distributions for all species for the points and marks, respectively. These findings have important implications for estimating sea duck abundances as the discretized lognormal is a more skewed distribution than the Poisson and negative binomial, which are frequently used to model avian counts; the lognormal is also less heavy-tailed than the power law distributions (e.g., zeta and Yule–Simon), which are becoming increasingly popular for group size modeling. Choosing appropriate statistical distributions for modeling flock size data is fundamental to accurately estimating population summaries, determining required survey effort, and assessing and propagating uncertainty through decision-making processes.
Code of Federal Regulations, 2011 CFR
2011-01-01
... CONCERNING USE OF THE NOAA SPACE-BASED DATA COLLECTION SYSTEMS § 911.3 Definitions. For purposes of this part... data from fixed and moving platforms and provides platform location data. This system consists of... Data Processing and Distribution for the National Environmental Satellite, Data, and Information...
Mission operations update for the restructured Earth Observing System (EOS) mission
NASA Technical Reports Server (NTRS)
Kelly, Angelita Castro; Chang, Edward S.
1993-01-01
The National Aeronautics and Space Administration's (NASA) Earth Observing System (EOS) will provide a comprehensive long term set of observations of the Earth to the Earth science research community. The data will aid in determining global changes caused both naturally and through human interaction. Understanding man's impact on the global environment will allow sound policy decisions to be made to protect our future. EOS is a major component of the Mission to Planet Earth program, which is NASA's contribution to the U.S. Global Change Research Program. EOS consists of numerous instruments on multiple spacecraft and a distributed ground system. The EOS Data and Information System (EOSDIS) is the major ground system developed to support EOS. The EOSDIS will provide EOS spacecraft command and control, data processing, product generation, and data archival and distribution services for EOS spacecraft. Data from EOS instruments on other Earth science missions (e.g., Tropical Rainfall Measuring Mission (TRMM)) will also be processed, distributed, and archived in EOSDIS. The U.S. and various International Partners (IP) (e.g., the European Space Agency (ESA), the Ministry of International Trade and Industry (MITI) of Japan, and the Canadian Space Agency (CSA)) participate in and contribute to the international EOS program. The EOSDIS will also archive processed data from other designated NASA Earth science missions (e.g., UARS) that are under the broad umbrella of Mission to Planet Earth.
Fully automated processing of fMRI data in SPM: from MRI scanner to PACS.
Maldjian, Joseph A; Baer, Aaron H; Kraft, Robert A; Laurienti, Paul J; Burdette, Jonathan H
2009-01-01
Here we describe the Wake Forest University Pipeline, a fully automated method for the processing of fMRI data using SPM. The method includes fully automated data transfer and archiving from the point of acquisition, real-time batch script generation, distributed grid processing, interface to SPM in MATLAB, error recovery and data provenance, DICOM conversion and PACS insertion. It has been used for automated processing of fMRI experiments, as well as for the clinical implementation of fMRI and spin-tag perfusion imaging. The pipeline requires no manual intervention, and can be extended to any studies requiring offline processing.
Distributed service-based approach for sensor data fusion in IoT environments.
Rodríguez-Valenzuela, Sandra; Holgado-Terriza, Juan A; Gutiérrez-Guerrero, José M; Muros-Cobos, Jesús L
2014-10-15
The Internet of Things (IoT) enables the communication among smart objects promoting the pervasive presence around us of a variety of things or objects that are able to interact and cooperate jointly to reach common goals. IoT objects can obtain data from their context, such as the home, office, industry or body. These data can be combined to obtain new and more complex information applying data fusion processes. However, to apply data fusion algorithms in IoT environments, the full system must deal with distributed nodes, decentralized communication and support scalability and nodes dynamicity, among others restrictions. In this paper, a novel method to manage data acquisition and fusion based on a distributed service composition model is presented, improving the data treatment in IoT pervasive environments.
Distributed Service-Based Approach for Sensor Data Fusion in IoT Environments
Rodríguez-Valenzuela, Sandra; Holgado-Terriza, Juan A.; Gutiérrez-Guerrero, José M.; Muros-Cobos, Jesús L.
2014-01-01
The Internet of Things (IoT) enables the communication among smart objects promoting the pervasive presence around us of a variety of things or objects that are able to interact and cooperate jointly to reach common goals. IoT objects can obtain data from their context, such as the home, office, industry or body. These data can be combined to obtain new and more complex information applying data fusion processes. However, to apply data fusion algorithms in IoT environments, the full system must deal with distributed nodes, decentralized communication and support scalability and nodes dynamicity, among others restrictions. In this paper, a novel method to manage data acquisition and fusion based on a distributed service composition model is presented, improving the data treatment in IoT pervasive environments. PMID:25320907
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qishi; Zhu, Mengxia; Rao, Nageswara S
We propose an intelligent decision support system based on sensor and computer networks that incorporates various component techniques for sensor deployment, data routing, distributed computing, and information fusion. The integrated system is deployed in a distributed environment composed of both wireless sensor networks for data collection and wired computer networks for data processing in support of homeland security defense. We present the system framework and formulate the analytical problems and develop approximate or exact solutions for the subtasks: (i) sensor deployment strategy based on a two-dimensional genetic algorithm to achieve maximum coverage with cost constraints; (ii) data routing scheme tomore » achieve maximum signal strength with minimum path loss, high energy efficiency, and effective fault tolerance; (iii) network mapping method to assign computing modules to network nodes for high-performance distributed data processing; and (iv) binary decision fusion rule that derive threshold bounds to improve system hit rate and false alarm rate. These component solutions are implemented and evaluated through either experiments or simulations in various application scenarios. The extensive results demonstrate that these component solutions imbue the integrated system with the desirable and useful quality of intelligence in decision making.« less
Ipsen, Andreas
2015-02-03
Despite the widespread use of mass spectrometry (MS) in a broad range of disciplines, the nature of MS data remains very poorly understood, and this places important constraints on the quality of MS data analysis as well as on the effectiveness of MS instrument design. In the following, a procedure for calculating the statistical distribution of the mass peak intensity for MS instruments that use analog-to-digital converters (ADCs) and electron multipliers is presented. It is demonstrated that the physical processes underlying the data-generation process, from the generation of the ions to the signal induced at the detector, and on to the digitization of the resulting voltage pulse, result in data that can be well-approximated by a Gaussian distribution whose mean and variance are determined by physically meaningful instrumental parameters. This allows for a very precise understanding of the signal-to-noise ratio of mass peak intensities and suggests novel ways of improving it. Moreover, it is a prerequisite for being able to address virtually all data analytical problems in downstream analyses in a statistically rigorous manner. The model is validated with experimental data.
RUSHMAPS: Real-Time Uploadable Spherical Harmonic Moment Analysis for Particle Spectrometers
NASA Technical Reports Server (NTRS)
Figueroa-Vinas, Adolfo
2013-01-01
RUSHMAPS is a new onboard data reduction scheme that gives real-time access to key science parameters (e.g. moments) of a class of heliophysics science and/or solar system exploration investigation that includes plasma particle spectrometers (PPS), but requires moments reporting (density, bulk-velocity, temperature, pressure, etc.) of higher-level quality, and tolerates a lowpass (variable quality) spectral representation of the corresponding particle velocity distributions, such that telemetry use is minimized. The proposed methodology trades access to the full-resolution velocity distribution data, saving on telemetry, for real-time access to both the moments and an adjustable-quality (increasing quality increases volume) spectral representation of distribution functions. Traditional onboard data storage and downlink bandwidth constraints severely limit PPS system functionality and drive cost, which, as a consequence, drives a limited data collection and lower angular energy and time resolution. This prototypical system exploit, using high-performance processing technology at GSFC (Goddard Space Flight Center), uses a SpaceCube and/or Maestro-type platform for processing. These processing platforms are currently being used on the International Space Station as a technology demonstration, and work is currently ongoing in a new onboard computation system for the Earth Science missions, but they have never been implemented in heliospheric science or solar system exploration missions. Preliminary analysis confirms that the targeted processor platforms possess the processing resources required for realtime application of these algorithms to the spectrometer data. SpaceCube platforms demonstrate that the target architecture possesses the sort of compact, low-mass/power, radiation-tolerant characteristics needed for flight. These high-performing hybrid systems embed unprecedented amounts of onboard processing power in the CPU (central processing unit), FPGAs (field programmable gate arrays), and DSP (digital signal processing) elements. The fundamental computational algorithm de constructs 3D velocity distributions in terms of spherical harmonic spectral coefficients (which are analogous to a Fourier sine-cosine decomposition), but uses instead spherical harmonics Legendre polynomial orthogonal functions as a basis for the expansion, portraying each 2D angular distribution at every energy or, geometrically, spherical speed-shell swept by the particle spectrometer. Optionally, these spherical harmonic spectral coefficients may be telemetered to the ground. These will provide a smoothed description of the velocity distribution function whose quality will depend on the number of coefficients determined. Successfully implemented on the GSFC-developed processor, the capability to integrate the proposed methodology with both heritage and anticipated future plasma particle spectrometer designs is demonstrated (with sufficiently detailed design analysis to advance TRL) to show specific science relevancy with future HSD (Heliophysics Science Division) solar-interplanetary, planetary missions, sounding rockets and/or CubeSat missions.
Surveillance of industrial processes with correlated parameters
White, Andrew M.; Gross, Kenny C.; Kubic, William L.; Wigeland, Roald A.
1996-01-01
A system and method for surveillance of an industrial process. The system and method includes a plurality of sensors monitoring industrial process parameters, devices to convert the sensed data to computer compatible information and a computer which executes computer software directed to analyzing the sensor data to discern statistically reliable alarm conditions. The computer software is executed to remove serial correlation information and then calculate Mahalanobis distribution data to carry out a probability ratio test to determine alarm conditions.
Computer Sciences and Data Systems, volume 2
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Measurement Comparisons Towards Improving the Understanding of Aerosol-Cloud Processing
NASA Astrophysics Data System (ADS)
Noble, Stephen R.
Cloud processing of aerosol is an aerosol-cloud interaction that is not heavily researched but could have implications on climate. The three types of cloud processing are chemical processing, collision and coalescence processing, and Brownian capture of interstitial particles. All types improve cloud condensation nuclei (CCN) in size or hygroscopicity (kappa). These improved CCN affect subsequent clouds. This dissertation focuses on measurement comparisons to improve our observations and understanding of aerosol-cloud processing. Particle size distributions measured at the continental Southern Great Plains (SGP) site were compared with ground based measurements of cloud fraction (CF) and cloud base altitude (CBA). Particle size distributions were described by a new objective shape parameter to define bimodality rather than an old subjective one. Cloudy conditions at SGP were found to be correlated with lagged shape parameter. Horizontal wind speed and regional CF explained 42%+ of this lag time. Many of these surface particle size distributions were influenced by aerosol-cloud processing. Thus, cloud processing may be more widespread with more implications than previously thought. Particle size distributions measured during two aircraft field campaigns (MArine Stratus/stratocumulus Experiment; MASE; and Ice in Cloud Experiment-Tropical; ICE-T) were compared to CCN distributions. Tuning particle size to critical supersaturation revealed hygroscopicity expressed as ? when the distributions were overlain. Distributions near cumulus clouds (ICE-T) had a higher frequency of the same ?s (48% in ICE-T to 42% in MASE) between the accumulation (processed) and Aitken (unprocessed) modes. This suggested physical processing domination in ICE-T. More MASE (stratus cloud) kappa differences between modes pointed to chemical cloud processing. Chemistry measurements made in MASE showed increases in sulfates and nitrates with distributions that were more processed. This supported chemical cloud processing in MASE. This new method to determine kappa provides the needed information without interrupting ambient measurements. MODIS derived cloud optical thickness (COT), cloud liquid water path (LWP), and cloud effective radius (re) were compared to the same in situ derived variables from cloud probe measurements of two stratus/stratocumulus cloud campaigns (MASE and Physics Of Stratocumulus Tops; POST). In situ data were from complete vertical cloud penetrations, while MODIS data were from pixels along the aircraft penetration path. Comparisons were well correlated except that MODIS LWP (14-36%) and re (20-30%) were biased high. The LWP bias was from re bias and was not improved by using the vertically stratified assumption. MODIS re bias was almost removed when compared to cloud top maximum in situ re, but, that does not describe re for the full depth of the cloud. COT is validated by in situ COT. High correlations suggest that MODIS variables are useful in self-comparisons such as gradient changes in stratus cloud re during aerosol-cloud processing.
Lognormal field size distributions as a consequence of economic truncation
Attanasi, E.D.; Drew, L.J.
1985-01-01
The assumption of lognormal (parent) field size distributions has for a long time been applied to resource appraisal and evaluation of exploration strategy by the petroleum industry. However, frequency distributions estimated with observed data and used to justify this hypotheses are conditional. Examination of various observed field size distributions across basins and over time shows that such distributions should be regarded as the end result of an economic filtering process. Commercial discoveries depend on oil and gas prices and field development costs. Some new fields are eliminated due to location, depths, or water depths. This filtering process is called economic truncation. Economic truncation may occur when predictions of a discovery process are passed through an economic appraisal model. We demonstrate that (1) economic resource appraisals, (2) forecasts of levels of petroleum industry activity, and (3) expected benefits of developing and implementing cost reducing technology are sensitive to assumptions made about the nature of that portion of (parent) field size distribution subject to economic truncation. ?? 1985 Plenum Publishing Corporation.
Dorazio, Robert; Karanth, K. Ullas
2017-01-01
MotivationSeveral spatial capture-recapture (SCR) models have been developed to estimate animal abundance by analyzing the detections of individuals in a spatial array of traps. Most of these models do not use the actual dates and times of detection, even though this information is readily available when using continuous-time recorders, such as microphones or motion-activated cameras. Instead most SCR models either partition the period of trap operation into a set of subjectively chosen discrete intervals and ignore multiple detections of the same individual within each interval, or they simply use the frequency of detections during the period of trap operation and ignore the observed times of detection. Both practices make inefficient use of potentially important information in the data.Model and data analysisWe developed a hierarchical SCR model to estimate the spatial distribution and abundance of animals detected with continuous-time recorders. Our model includes two kinds of point processes: a spatial process to specify the distribution of latent activity centers of individuals within the region of sampling and a temporal process to specify temporal patterns in the detections of individuals. We illustrated this SCR model by analyzing spatial and temporal patterns evident in the camera-trap detections of tigers living in and around the Nagarahole Tiger Reserve in India. We also conducted a simulation study to examine the performance of our model when analyzing data sets of greater complexity than the tiger data.BenefitsOur approach provides three important benefits: First, it exploits all of the information in SCR data obtained using continuous-time recorders. Second, it is sufficiently versatile to allow the effects of both space use and behavior of animals to be specified as functions of covariates that vary over space and time. Third, it allows both the spatial distribution and abundance of individuals to be estimated, effectively providing a species distribution model, even in cases where spatial covariates of abundance are unknown or unavailable. We illustrated these benefits in the analysis of our data, which allowed us to quantify differences between nocturnal and diurnal activities of tigers and to estimate their spatial distribution and abundance across the study area. Our continuous-time SCR model allows an analyst to specify many of the ecological processes thought to be involved in the distribution, movement, and behavior of animals detected in a spatial trapping array of continuous-time recorders. We plan to extend this model to estimate the population dynamics of animals detected during multiple years of SCR surveys.
Application of TIMS data in stratigraphic analysis
NASA Technical Reports Server (NTRS)
Lang, H. R.
1986-01-01
An in-progress study demonstrates the utility of Thermal Infrared Multispectral Scanner (TIMS) data for unraveling the stratigraphic sequence of a western interior, North American foreland basin. The TIMS data can be used to determine the stratigraphic distribution of minerals that are diagnostic of specific depositional distribution. The thematic mapper (TM) and TIMS data were acquired in the Wind River/Bighorn area of central Wyoming in November 1982, and July 1983, respectively. Combined image processing, photogeologic, and spectral analysis methods were used to: map strata; construct stratigraphic columns; correlate data; and identify mineralogical facies.
NASA Technical Reports Server (NTRS)
Falke, Stefan; Husar, Rudolf
2011-01-01
The goal of this REASoN applications and technology project is to deliver and use Earth Science Enterprise (ESE) data and tools in support of air quality management. Its scope falls within the domain of air quality management and aims to develop a federated air quality information sharing network that includes data from NASA, EPA, US States and others. Project goals were achieved through a access of satellite and ground observation data, web services information technology, interoperability standards, and air quality community collaboration. In contributing to a network of NASA ESE data in support of particulate air quality management, the project will develop access to distributed data, build Web infrastructure, and create tools for data processing and analysis. The key technologies used in the project include emerging web services for developing self describing and modular data access and processing tools, and service oriented architecture for chaining web services together to assemble customized air quality management applications. The technology and tools required for this project were developed within DataFed.net, a shared infrastructure that supports collaborative atmospheric data sharing and processing web services. Much of the collaboration was facilitated through community interactions through the Federation of Earth Science Information Partners (ESIP) Air Quality Workgroup. The main activities during the project that successfully advanced DataFed, enabled air quality applications and established community-oriented infrastructures were: develop access to distributed data (surface and satellite), build Web infrastructure to support data access, processing and analysis create tools for data processing and analysis foster air quality community collaboration and interoperability.
Joint inversion of NMR and SIP data to estimate pore size distribution of geomaterials
NASA Astrophysics Data System (ADS)
Niu, Qifei; Zhang, Chi
2018-03-01
There are growing interests in using geophysical tools to characterize the microstructure of geomaterials because of the non-invasive nature and the applicability in field. In these applications, multiple types of geophysical data sets are usually processed separately, which may be inadequate to constrain the key feature of target variables. Therefore, simultaneous processing of multiple data sets could potentially improve the resolution. In this study, we propose a method to estimate pore size distribution by joint inversion of nuclear magnetic resonance (NMR) T2 relaxation and spectral induced polarization (SIP) spectra. The petrophysical relation between NMR T2 relaxation time and SIP relaxation time is incorporated in a nonlinear least squares problem formulation, which is solved using Gauss-Newton method. The joint inversion scheme is applied to a synthetic sample and a Berea sandstone sample. The jointly estimated pore size distributions are very close to the true model and results from other experimental method. Even when the knowledge of the petrophysical models of the sample is incomplete, the joint inversion can still capture the main features of the pore size distribution of the samples, including the general shape and relative peak positions of the distribution curves. It is also found from the numerical example that the surface relaxivity of the sample could be extracted with the joint inversion of NMR and SIP data if the diffusion coefficient of the ions in the electrical double layer is known. Comparing to individual inversions, the joint inversion could improve the resolution of the estimated pore size distribution because of the addition of extra data sets. The proposed approach might constitute a first step towards a comprehensive joint inversion that can extract the full pore geometry information of a geomaterial from NMR and SIP data.
Cluster analysis for determining distribution center location
NASA Astrophysics Data System (ADS)
Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian
2017-12-01
Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.
Real-time flight test data distribution and display
NASA Technical Reports Server (NTRS)
Nesel, Michael C.; Hammons, Kevin R.
1988-01-01
Enhancements to the real-time processing and display systems of the NASA Western Aeronautical Test Range are described. Display processing has been moved out of the telemetry and radar acquisition processing systems super-minicomputers into user/client interactive graphic workstations. Real-time data is provided to the workstations by way of Ethernet. Future enhancement plans include use of fiber optic cable to replace the Ethernet.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammond, Glenn Edward; Song, Xuehang; Ye, Ming
A new approach is developed to delineate the spatial distribution of discrete facies (geological units that have unique distributions of hydraulic, physical, and/or chemical properties) conditioned not only on direct data (measurements directly related to facies properties, e.g., grain size distribution obtained from borehole samples) but also on indirect data (observations indirectly related to facies distribution, e.g., hydraulic head and tracer concentration). Our method integrates for the first time ensemble data assimilation with traditional transition probability-based geostatistics. The concept of level set is introduced to build shape parameterization that allows transformation between discrete facies indicators and continuous random variables. Themore » spatial structure of different facies is simulated by indicator models using conditioning points selected adaptively during the iterative process of data assimilation. To evaluate the new method, a two-dimensional semi-synthetic example is designed to estimate the spatial distribution and permeability of two distinct facies from transient head data induced by pumping tests. The example demonstrates that our new method adequately captures the spatial pattern of facies distribution by imposing spatial continuity through conditioning points. The new method also reproduces the overall response in hydraulic head field with better accuracy compared to data assimilation with no constraints on spatial continuity on facies.« less
Creating Composite Age Groups to Smooth Percentile Rank Distributions of Small Samples
ERIC Educational Resources Information Center
Lopez, Francesca; Olson, Amy; Bansal, Naveen
2011-01-01
Individually administered tests are often normed on small samples, a process that may result in irregularities within and across various age or grade distributions. Test users often smooth distributions guided by Thurstone assumptions (normality and linearity) to result in norms that adhere to assumptions made about how the data should look. Test…
Spatial Distribution of Small Water Body Types in Indiana Ecoregions
Due to their large numbers and biogeochemical activity, small water bodies (SWBs), such as ponds and wetlands, can have substantial cumulative effects on hydrologic and biogeochemical processes. Using updated National Wetland Inventory data, we describe the spatial distribution o...
NASA Astrophysics Data System (ADS)
Li, Peng; Olmi, Claudio; Song, Gangbing
2010-04-01
Piezoceramic based transducers are widely researched and used for structural health monitoring (SHM) systems due to the piezoceramic material's inherent advantage of dual sensing and actuation. Wireless sensor network (WSN) technology benefits from advances made in piezoceramic based structural health monitoring systems, allowing easy and flexible installation, low system cost, and increased robustness over wired system. However, piezoceramic wireless SHM systems still faces some drawbacks, one of these is that the piezoceramic based SHM systems require relatively high computational capabilities to calculate damage information, however, battery powered WSN sensor nodes have strict power consumption limitation and hence limited computational power. On the other hand, commonly used centralized processing networks require wireless sensors to transmit all data back to the network coordinator for analysis. This signal processing procedure can be problematic for piezoceramic based SHM applications as it is neither energy efficient nor robust. In this paper, we aim to solve these problems with a distributed wireless sensor network for piezoceramic base structural health monitoring systems. Three important issues: power system, waking up from sleep impact detection, and local data processing, are addressed to reach optimized energy efficiency. Instead of sweep sine excitation that was used in the early research, several sine frequencies were used in sequence to excite the concrete structure. The wireless sensors record the sine excitations and compute the time domain energy for each sine frequency locally to detect the energy change. By comparing the data of the damaged concrete frame with the healthy data, we are able to find out the damage information of the concrete frame. A relative powerful wireless microcontroller was used to carry out the sampling and distributed data processing in real-time. The distributed wireless network dramatically reduced the data transmission between wireless sensor and the wireless coordinator, which in turn reduced the power consumption of the overall system.
New Technology/Old Technology: Comparing Lunar Grain Size Distribution Data and Methods
NASA Technical Reports Server (NTRS)
Fruland, R. M.; Cooper, Bonnie L.; Gonzalexz, C. P.; McKay, David S.
2011-01-01
Laser diffraction technology generates reproducible grain size distributions and reveals new structures not apparent in old sieve data. The comparison of specific sieve fractions with the Microtrac distribution curve generated for those specific fractions shows a reasonable match for the mean of each fraction between the two techniques, giving us confidence that the large existing body of sieve data can be cross-correlated with new data based on laser diffraction. It is well-suited for lunar soils, which have as much as 25% of the material in the less than 20 micrometer fraction. The fines in this range are of particular interest because they may contain a record of important space weathering processes.
Replacement predictions for drinking water networks through historical data.
Malm, Annika; Ljunggren, Olle; Bergstedt, Olof; Pettersson, Thomas J R; Morrison, Gregory M
2012-05-01
Lifetime distribution functions and current network age data can be combined to provide an assessment of the future replacement needs for drinking water distribution networks. Reliable lifetime predictions are limited by a lack of understanding of deterioration processes for different pipe materials under varied conditions. An alternative approach is the use of real historical data for replacement over an extended time series. In this paper, future replacement needs are predicted through historical data representing more than one hundred years of drinking water pipe replacement in Gothenburg, Sweden. The verified data fits well with commonly used lifetime distribution curves. Predictions for the future are discussed in the context of path dependence theory. Copyright © 2012 Elsevier Ltd. All rights reserved.
2003-06-27
KENNEDY SPACE CENTER, FLA. - At Vandenberg Air Force Base, Calif., the Pegasus launch vehicle is moved toward its hangar. The Pegasus will carry the SciSat-1 spacecraft in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
2003-06-27
KENNEDY SPACE CENTER, FLA. - The Pegasus launch vehicle is moved back to its hangar at Vandenberg Air Force Base, Calif. The Pegasus will carry the SciSat-1 spacecraft in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
2003-06-26
KENNEDY SPACE CENTER, FLA. - The SciSat-1 spacecraft is uncrated at Vandenberg Air Force Base, Calif. SciSat-1 weighs approximately 330 pounds and will be placed in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
2003-06-26
KENNEDY SPACE CENTER, FLA. - The SciSat-1 spacecraft is revealed after being uncrated at Vandenberg Air Force Base, Calif. SciSat-1 weighs approximately 330 pounds and will be placed in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
2003-06-26
KENNEDY SPACE CENTER, FLA. - Workers at Vandenberg Air Force Base, Calif., prepare to move the SciSat-1 spacecraft. SciSat-1 weighs approximately 330 pounds and will be placed in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
2003-06-27
KENNEDY SPACE CENTER, FLA. - At Vandenberg Air Force Base, Calif., the Pegasus launch vehicle is moved into its hangar. The Pegasus will carry the SciSat-1 spacecraft in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
KNET - DISTRIBUTED COMPUTING AND/OR DATA TRANSFER PROGRAM
NASA Technical Reports Server (NTRS)
Hui, J.
1994-01-01
KNET facilitates distributed computing between a UNIX compatible local host and a remote host which may or may not be UNIX compatible. It is capable of automatic remote login. That is, it performs on the user's behalf the chore of handling host selection, user name, and password to the designated host. Once the login has been successfully completed, the user may interactively communicate with the remote host. Data output from the remote host may be directed to the local screen, to a local file, and/or to a local process. Conversely, data input from the keyboard, a local file, or a local process may be directed to the remote host. KNET takes advantage of the multitasking and terminal mode control features of the UNIX operating system. A parent process is used as the upper layer for interfacing with the local user. A child process is used for a lower layer for interfacing with the remote host computer, and optionally one or more child processes can be used for the remote data output. Output may be directed to the screen and/or to the local processes under the control of a data pipe switch. In order for KNET to operate, the local and remote hosts must observe a common communications protocol. KNET is written in ANSI standard C-language for computers running UNIX. It has been successfully implemented on several Sun series computers and a DECstation 3100 and used to run programs remotely on VAX VMS and UNIX based computers. It requires 100K of RAM under SunOS and 120K of RAM under DEC RISC ULTRIX. An electronic copy of the documentation is provided on the distribution medium. The standard distribution medium for KNET is a .25 inch streaming magnetic tape cartridge in UNIX tar format. It is also available on a 3.5 inch diskette in UNIX tar format. KNET was developed in 1991 and is a copyrighted work with all copyright vested in NASA. UNIX is a registered trademark of AT&T Bell Laboratories. Sun and SunOS are trademarks of Sun Microsystems, Inc. DECstation, VAX, VMS, and ULTRIX are trademarks of Digital Equipment Corporation.
Research and Development in Very Long Baseline Interferometry (VLBI)
NASA Technical Reports Server (NTRS)
Himwich, William E.
2004-01-01
Contents include the following: 1.Observation coordination. 2. Data acquisition system control software. 3. Station support. 4. Correlation, data processing, and analysis. 5. Data distribution and archiving. 6. Technique improvement and research. 7. Computer support.
NASA Technical Reports Server (NTRS)
1974-01-01
The relative merits of several international data acquisition (IDA) alternatives for the Earth Observatory Satellite (EOS) are established and rated on a cost effectiveness basis. The primary alternatives under consideration are: (1) direct transmission to foreign ground stations, (2) a wideband video tape recorder system for collection of foreign data and processing and distribution from the United States, and (3) a tracking and data relay satellite (TDRS) system for the relay of foreign data to the United States for processing and distribution. A requirements model is established for the analysis on the basis of the heaviest concentration of agricultural areas around the world. The model, the orbit path and the constraints of EOS and data volume summaries are presented. Alternative system descriptions and costs are given in addition to cost-performance summaries.
Variations in global thunderstorm activity inferred from the OTD records
NASA Astrophysics Data System (ADS)
Nickolaenko, A. P.; Hayakawa, M.; Sekiguchi, M.
2006-03-01
We use the data on the planetary distribution of thunderstorms collected by optical transient detector (OTD) to derive the properties of global electric activity. Processing of optical data indicates that modern observations from space confirm the general concept of thunderstorm distribution and motion. Close similarity is demonstrated between the World Meteorological Organization data and modern records including Carnegie curve. Departures noted might be caused by thunderstorms redistribution owing to climate change; the issue deserves a special examination.
A Distributed Processing Approach to Payroll Time Reporting for a Large School District.
ERIC Educational Resources Information Center
Freeman, Raoul J.
1983-01-01
Describes a system for payroll reporting from geographically disparate locations in which data is entered, edited, and verified locally on minicomputers and then uploaded to a central computer for the standard payroll process. Communications and hardware, time-reporting software, data input techniques, system implementation, and its advantages are…
NASA Technical Reports Server (NTRS)
Byrne, F.
1981-01-01
Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Database technology and the management of multimedia data in the Mirror project
NASA Astrophysics Data System (ADS)
de Vries, Arjen P.; Blanken, H. M.
1998-10-01
Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representation of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participating through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-15
... ``* * * shall produce, distribute, and process free of charge common financial reporting forms as described in... developed an application process to collect and process the data necessary to determine a student's eligibility to receive Title IV, HEA program assistance. The application process involves an applicant's...
NASA Technical Reports Server (NTRS)
Parrish, R. S.; Carter, M. C.
1974-01-01
This analysis utilizes computer simulation and statistical estimation. Realizations of stationary gaussian stochastic processes with selected autocorrelation functions are computer simulated. Analysis of the simulated data revealed that the mean and the variance of a process were functionally dependent upon the autocorrelation parameter and crossing level. Using predicted values for the mean and standard deviation, by the method of moments, the distribution parameters was estimated. Thus, given the autocorrelation parameter, crossing level, mean, and standard deviation of a process, the probability of exceeding the crossing level for a particular length of time was calculated.
Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation
NASA Astrophysics Data System (ADS)
Anisenkov, A. V.
2018-03-01
In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on heterogeneous geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate the supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).
Army Logistician. Volume 39, Issue 2, March-April 2007
2007-04-01
Most Army Reduces Tactical Supply System Footprint by thoMas h. aMent, jr. Centralizing all of the Army’s Corps/Theater Automated Data Processing...Middleware, which comprises both hardware and software, revises data in the Standard Army Retail Supply System (SARSS), thereby extending the use of the...Logistics: Supply Based or Distribution Based? The Changing Face of Fuel Management Combat Logistics Patrol Methodology Distribution-Based
Architectures Toward Reusable Science Data Systems
NASA Astrophysics Data System (ADS)
Moses, J. F.
2014-12-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building ground systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research, NOAA's weather satellites and USGS's Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience the goal is to recognize architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
Architectures Toward Reusable Science Data Systems
NASA Technical Reports Server (NTRS)
Moses, John
2015-01-01
Science Data Systems (SDS) comprise an important class of data processing systems that support product generation from remote sensors and in-situ observations. These systems enable research into new science data products, replication of experiments and verification of results. NASA has been building systems for satellite data processing since the first Earth observing satellites launched and is continuing development of systems to support NASA science research and NOAAs Earth observing satellite operations. The basic data processing workflows and scenarios continue to be valid for remote sensor observations research as well as for the complex multi-instrument operational satellite data systems being built today. System functions such as ingest, product generation and distribution need to be configured and performed in a consistent and repeatable way with an emphasis on scalability. This paper will examine the key architectural elements of several NASA satellite data processing systems currently in operation and under development that make them suitable for scaling and reuse. Examples of architectural elements that have become attractive include virtual machine environments, standard data product formats, metadata content and file naming, workflow and job management frameworks, data acquisition, search, and distribution protocols. By highlighting key elements and implementation experience we expect to find architectures that will outlast their original application and be readily adaptable for new applications. Concepts and principles are explored that lead to sound guidance for SDS developers and strategists.
NASA Astrophysics Data System (ADS)
Noh, S.; Tachikawa, Y.; Shiiba, M.; Kim, S.
2011-12-01
Applications of the sequential data assimilation methods have been increasing in hydrology to reduce uncertainty in the model prediction. In a distributed hydrologic model, there are many types of state variables and each variable interacts with each other based on different time scales. However, the framework to deal with the delayed response, which originates from different time scale of hydrologic processes, has not been thoroughly addressed in the hydrologic data assimilation. In this study, we propose the lagged filtering scheme to consider the lagged response of internal states in a distributed hydrologic model using two filtering schemes; particle filtering (PF) and ensemble Kalman filtering (EnKF). The EnKF is one of the widely used sub-optimal filters implementing an efficient computation with limited number of ensemble members, however, still based on Gaussian approximation. PF can be an alternative in which the propagation of all uncertainties is carried out by a suitable selection of randomly generated particles without any assumptions about the nature of the distributions involved. In case of PF, advanced particle regularization scheme is implemented together to preserve the diversity of the particle system. In case of EnKF, the ensemble square root filter (EnSRF) are implemented. Each filtering method is parallelized and implemented in the high performance computing system. A distributed hydrologic model, the water and energy transfer processes (WEP) model, is applied for the Katsura River catchment, Japan to demonstrate the applicability of proposed approaches. Forecasted results via PF and EnKF are compared and analyzed in terms of the prediction accuracy and the probabilistic adequacy. Discussions are focused on the prospects and limitations of each data assimilation method.
Species, functional groups, and thresholds in ecological resilience
Sundstrom, Shana M.; Allen, Craig R.; Barichievy, Chris
2012-01-01
The cross-scale resilience model states that ecological resilience is generated in part from the distribution of functions within and across scales in a system. Resilience is a measure of a system's ability to remain organized around a particular set of mutually reinforcing processes and structures, known as a regime. We define scale as the geographic extent over which a process operates and the frequency with which a process occurs. Species can be categorized into functional groups that are a link between ecosystem processes and structures and ecological resilience. We applied the cross-scale resilience model to avian species in a grassland ecosystem. A species’ morphology is shaped in part by its interaction with ecological structure and pattern, so animal body mass reflects the spatial and temporal distribution of resources. We used the log-transformed rank-ordered body masses of breeding birds associated with grasslands to identify aggregations and discontinuities in the distribution of those body masses. We assessed cross-scale resilience on the basis of 3 metrics: overall number of functional groups, number of functional groups within an aggregation, and the redundancy of functional groups across aggregations. We assessed how the loss of threatened species would affect cross-scale resilience by removing threatened species from the data set and recalculating values of the 3 metrics. We also determined whether more function was retained than expected after the loss of threatened species by comparing observed loss with simulated random loss in a Monte Carlo process. The observed distribution of function compared with the random simulated loss of function indicated that more functionality in the observed data set was retained than expected. On the basis of our results, we believe an ecosystem with a full complement of species can sustain considerable species losses without affecting the distribution of functions within and across aggregations, although ecological resilience is reduced. We propose that the mechanisms responsible for shaping discontinuous distributions of body mass and the nonrandom distribution of functions may also shape species losses such that local extinctions will be nonrandom with respect to the retention and distribution of functions and that the distribution of function within and across aggregations will be conserved despite extinctions.
GANGA: A tool for computational-task management and easy access to Grid resources
NASA Astrophysics Data System (ADS)
Mościcki, J. T.; Brochu, F.; Ebke, J.; Egede, U.; Elmsheuser, J.; Harrison, K.; Jones, R. W. L.; Lee, H. C.; Liko, D.; Maier, A.; Muraru, A.; Patrick, G. N.; Pajchel, K.; Reece, W.; Samset, B. H.; Slater, M. W.; Soroko, A.; Tan, C. L.; van der Ster, D. C.; Williams, M.
2009-11-01
In this paper, we present the computational task-management tool GANGA, which allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources. GANGA has been developed to solve a problem increasingly common in scientific projects, which is that researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. GANGA provides a homogeneous environment for processing data on heterogeneous resources. We give examples from High Energy Physics, demonstrating how an analysis can be developed on a local system and then transparently moved to a Grid system for processing of all available data. GANGA has an API that can be used via an interactive interface, in scripts, or through a GUI. Specific knowledge about types of tasks or computational resources is provided at run-time through a plugin system, making new developments easy to integrate. We give an overview of the GANGA architecture, give examples of current use, and demonstrate how GANGA can be used in many different areas of science. Catalogue identifier: AEEN_v1_0 Program summary URL:
de Blasio, Birgitte Freiesleben; Seierstad, Taral Guldahl; Aalen, Odd O
2011-01-01
Preferential attachment is a proportionate growth process in networks, where nodes receive new links in proportion to their current degree. Preferential attachment is a popular generative mechanism to explain the widespread observation of power-law-distributed networks. An alternative explanation for the phenomenon is a randomly grown network with large individual variation in growth rates among the nodes (frailty). We derive analytically the distribution of individual rates, which will reproduce the connectivity distribution that is obtained from a general preferential attachment process (Yule process), and the structural differences between the two types of graphs are examined by simulations. We present a statistical test to distinguish the two generative mechanisms from each other and we apply the test to both simulated data and two real data sets of scientific citation and sexual partner networks. The findings from the latter analyses argue for frailty effects as an important mechanism underlying the dynamics of complex networks. PMID:21572513
WAMS measurements pre-processing for detecting low-frequency oscillations in power systems
NASA Astrophysics Data System (ADS)
Kovalenko, P. Y.
2017-07-01
Processing the data received from measurement systems implies the situation when one or more registered values stand apart from the sample collection. These values are referred to as “outliers”. The processing results may be influenced significantly by the presence of those in the data sample under consideration. In order to ensure the accuracy of low-frequency oscillations detection in power systems the corresponding algorithm has been developed for the outliers detection and elimination. The algorithm is based on the concept of the irregular component of measurement signal. This component comprises measurement errors and is assumed to be Gauss-distributed random. The median filtering is employed to detect the values lying outside the range of the normally distributed measurement error on the basis of a 3σ criterion. The algorithm has been validated involving simulated signals and WAMS data as well.
Surveillance of industrial processes with correlated parameters
White, A.M.; Gross, K.C.; Kubic, W.L.; Wigeland, R.A.
1996-12-17
A system and method for surveillance of an industrial process are disclosed. The system and method includes a plurality of sensors monitoring industrial process parameters, devices to convert the sensed data to computer compatible information and a computer which executes computer software directed to analyzing the sensor data to discern statistically reliable alarm conditions. The computer software is executed to remove serial correlation information and then calculate Mahalanobis distribution data to carry out a probability ratio test to determine alarm conditions. 10 figs.
The imaging node for the Planetary Data System
Eliason, E.M.; LaVoie, S.K.; Soderblom, L.A.
1996-01-01
The Planetary Data System Imaging Node maintains and distributes the archives of planetary image data acquired from NASA's flight projects with the primary goal of enabling the science community to perform image processing and analysis on the data. The Node provides direct and easy access to the digital image archives through wide distribution of the data on CD-ROM media and on-line remote-access tools by way of Internet services. The Node provides digital image processing tools and the expertise and guidance necessary to understand the image collections. The data collections, now approaching one terabyte in volume, provide a foundation for remote sensing studies for virtually all the planetary systems in our solar system (except for Pluto). The Node is responsible for restoring data sets from past missions in danger of being lost. The Node works with active flight projects to assist in the creation of their archive products and to ensure that their products and data catalogs become an integral part of the Node's data collections.
BES-III distributed computing status
NASA Astrophysics Data System (ADS)
Belov, S. D.; Deng, Z. Y.; Korenkov, V. V.; Li, W. D.; Lin, T.; Ma, Z. T.; Nicholson, C.; Pelevanyuk, I. S.; Suo, B.; Trofimov, V. V.; Tsaregorodtsev, A. U.; Uzhinskiy, A. V.; Yan, T.; Yan, X. F.; Zhang, X. M.; Zhemchugov, A. S.
2016-09-01
The BES-III experiment at the Institute of High Energy Physics (Beijing, China) is aimed at the precision measurements in e+e- annihilation in the energy range from 2.0 till 4.6 GeV. The world's largest samples of J/psi and psi' events and unique samples of XYZ data have been already collected. The expected increase of the data volume in the coming years required a significant evolution of the computing model, namely shift from a centralized data processing to a distributed one. This report summarizes a current design of the BES-III distributed computing system, some of key decisions and experience gained during 2 years of operations.
NASA Astrophysics Data System (ADS)
Longmore, S. P.; Bikos, D.; Szoke, E.; Miller, S. D.; Brummer, R.; Lindsey, D. T.; Hillger, D.
2014-12-01
The increasing use of mobile phones equipped with digital cameras and the ability to post images and information to the Internet in real-time has significantly improved the ability to report events almost instantaneously. In the context of severe weather reports, a representative digital image conveys significantly more information than a simple text or phone relayed report to a weather forecaster issuing severe weather warnings. It also allows the forecaster to reasonably discern the validity and quality of a storm report. Posting geo-located, time stamped storm report photographs utilizing a mobile phone application to NWS social media weather forecast office pages has generated recent positive feedback from forecasters. Building upon this feedback, this discussion advances the concept, development, and implementation of a formalized Photo Storm Report (PSR) mobile application, processing and distribution system and Advanced Weather Interactive Processing System II (AWIPS-II) plug-in display software.The PSR system would be composed of three core components: i) a mobile phone application, ii) a processing and distribution software and hardware system, and iii) AWIPS-II data, exchange and visualization plug-in software. i) The mobile phone application would allow web-registered users to send geo-location, view direction, and time stamped PSRs along with severe weather type and comments to the processing and distribution servers. ii) The servers would receive PSRs, convert images and information to NWS network bandwidth manageable sizes in an AWIPS-II data format, distribute them on the NWS data communications network, and archive the original PSRs for possible future research datasets. iii) The AWIPS-II data and exchange plug-ins would archive PSRs, and the visualization plug-in would display PSR locations, times and directions by hour, similar to surface observations. Hovering on individual PSRs would reveal photo thumbnails and clicking on them would display the full resolution photograph.Here, we present initial NWS forecaster feedback received from social media posted PSRs, motivating the possible advantages of PSRs within AWIPS-II, the details of developing and implementing a PSR system, and possible future applications beyond severe weather reports and AWIPS-II.
2015-09-01
Figures iv List of Tables iv 1. Introduction 1 2. Device Status Data 1 2.1 SNMP 1 2.2 NMS 1 2.3 ICMP Ping 2 3. Data Collection 2 4. Hydra ...Configuration 3 4.1 Status Codes 4 4.2 Request Time 5 4.3 Hydra BLOb Metadata 6 5. Data Processing 6 5.1 Hydra Data Processing Framework 6 5.1.1...Basic Components 6 5.1.2 Map Component 7 5.1.3 Postmap Methods 8 5.1.4 Data Flow 9 5.1.5 Distributed Processing Considerations 9 5.2 Specific Hydra
The Joint Distribution Process Analysis Center (JDPAC): Background and Current Capability
2007-06-12
Systems Integration and Data Management JDDE Analysis/Global Distribution Performance Assessment Futures/Transformation Analysis Balancing Operational Art ... Science JDPAC “101” USTRANSCOM Future Operations Center SDDC – TEA Army SES (Dual Hat) • Transportability Engineering • Other Title 10
The Role of Graphlets in Viral Processes on Networks
NASA Astrophysics Data System (ADS)
Khorshidi, Samira; Al Hasan, Mohammad; Mohler, George; Short, Martin B.
2018-05-01
Predicting the evolution of viral processes on networks is an important problem with applications arising in biology, the social sciences, and the study of the Internet. In existing works, mean-field analysis based upon degree distribution is used for the prediction of viral spreading across networks of different types. However, it has been shown that degree distribution alone fails to predict the behavior of viruses on some real-world networks and recent attempts have been made to use assortativity to address this shortcoming. In this paper, we show that adding assortativity does not fully explain the variance in the spread of viruses for a number of real-world networks. We propose using the graphlet frequency distribution in combination with assortativity to explain variations in the evolution of viral processes across networks with identical degree distribution. Using a data-driven approach by coupling predictive modeling with viral process simulation on real-world networks, we show that simple regression models based on graphlet frequency distribution can explain over 95% of the variance in virality on networks with the same degree distribution but different network topologies. Our results not only highlight the importance of graphlets but also identify a small collection of graphlets which may have the highest influence over the viral processes on a network.
Extensible packet processing architecture
Robertson, Perry J.; Hamlet, Jason R.; Pierson, Lyndon G.; Olsberg, Ronald R.; Chun, Guy D.
2013-08-20
A technique for distributed packet processing includes sequentially passing packets associated with packet flows between a plurality of processing engines along a flow through data bus linking the plurality of processing engines in series. At least one packet within a given packet flow is marked by a given processing engine to signify by the given processing engine to the other processing engines that the given processing engine has claimed the given packet flow for processing. A processing function is applied to each of the packet flows within the processing engines and the processed packets are output on a time-shared, arbitered data bus coupled to the plurality of processing engines.
Scoring in genetically modified organism proficiency tests based on log-transformed results.
Thompson, Michael; Ellison, Stephen L R; Owen, Linda; Mathieson, Kenneth; Powell, Joanne; Key, Pauline; Wood, Roger; Damant, Andrew P
2006-01-01
The study considers data from 2 UK-based proficiency schemes and includes data from a total of 29 rounds and 43 test materials over a period of 3 years. The results from the 2 schemes are similar and reinforce each other. The amplification process used in quantitative polymerase chain reaction determinations predicts a mixture of normal, binomial, and lognormal distributions dominated by the latter 2. As predicted, the study results consistently follow a positively skewed distribution. Log-transformation prior to calculating z-scores is effective in establishing near-symmetric distributions that are sufficiently close to normal to justify interpretation on the basis of the normal distribution.
Exploiting replication in distributed systems
NASA Technical Reports Server (NTRS)
Birman, Kenneth P.; Joseph, T. A.
1989-01-01
Techniques are examined for replicating data and execution in directly distributed systems: systems in which multiple processes interact directly with one another while continuously respecting constraints on their joint behavior. Directly distributed systems are often required to solve difficult problems, ranging from management of replicated data to dynamic reconfiguration in response to failures. It is shown that these problems reduce to more primitive, order-based consistency problems, which can be solved using primitives such as the reliable broadcast protocols. Moreover, given a system that implements reliable broadcast primitives, a flexible set of high-level tools can be provided for building a wide variety of directly distributed application programs.
IDCDACS: IDC's Distributed Application Control System
NASA Astrophysics Data System (ADS)
Ertl, Martin; Boresch, Alexander; Kianička, Ján; Sudakov, Alexander; Tomuta, Elena
2015-04-01
The Preparatory Commission for the CTBTO is an international organization based in Vienna, Austria. Its mission is to establish a global verification regime to monitor compliance with the Comprehensive Nuclear-Test-Ban Treaty (CTBT), which bans all nuclear explosions. For this purpose time series data from a global network of seismic, hydro-acoustic and infrasound (SHI) sensors are transmitted to the International Data Centre (IDC) in Vienna in near-real-time, where it is processed to locate events that may be nuclear explosions. We newly designed the distributed application control system that glues together the various components of the automatic waveform data processing system at the IDC (IDCDACS). Our highly-scalable solution preserves the existing architecture of the IDC processing system that proved successful over many years of operational use, but replaces proprietary components with open-source solutions and custom developed software. Existing code was refactored and extended to obtain a reusable software framework that is flexibly adaptable to different types of processing workflows. Automatic data processing is organized in series of self-contained processing steps, each series being referred to as a processing pipeline. Pipelines process data by time intervals, i.e. the time-series data received from monitoring stations is organized in segments based on the time when the data was recorded. So-called data monitor applications queue the data for processing in each pipeline based on specific conditions, e.g. data availability, elapsed time or completion states of preceding processing pipelines. IDCDACS consists of a configurable number of distributed monitoring and controlling processes, a message broker and a relational database. All processes communicate through message queues hosted on the message broker. Persistent state information is stored in the database. A configurable processing controller instantiates and monitors all data processing applications. Due to decoupling by message queues the system is highly versatile and failure tolerant. The implementation utilizes the RabbitMQ open-source messaging platform that is based upon the Advanced Message Queuing Protocol (AMQP), an on-the-wire protocol (like HTML) and open industry standard. IDCDACS uses high availability capabilities provided by RabbitMQ and is equipped with failure recovery features to survive network and server outages. It is implemented in C and Python and is operated in a Linux environment at the IDC. Although IDCDACS was specifically designed for the existing IDC processing system its architecture is generic and reusable for different automatic processing workflows, e.g. similar to those described in (Olivieri et al. 2012, Kværna et al. 2012). Major advantages are its independence of the specific data processing applications used and the possibility to reconfigure IDCDACS for different types of processing, data and trigger logic. A possible future development would be to use the IDCDACS framework for different scientific domains, e.g. for processing of Earth observation satellite data extending the one-dimensional time-series intervals to spatio-temporal data cubes. REFERENCES Olivieri M., J. Clinton (2012) An almost fair comparison between Earthworm and SeisComp3, Seismological Research Letters, 83(4), 720-727. Kværna, T., S. J. Gibbons, D. B. Harris, D. A. Dodge (2012) Adapting pipeline architectures to track developing aftershock sequences and recurrent explosions, Proceedings of the 2012 Monitoring Research Review: Ground-Based Nuclear Explosion Monitoring Technologies, 776-785.
THE BERKELEY DATA ANALYSIS SYSTEM (BDAS): AN OPEN SOURCE PLATFORM FOR BIG DATA ANALYTICS
2017-09-01
Evan Sparks, Oliver Zahn, Michael J. Franklin, David A. Patterson, Saul Perlmutter. Scientific Computing Meets Big Data Technology: An Astronomy ...Processing Astronomy Imagery Using Big Data Technology. IEEE Transaction on Big Data, 2016. Approved for Public Release; Distribution Unlimited. 22 [93
Classification of cognitive systems dedicated to data sharing
NASA Astrophysics Data System (ADS)
Ogiela, Lidia; Ogiela, Marek R.
2017-08-01
In this paper will be presented classification of new cognitive information systems dedicated to cryptographic data splitting and sharing processes. Cognitive processes of semantic data analysis and interpretation, will be used to describe new classes of intelligent information and vision systems. In addition, cryptographic data splitting algorithms and cryptographic threshold schemes will be used to improve processes of secure and efficient information management with application of such cognitive systems. The utility of the proposed cognitive sharing procedures and distributed data sharing algorithms will be also presented. A few possible application of cognitive approaches for visual information management and encryption will be also described.
The research of approaches of applying the results of big data analysis in higher education
NASA Astrophysics Data System (ADS)
Kochetkov, O. T.; Prokhorov, I. V.
2017-01-01
This article briefly discusses the approaches to the use of Big Data in the educational process of higher educational institutions. There is a brief description of nature of big data, their distribution in the education industry and new ways to use Big Data as part of the educational process are offered as well. This article describes a method for the analysis of the relevant requests by using Yandex.Wordstat (for laboratory works on the processing of data) and Google Trends (for actual pictures of interest and preference in a higher education institution).
Heterogeneous distributed query processing: The DAVID system
NASA Technical Reports Server (NTRS)
Jacobs, Barry E.
1985-01-01
The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.
NASA Astrophysics Data System (ADS)
Collins, J.; Riegler, G.; Schrader, H.; Tinz, M.
2015-04-01
The Geo-intelligence division of Airbus Defence and Space and the German Aerospace Center (DLR) have partnered to produce the first fully global, high-accuracy Digital Surface Model (DSM) using SAR data from the twin satellite constellation: TerraSAR-X and TanDEM-X. The DLR is responsible for the processing and distribution of the TanDEM-X elevation model for the world's scientific community, while Airbus DS is responsible for the commercial production and distribution of the data, under the brand name WorldDEM. For the provision of a consumer-ready product, Airbus DS undertakes several steps to reduce the effect of radar-specific artifacts in the WorldDEM data. These artifacts can be divided into two categories: terrain and hydrological. Airbus DS has developed proprietary software and processes to detect and correct these artifacts in the most efficient manner. Some processes are fullyautomatic, while others require manual or semi-automatic control by operators.
Anastario, Michael P; Rodriguez, Hector P; Gallagher, Patricia M; Cleary, Paul D; Shaller, Dale; Rogers, William H; Bogen, Karen; Safran, Dana Gelb
2010-01-01
Objective To assess the effect of survey distribution protocol (mail versus handout) on data quality and measurement of patient care experiences. Data Sources/Study Setting Multisite randomized trial of survey distribution protocols. Analytic sample included 2,477 patients of 15 clinicians at three practice sites in New York State. Data Collection/Extraction Methods Mail and handout distribution modes were alternated weekly at each site for 6 weeks. Principal Findings Handout protocols yielded an incomplete distribution rate (74 percent) and lower overall response rates (40 percent versus 58 percent) compared with mail. Handout distribution rates decreased over time and resulted in more favorable survey scores compared with mailed surveys. There were significant mode–physician interaction effects, indicating that data cannot simply be pooled and adjusted for mode. Conclusions In-office survey distribution has the potential to bias measurement and comparison of physicians and sites on patient care experiences. Incomplete distribution rates observed in-office, together with between-office differences in distribution rates and declining rates over time suggest staff may be burdened by the process and selective in their choice of patients. Further testing with a larger physician and site sample is important to definitively establish the potential role for in-office distribution in obtaining reliable, valid assessment of patient care experiences. PMID:20579126
Optimally Distributed Kalman Filtering with Data-Driven Communication †
Dormann, Katharina
2018-01-01
For multisensor data fusion, distributed state estimation techniques that enable a local processing of sensor data are the means of choice in order to minimize storage and communication costs. In particular, a distributed implementation of the optimal Kalman filter has recently been developed. A significant disadvantage of this algorithm is that the fusion center needs access to each node so as to compute a consistent state estimate, which requires full communication each time an estimate is requested. In this article, different extensions of the optimally distributed Kalman filter are proposed that employ data-driven transmission schemes in order to reduce communication expenses. As a first relaxation of the full-rate communication scheme, it can be shown that each node only has to transmit every second time step without endangering consistency of the fusion result. Also, two data-driven algorithms are introduced that even allow for lower transmission rates, and bounds are derived to guarantee consistent fusion results. Simulations demonstrate that the data-driven distributed filtering schemes can outperform a centralized Kalman filter that requires each measurement to be sent to the center node. PMID:29596392
A simple model for factory distribution: Historical effect in an industry city
NASA Astrophysics Data System (ADS)
Uehara, Takashi; Sato, Kazunori; Morita, Satoru; Maeda, Yasunobu; Yoshimura, Jin; Tainaka, Kei-ichi
2016-02-01
The construction and discontinuance processes of factories are complicated problems in sociology. We focus on the spatial and temporal changes of factories at Hamamatsu city in Japan. Real data indicate that the clumping degree of factories decreases as the density of factory increases. To represent the spatial and temporal changes of factories, we apply "contact process" which is one of cellular automata. This model roughly explains the dynamics of factory distribution. We also find "historical effect" in spatial distribution. Namely, the recent factories have been dispersed due to the past distribution during the period of economic bubble. This effect may be related to heavy shock in Japanese stock market.
Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven
2010-11-01
The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.
Data Strategies to Support Automated Multi-Sensor Data Fusion in a Service Oriented Architecture
2008-06-01
and employ vast quantities of content. This dissertation provides two software architectural patterns and an auto-fusion process that guide the...UDDI), Simple Order Access Protocol (SOAP), Java, Maritime Domain Awareness (MDA), Business Process Execution Language for Web Service (BPEL4WS) 16...content. This dissertation provides two software architectural patterns and an auto-fusion process that guide the development of a distributed
Neti, Prasad V.S.V.; Howell, Roger W.
2010-01-01
Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log-normal (LN) distribution function (J Nucl Med. 2006;47:1049–1058) with the aid of autoradiography. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these earlier data. Methods The measured distributions of α-particle tracks per cell were subjected to statistical tests with Poisson, LN, and Poisson-lognormal (P-LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL of 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log-normal. Conclusion The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:18483086
Neti, Prasad V.S.V.; Howell, Roger W.
2008-01-01
Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log normal distribution function (J Nucl Med 47, 6 (2006) 1049-1058) with the aid of an autoradiographic approach. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analyses of these data. Methods The measured distributions of alpha particle tracks per cell were subjected to statistical tests with Poisson (P), log normal (LN), and Poisson – log normal (P – LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL 210Po-citrate. When cells were exposed to 67 kBq/mL, the P – LN distribution function gave a better fit, however, the underlying activity distribution remained log normal. Conclusions The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:16741316
Stabilization process of human population: a descriptive approach.
Kayani, A K; Krotki, K J
1981-01-01
An attempt is made to inquire into the process of stabilization of a human population. The same age distribution distorted by past variations in fertility is subjected to several fixed schedules of fertility. The schedules are different from each other monotonically over a narrow range. The primary concern is with the process, almost year by year, through which the populations become stable. There is particular interest in the differential impact in the same original age distribution of the narrowly different fixed fertility schedules. The exercise is prepared in 3 stages: general background of the process of stabilization; methodology and data used; and analysis and discussion of the stabilization process. Among the several approaches through which the analysis of stable population is possible, 2 are popular: the integral equation and the projection matrix. In this presentation the interest is in evaluating the effects of fertility on the stabilization process of a population. Therefore, only 1 initial age distribution and only 1 life table but a variety of narrowly different schedules of fertility have been used. Specifically, the U.S. 1963 female population is treated as the initial population. The process of stabilization is viewed in the light of the changes in the slopes between 2 successive age groups of an age distribution. A high fertility schedule with the given initial age distribution and mortality level overcomes the oscillations more quickly than the low fertility schedule. Simulation confirms the intuitively expected positive relationship between the mean of the slope and the level of fertility. The variance of the slope distribution is an indicator of the aging of the distribution.
Rainfall continuous time stochastic simulation for a wet climate in the Cantabric Coast
NASA Astrophysics Data System (ADS)
Rebole, Juan P.; Lopez, Jose J.; Garcia-Guzman, Adela
2010-05-01
Rain is the result of a series of complex atmospheric processes which are influenced by numerous factors. This complexity makes its simulation practically unfeasible from a physical basis, advising the use of stochastic diagrams. These diagrams, which are based on observed characteristics (Todorovic and Woolhiser, 1975), allow the introduction of renewal alternating processes, that account for the occurrence of rainfall at different time lapses (Markov chains are a particular case, where lapses can be described by exponential distributions). Thus, a sequential rainfall process can be defined as a temporal series in which rainfall events (periods in which rainfall is recorded) alternate with non rain events (periods in which no rainfall is recorded). The variables of a temporal rain sequence have been characterized (duration of the rainfall event, duration of the non rainfall event, average intensity of the rain in the rain event, and a temporal distribution of the amount of rain in the rain event) in a wet climate such as that of the coastal area of Guipúzcoa. The study has been performed from two series recorded at the meteorological stations of Igueldo-San Sebastián and Fuenterrabia / Airport (data every ten minutes and for its hourly aggregation). As a result of this work, the variables satisfactorily fitted the following distribution functions: the duration of the rain event to a exponential function; the duration of the dry event to a truncated exponential mixed distribution; the average intensity to a Weibull distribution; and the distribution of the rain fallen to the Beta distribution. The characterization was made for an hourly aggregation of the recorded interval of ten minutes. The parameters of the fitting functions were better obtained by means of the maximum likelihood method than the moment method. The parameters obtained from the characterization were used to develop a stochastic rainfall process simulation model by means of a three states Markov chain (Hutchinson, 1990), performed in an hourly basis by García-Guzmán (1993) and Castro et al. (1997, 2005 ). Simulation process results were valid in the hourly case for all the four described variables, with a slightly better response in Fuenterrabia than in Igueldo. In summary, all the variables were better simulated in Fuenterrabia than in Igueldo. Fuenterrabia data series is shorter and with longer sequences without missing data, compared to Igueldo. The latter shows higher number of missing data events, whereas its mean duration is longer in Fuenterrabia.
Yiu, Sean; Tom, Brian Dm
2017-01-01
Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice, the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicates model fitting. Thus, non-standard computationally intensive procedures based on simulating the marginal likelihood have so far only been proposed. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence, maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and when it is of interest to directly model the overall marginal mean. The methodology is applied on a psoriatic arthritis data set concerning functional disability.
Distributed Offline Data Reconstruction in BaBar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulliam, Teela M
The BaBar experiment at SLAC is in its fourth year of running. The data processing system has been continuously evolving to meet the challenges of higher luminosity running and the increasing bulk of data to re-process each year. To meet these goals a two-pass processing architecture has been adopted, where 'rolling calibrations' are quickly calculated on a small fraction of the events in the first pass and the bulk data reconstruction done in the second. This allows for quick detector feedback in the first pass and allows for the parallelization of the second pass over two or more separate farms.more » This two-pass system allows also for distribution of processing farms off-site. The first such site has been setup at INFN Padova. The challenges met here were many. The software was ported to a full Linux-based, commodity hardware system. The raw dataset, 90 TB, was imported from SLAC utilizing a 155 Mbps network link. A system for quality control and export of the processed data back to SLAC was developed. Between SLAC and Padova we are currently running three pass-one farms, with 32 CPUs each, and nine pass-two farms with 64 to 80 CPUs each. The pass-two farms can process between 2 and 4 million events per day. Details about the implementation and performance of the system will be presented.« less
Angular distributions and mechanisms of fragmentation by relativistic heavy ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoenner, R.W.; Haustein, P.E.; Cumming, J.B.
1984-07-23
Angular distributions of massive fragments from relativistic heavy-ion interactions are reported. Sideward peaking is observed for the light fragment /sup 37/Ar, from 25-GeV /sup 12/C+Au, while the distribution for /sup 127/Xe is strongly forward peaked. Conflicts of these observations and other existing data with predictions of models for the fragmentation process are discussed.
Baldovin-Stella stochastic volatility process and Wiener process mixtures
NASA Astrophysics Data System (ADS)
Peirano, P. P.; Challet, D.
2012-08-01
Starting from inhomogeneous time scaling and linear decorrelation between successive price returns, Baldovin and Stella recently proposed a powerful and consistent way to build a model describing the time evolution of a financial index. We first make it fully explicit by using Student distributions instead of power law-truncated Lévy distributions and show that the analytic tractability of the model extends to the larger class of symmetric generalized hyperbolic distributions and provide a full computation of their multivariate characteristic functions; more generally, we show that the stochastic processes arising in this framework are representable as mixtures of Wiener processes. The basic Baldovin and Stella model, while mimicking well volatility relaxation phenomena such as the Omori law, fails to reproduce other stylized facts such as the leverage effect or some time reversal asymmetries. We discuss how to modify the dynamics of this process in order to reproduce real data more accurately.
78 FR 47003 - Draft National Spatial Data Infrastructure Strategic Plan; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2013-08-02
... NSDI.'' Executive Order 12906 describes the NSDI as ``the technology, policies, standards, and human resources necessary to acquire, process, store, distribute, and improve utilization of geospatial data...
Distributed Computer Networks in Support of Complex Group Practices
Wess, Bernard P.
1978-01-01
The economics of medical computer networks are presented in context with the patient care and administrative goals of medical networks. Design alternatives and network topologies are discussed with an emphasis on medical network design requirements in distributed data base design, telecommunications, satellite systems, and software engineering. The success of the medical computer networking technology is predicated on the ability of medical and data processing professionals to design comprehensive, efficient, and virtually impenetrable security systems to protect data bases, network access and services, and patient confidentiality.
Communication Interface for a System of Distributed Processes.
1981-03-03
Tnterprocess Communication Service Apnlied in a nistrihuted Data Rase System" (submitte to VCI, Miinchen 10/Rl) and "The Trans- port Service of an...1980 4) B6’-hme, K.: An Tnternrocess Communication Service Applied in a Distribut.1 Data Pase System. Submitted to FCI, tliinchen, Oct. 3091l 5) ribme...K(. The "ransport Service of an Tnterprocess Commiunication Systell. Snbmitte to 7th Data Cor’nunicatinrin rooi1 !’exico City, Oct. 1091 ’"’h~e oi1
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2012 CFR
2012-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2014 CFR
2014-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
7 CFR 275.15 - Data management.
Code of Federal Regulations, 2013 CFR
2013-01-01
... Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE FOOD STAMP AND FOOD DISTRIBUTION PROGRAM PERFORMANCE REPORTING SYSTEM Data Analysis and Evaluation § 275.15 Data management. (a) Analysis. Analysis is the process of classifying data, such as by areas of...
A Distributed Architecture for Tsunami Early Warning and Collaborative Decision-support in Crises
NASA Astrophysics Data System (ADS)
Moßgraber, J.; Middleton, S.; Hammitzsch, M.; Poslad, S.
2012-04-01
The presentation will describe work on the system architecture that is being developed in the EU FP7 project TRIDEC on "Collaborative, Complex and Critical Decision-Support in Evolving Crises". The challenges for a Tsunami Early Warning System (TEWS) are manifold and the success of a system depends crucially on the system's architecture. A modern warning system following a system-of-systems approach has to integrate various components and sub-systems such as different information sources, services and simulation systems. Furthermore, it has to take into account the distributed and collaborative nature of warning systems. In order to create an architecture that supports the whole spectrum of a modern, distributed and collaborative warning system one must deal with multiple challenges. Obviously, one cannot expect to tackle these challenges adequately with a monolithic system or with a single technology. Therefore, a system architecture providing the blueprints to implement the system-of-systems approach has to combine multiple technologies and architectural styles. At the bottom layer it has to reliably integrate a large set of conventional sensors, such as seismic sensors and sensor networks, buoys and tide gauges, and also innovative and unconventional sensors, such as streams of messages from social media services. At the top layer it has to support collaboration on high-level decision processes and facilitates information sharing between organizations. In between, the system has to process all data and integrate information on a semantic level in a timely manner. This complex communication follows an event-driven mechanism allowing events to be published, detected and consumed by various applications within the architecture. Therefore, at the upper layer the event-driven architecture (EDA) aspects are combined with principles of service-oriented architectures (SOA) using standards for communication and data exchange. The most prominent challenges on this layer include providing a framework for information integration on a syntactic and semantic level, leveraging distributed processing resources for a scalable data processing platform, and automating data processing and decision support workflows.
Hail Size Distribution Mapping
NASA Technical Reports Server (NTRS)
2008-01-01
A 3-D weather radar visualization software program was developed and implemented as part of an experimental Launch Pad 39 Hail Monitor System. 3DRadPlot, a radar plotting program, is one of several software modules that form building blocks of the hail data processing and analysis system (the complete software processing system under development). The spatial and temporal mapping algorithms were originally developed through research at the University of Central Florida, funded by NASA s Tropical Rainfall Measurement Mission (TRMM), where the goal was to merge National Weather Service (NWS) Next-Generation Weather Radar (NEXRAD) volume reflectivity data with drop size distribution data acquired from a cluster of raindrop disdrometers. In this current work, we adapted these algorithms to process data from a cluster of hail disdrometers positioned around Launch Pads 39A or 39B, along with the corresponding NWS radar data. Radar data from all NWS NEXRAD sites is archived at the National Climatic Data Center (NCDC). That data can be readily accessed at
NASA Astrophysics Data System (ADS)
Aaboud, M.; Aad, G.; Abbott, B.; Abdallah, J.; Abdinov, O.; Abeloos, B.; Aben, R.; Abouzeid, O. S.; Abraham, N. L.; Abramowicz, H.; Abreu, H.; Abreu, R.; Abulaiti, Y.; Acharya, B. S.; Adachi, S.; Adamczyk, L.; Adams, D. L.; Adelman, J.; Adomeit, S.; Adye, T.; Affolder, A. A.; Agatonovic-Jovin, T.; Aguilar-Saavedra, J. A.; Ahlen, S. P.; Ahmadov, F.; Aielli, G.; Akerstedt, H.; Åkesson, T. P. A.; Akimov, A. V.; Alberghi, G. L.; Albert, J.; Albrand, S.; Alconada Verzini, M. J.; Aleksa, M.; Aleksandrov, I. N.; Alexa, C.; Alexander, G.; Alexopoulos, T.; Alhroob, M.; Ali, B.; Aliev, M.; Alimonti, G.; Alison, J.; Alkire, S. P.; Allbrooke, B. M. M.; Allen, B. W.; Allport, P. P.; Aloisio, A.; Alonso, A.; Alonso, F.; Alpigiani, C.; Alshehri, A. A.; Alstaty, M.; Alvarez Gonzalez, B.; Álvarez Piqueras, D.; Alviggi, M. G.; Amadio, B. T.; Amako, K.; Amaral Coutinho, Y.; Amelung, C.; Amidei, D.; Amor Dos Santos, S. P.; Amorim, A.; Amoroso, S.; Amundsen, G.; Anastopoulos, C.; Ancu, L. S.; Andari, N.; Andeen, T.; Anders, C. F.; Anders, G.; Anders, J. K.; Anderson, K. J.; Andreazza, A.; Andrei, V.; Angelidakis, S.; Angelozzi, I.; Angerami, A.; Anghinolfi, F.; Anisenkov, A. V.; Anjos, N.; Annovi, A.; Antel, C.; Antonelli, M.; Antonov, A.; Anulli, F.; Aoki, M.; Aperio Bella, L.; Arabidze, G.; Arai, Y.; Araque, J. P.; Arce, A. T. H.; Arduh, F. A.; Arguin, J.-F.; Argyropoulos, S.; Arik, M.; Armbruster, A. J.; Armitage, L. J.; Arnaez, O.; Arnold, H.; Arratia, M.; Arslan, O.; Artamonov, A.; Artoni, G.; Artz, S.; Asai, S.; Asbah, N.; Ashkenazi, A.; Åsman, B.; Asquith, L.; Assamagan, K.; Astalos, R.; Atkinson, M.; Atlay, N. B.; Augsten, K.; Avolio, G.; Axen, B.; Ayoub, M. K.; Azuelos, G.; Baak, M. A.; Baas, A. E.; Baca, M. J.; Bachacou, H.; Bachas, K.; Backes, M.; Backhaus, M.; Bagiacchi, P.; Bagnaia, P.; Bai, Y.; Baines, J. T.; Baker, O. K.; Baldin, E. M.; Balek, P.; Balestri, T.; Balli, F.; Balunas, W. K.; Banas, E.; Banerjee, Sw.; Bannoura, A. A. E.; Barak, L.; Barberio, E. L.; Barberis, D.; Barbero, M.; Barillari, T.; Barisits, M.-S.; Barklow, T.; Barlow, N.; Barnes, S. L.; Barnett, B. M.; Barnett, R. M.; Barnovska-Blenessy, Z.; Baroncelli, A.; Barone, G.; Barr, A. J.; Barranco Navarro, L.; Barreiro, F.; Barreiro Guimarães da Costa, J.; Bartoldus, R.; Barton, A. E.; Bartos, P.; Basalaev, A.; Bassalat, A.; Bates, R. L.; Batista, S. J.; Batley, J. R.; Battaglia, M.; Bauce, M.; Bauer, F.; Bawa, H. S.; Beacham, J. B.; Beattie, M. D.; Beau, T.; Beauchemin, P. H.; Bechtle, P.; Beck, H. P.; Becker, K.; Becker, M.; Beckingham, M.; Becot, C.; Beddall, A. J.; Beddall, A.; Bednyakov, V. A.; Bedognetti, M.; Bee, C. P.; Beemster, L. J.; Beermann, T. A.; Begel, M.; Behr, J. K.; Belanger-Champagne, C.; Bell, A. S.; Bella, G.; Bellagamba, L.; Bellerive, A.; Bellomo, M.; Belotskiy, K.; Beltramello, O.; Belyaev, N. L.; Benary, O.; Benchekroun, D.; Bender, M.; Bendtz, K.; Benekos, N.; Benhammou, Y.; Benhar Noccioli, E.; Benitez, J.; Benjamin, D. P.; Bensinger, J. R.; Bentvelsen, S.; Beresford, L.; Beretta, M.; Berge, D.; Bergeaas Kuutmann, E.; Berger, N.; Beringer, J.; Berlendis, S.; Bernard, N. R.; Bernius, C.; Bernlochner, F. U.; Berry, T.; Berta, P.; Bertella, C.; Bertoli, G.; Bertolucci, F.; Bertram, I. A.; Bertsche, C.; Bertsche, D.; Besjes, G. J.; Bessidskaia Bylund, O.; Bessner, M.; Besson, N.; Betancourt, C.; Bethani, A.; Bethke, S.; Bevan, A. J.; Bianchi, R. M.; Bianchini, L.; Bianco, M.; Biebel, O.; Biedermann, D.; Bielski, R.; Biesuz, N. V.; Biglietti, M.; Bilbao de Mendizabal, J.; Billoud, T. R. V.; Bilokon, H.; Bindi, M.; Binet, S.; Bingul, A.; Bini, C.; Biondi, S.; Bisanz, T.; Bjergaard, D. M.; Black, C. W.; Black, J. E.; Black, K. M.; Blackburn, D.; Blair, R. E.; Blanchard, J.-B.; Blazek, T.; Bloch, I.; Blocker, C.; Blue, A.; Blum, W.; Blumenschein, U.; Blunier, S.; Bobbink, G. J.; Bobrovnikov, V. S.; Bocchetta, S. S.; Bocci, A.; Bock, C.; Boehler, M.; Boerner, D.; Bogaerts, J. A.; Bogavac, D.; Bogdanchikov, A. G.; Bohm, C.; Boisvert, V.; Bokan, P.; Bold, T.; Boldyrev, A. S.; Bomben, M.; Bona, M.; Boonekamp, M.; Borisov, A.; Borissov, G.; Bortfeldt, J.; Bortoletto, D.; Bortolotto, V.; Bos, K.; Boscherini, D.; Bosman, M.; Bossio Sola, J. D.; Boudreau, J.; Bouffard, J.; Bouhova-Thacker, E. V.; Boumediene, D.; Bourdarios, C.; Boutle, S. K.; Boveia, A.; Boyd, J.; Boyko, I. R.; Bracinik, J.; Brandt, A.; Brandt, G.; Brandt, O.; Bratzler, U.; Brau, B.; Brau, J. E.; Breaden Madden, W. D.; Brendlinger, K.; Brennan, A. J.; Brenner, L.; Brenner, R.; Bressler, S.; Bristow, T. M.; Britton, D.; Britzger, D.; Brochu, F. M.; Brock, I.; Brock, R.; Brooijmans, G.; Brooks, T.; Brooks, W. K.; Brosamer, J.; Brost, E.; Broughton, J. H.; Bruckman de Renstrom, P. A.; Bruncko, D.; Bruneliere, R.; Bruni, A.; Bruni, G.; Bruni, L. S.; Brunt, Bh; Bruschi, M.; Bruscino, N.; Bryant, P.; Bryngemark, L.; Buanes, T.; Buat, Q.; Buchholz, P.; Buckley, A. G.; Budagov, I. A.; Buehrer, F.; Bugge, M. K.; Bulekov, O.; Bullock, D.; Burckhart, H.; Burdin, S.; Burgard, C. D.; Burghgrave, B.; Burka, K.; Burke, S.; Burmeister, I.; Burr, J. T. P.; Busato, E.; Büscher, D.; Büscher, V.; Bussey, P.; Butler, J. M.; Buttar, C. M.; Butterworth, J. M.; Butti, P.; Buttinger, W.; Buzatu, A.; Buzykaev, A. R.; Cabrera Urbán, S.; Caforio, D.; Cairo, V. M.; Cakir, O.; Calace, N.; Calafiura, P.; Calandri, A.; Calderini, G.; Calfayan, P.; Callea, G.; Caloba, L. P.; Calvente Lopez, S.; Calvet, D.; Calvet, S.; Calvet, T. P.; Camacho Toro, R.; Camarda, S.; Camarri, P.; Cameron, D.; Caminal Armadans, R.; Camincher, C.; Campana, S.; Campanelli, M.; Camplani, A.; Campoverde, A.; Canale, V.; Canepa, A.; Cano Bret, M.; Cantero, J.; Cao, T.; Capeans Garrido, M. D. M.; Caprini, I.; Caprini, M.; Capua, M.; Carbone, R. M.; Cardarelli, R.; Cardillo, F.; Carli, I.; Carli, T.; Carlino, G.; Carminati, L.; Carney, R. M. D.; Caron, S.; Carquin, E.; Carrillo-Montoya, G. D.; Carter, J. R.; Carvalho, J.; Casadei, D.; Casado, M. P.; Casolino, M.; Casper, D. W.; Castaneda-Miranda, E.; Castelijn, R.; Castelli, A.; Castillo Gimenez, V.; Castro, N. F.; Catinaccio, A.; Catmore, J. R.; Cattai, A.; Caudron, J.; Cavaliere, V.; Cavallaro, E.; Cavalli, D.; Cavalli-Sforza, M.; Cavasinni, V.; Ceradini, F.; Cerda Alberich, L.; Cerqueira, A. S.; Cerri, A.; Cerrito, L.; Cerutti, F.; Cerv, M.; Cervelli, A.; Cetin, S. A.; Chafaq, A.; Chakraborty, D.; Chan, S. K.; Chan, Y. L.; Chang, P.; Chapman, J. D.; Charlton, D. G.; Chatterjee, A.; Chau, C. C.; Chavez Barajas, C. A.; Che, S.; Cheatham, S.; Chegwidden, A.; Chekanov, S.; Chekulaev, S. V.; Chelkov, G. A.; Chelstowska, M. A.; Chen, C.; Chen, H.; Chen, K.; Chen, S.; Chen, S.; Chen, X.; Chen, Y.; Cheng, H. C.; Cheng, H. J.; Cheng, Y.; Cheplakov, A.; Cheremushkina, E.; Cherkaoui El Moursli, R.; Chernyatin, V.; Cheu, E.; Chevalier, L.; Chiarella, V.; Chiarelli, G.; Chiodini, G.; Chisholm, A. S.; Chitan, A.; Chizhov, M. V.; Choi, K.; Chomont, A. R.; Chouridou, S.; Chow, B. K. B.; Christodoulou, V.; Chromek-Burckhart, D.; Chudoba, J.; Chuinard, A. J.; Chwastowski, J. J.; Chytka, L.; Ciapetti, G.; Ciftci, A. K.; Cinca, D.; Cindro, V.; Cioara, I. A.; Ciocca, C.; Ciocio, A.; Cirotto, F.; Citron, Z. H.; Citterio, M.; Ciubancan, M.; Clark, A.; Clark, B. L.; Clark, M. R.; Clark, P. J.; Clarke, R. N.; Clement, C.; Coadou, Y.; Cobal, M.; Coccaro, A.; Cochran, J.; Colasurdo, L.; Cole, B.; Colijn, A. P.; Collot, J.; Colombo, T.; Compostella, G.; Conde Muiño, P.; Coniavitis, E.; Connell, S. H.; Connelly, I. A.; Consorti, V.; Constantinescu, S.; Conti, G.; Conventi, F.; Cooke, M.; Cooper, B. D.; Cooper-Sarkar, A. M.; Cormier, K. J. R.; Cornelissen, T.; Corradi, M.; Corriveau, F.; Cortes-Gonzalez, A.; Cortiana, G.; Costa, G.; Costa, M. J.; Costanzo, D.; Cottin, G.; Cowan, G.; Cox, B. E.; Cranmer, K.; Crawley, S. J.; Cree, G.; Crépé-Renaudin, S.; Crescioli, F.; Cribbs, W. A.; Crispin Ortuzar, M.; Cristinziani, M.; Croft, V.; Crosetti, G.; Cueto, A.; Cuhadar Donszelmann, T.; Cummings, J.; Curatolo, M.; Cúth, J.; Czirr, H.; Czodrowski, P.; D'Amen, G.; D'Auria, S.; D'Onofrio, M.; da Cunha Sargedas de Sousa, M. J.; da Via, C.; Dabrowski, W.; Dado, T.; Dai, T.; Dale, O.; Dallaire, F.; Dallapiccola, C.; Dam, M.; Dandoy, J. R.; Dang, N. P.; Daniells, A. C.; Dann, N. S.; Danninger, M.; Dano Hoffmann, M.; Dao, V.; Darbo, G.; Darmora, S.; Dassoulas, J.; Dattagupta, A.; Davey, W.; David, C.; Davidek, T.; Davies, M.; Davison, P.; Dawe, E.; Dawson, I.; de, K.; de Asmundis, R.; de Benedetti, A.; de Castro, S.; de Cecco, S.; de Groot, N.; de Jong, P.; de la Torre, H.; de Lorenzi, F.; de Maria, A.; de Pedis, D.; de Salvo, A.; de Sanctis, U.; de Santo, A.; de Vivie de Regie, J. B.; Dearnaley, W. J.; Debbe, R.; Debenedetti, C.; Dedovich, D. V.; Dehghanian, N.; Deigaard, I.; Del Gaudio, M.; Del Peso, J.; Del Prete, T.; Delgove, D.; Deliot, F.; Delitzsch, C. M.; Dell'Acqua, A.; Dell'Asta, L.; Dell'Orso, M.; Della Pietra, M.; Della Volpe, D.; Delmastro, M.; Delsart, P. A.; Demarco, D. A.; Demers, S.; Demichev, M.; Demilly, A.; Denisov, S. P.; Denysiuk, D.; Derendarz, D.; Derkaoui, J. E.; Derue, F.; Dervan, P.; Desch, K.; Deterre, C.; Dette, K.; Deviveiros, P. O.; Dewhurst, A.; Dhaliwal, S.; di Ciaccio, A.; di Ciaccio, L.; di Clemente, W. K.; di Donato, C.; di Girolamo, A.; di Girolamo, B.; di Micco, B.; di Nardo, R.; di Simone, A.; di Sipio, R.; di Valentino, D.; Diaconu, C.; Diamond, M.; Dias, F. A.; Diaz, M. A.; Diehl, E. B.; Dietrich, J.; Díez Cornell, S.; Dimitrievska, A.; Dingfelder, J.; Dita, P.; Dita, S.; Dittus, F.; Djama, F.; Djobava, T.; Djuvsland, J. I.; Do Vale, M. A. B.; Dobos, D.; Dobre, M.; Doglioni, C.; Dolejsi, J.; Dolezal, Z.; Donadelli, M.; Donati, S.; Dondero, P.; Donini, J.; Dopke, J.; Doria, A.; Dova, M. T.; Doyle, A. T.; Drechsler, E.; Dris, M.; Du, Y.; Duarte-Campderros, J.; Duchovni, E.; Duckeck, G.; Ducu, O. A.; Duda, D.; Dudarev, A.; Chr. Dudder, A.; Duffield, E. M.; Duflot, L.; Dührssen, M.; Dumancic, M.; Dunford, M.; Duran Yildiz, H.; Düren, M.; Durglishvili, A.; Duschinger, D.; Dutta, B.; Dyndal, M.; Eckardt, C.; Ecker, K. M.; Edgar, R. C.; Edwards, N. C.; Eifert, T.; Eigen, G.; Einsweiler, K.; Ekelof, T.; El Kacimi, M.; Ellajosyula, V.; Ellert, M.; Elles, S.; Ellinghaus, F.; Elliot, A. A.; Ellis, N.; Elmsheuser, J.; Elsing, M.; Emeliyanov, D.; Enari, Y.; Endner, O. C.; Ennis, J. S.; Erdmann, J.; Ereditato, A.; Ernis, G.; Ernst, J.; Ernst, M.; Errede, S.; Ertel, E.; Escalier, M.; Esch, H.; Escobar, C.; Esposito, B.; Etienvre, A. I.; Etzion, E.; Evans, H.; Ezhilov, A.; Ezzi, M.; Fabbri, F.; Fabbri, L.; Facini, G.; Fakhrutdinov, R. M.; Falciano, S.; Falla, R. J.; Faltova, J.; Fang, Y.; Fanti, M.; Farbin, A.; Farilla, A.; Farina, C.; Farina, E. M.; Farooque, T.; Farrell, S.; Farrington, S. M.; Farthouat, P.; Fassi, F.; Fassnacht, P.; Fassouliotis, D.; Faucci Giannelli, M.; Favareto, A.; Fawcett, W. J.; Fayard, L.; Fedin, O. L.; Fedorko, W.; Feigl, S.; Feligioni, L.; Feng, C.; Feng, E. J.; Feng, H.; Fenyuk, A. B.; Feremenga, L.; Fernandez Martinez, P.; Fernandez Perez, S.; Ferrando, J.; Ferrari, A.; Ferrari, P.; Ferrari, R.; Ferreira de Lima, D. E.; Ferrer, A.; Ferrere, D.; Ferretti, C.; Ferretto Parodi, A.; Fiedler, F.; Filipčič, A.; Filipuzzi, M.; Filthaut, F.; Fincke-Keeler, M.; Finelli, K. D.; Fiolhais, M. C. N.; Fiorini, L.; Firan, A.; Fischer, A.; Fischer, C.; Fischer, J.; Fisher, W. C.; Flaschel, N.; Fleck, I.; Fleischmann, P.; Fletcher, G. T.; Fletcher, R. R. M.; Flick, T.; Flores Castillo, L. R.; Flowerdew, M. J.; Forcolin, G. T.; Formica, A.; Forti, A.; Foster, A. G.; Fournier, D.; Fox, H.; Fracchia, S.; Francavilla, P.; Franchini, M.; Francis, D.; Franconi, L.; Franklin, M.; Frate, M.; Fraternali, M.; Freeborn, D.; Fressard-Batraneanu, S. M.; Friedrich, F.; Froidevaux, D.; Frost, J. A.; Fukunaga, C.; Fullana Torregrosa, E.; Fusayasu, T.; Fuster, J.; Gabaldon, C.; Gabizon, O.; Gabrielli, A.; Gabrielli, A.; Gach, G. P.; Gadatsch, S.; Gadomski, S.; Gagliardi, G.; Gagnon, L. G.; Gagnon, P.; Galea, C.; Galhardo, B.; Gallas, E. J.; Gallop, B. J.; Gallus, P.; Galster, G.; Gan, K. K.; Ganguly, S.; Gao, J.; Gao, Y.; Gao, Y. S.; Garay Walls, F. M.; García, C.; García Navarro, J. E.; Garcia-Sciveres, M.; Gardner, R. W.; Garelli, N.; Garonne, V.; Gascon Bravo, A.; Gasnikova, K.; Gatti, C.; Gaudiello, A.; Gaudio, G.; Gauthier, L.; Gavrilenko, I. L.; Gay, C.; Gaycken, G.; Gazis, E. N.; Gecse, Z.; Gee, C. N. P.; Geich-Gimbel, Ch.; Geisen, M.; Geisler, M. P.; Gellerstedt, K.; Gemme, C.; Genest, M. H.; Geng, C.; Gentile, S.; Gentsos, C.; George, S.; Gerbaudo, D.; Gershon, A.; Ghasemi, S.; Ghneimat, M.; Giacobbe, B.; Giagu, S.; Giannetti, P.; Gibbard, B.; Gibson, S. M.; Gignac, M.; Gilchriese, M.; Gillam, T. P. S.; Gillberg, D.; Gilles, G.; Gingrich, D. M.; Giokaris, N.; Giordani, M. P.; Giorgi, F. M.; Giorgi, F. M.; Giraud, P. F.; Giromini, P.; Giugni, D.; Giuli, F.; Giuliani, C.; Giulini, M.; Gjelsten, B. K.; Gkaitatzis, S.; Gkialas, I.; Gkougkousis, E. L.; Gladilin, L. K.; Glasman, C.; Glatzer, J.; Glaysher, P. C. F.; Glazov, A.; Goblirsch-Kolb, M.; Godlewski, J.; Goldfarb, S.; Golling, T.; Golubkov, D.; Gomes, A.; Gonçalo, R.; Goncalves Pinto Firmino da Costa, J.; Gonella, G.; Gonella, L.; Gongadze, A.; González de La Hoz, S.; Gonzalez-Sevilla, S.; Goossens, L.; Gorbounov, P. A.; Gordon, H. A.; Gorelov, I.; Gorini, B.; Gorini, E.; Gorišek, A.; Gornicki, E.; Goshaw, A. T.; Gössling, C.; Gostkin, M. I.; Goudet, C. R.; Goujdami, D.; Goussiou, A. G.; Govender, N.; Gozani, E.; Graber, L.; Grabowska-Bold, I.; Gradin, P. O. J.; Grafström, P.; Gramling, J.; Gramstad, E.; Grancagnolo, S.; Gratchev, V.; Gravila, P. M.; Gray, H. M.; Graziani, E.; Greenwood, Z. D.; Grefe, C.; Gregersen, K.; Gregor, I. M.; Grenier, P.; Grevtsov, K.; Griffiths, J.; Grillo, A. A.; Grimm, K.; Grinstein, S.; Gris, Ph.; Grivaz, J.-F.; Groh, S.; Gross, E.; Grosse-Knetter, J.; Grossi, G. C.; Grout, Z. J.; Guan, L.; Guan, W.; Guenther, J.; Guescini, F.; Guest, D.; Gueta, O.; Gui, B.; Guido, E.; Guillemin, T.; Guindon, S.; Gul, U.; Gumpert, C.; Guo, J.; Guo, Y.; Gupta, R.; Gupta, S.; Gustavino, G.; Gutierrez, P.; Gutierrez Ortiz, N. G.; Gutschow, C.; Guyot, C.; Gwenlan, C.; Gwilliam, C. B.; Haas, A.; Haber, C.; Hadavand, H. K.; Haddad, N.; Hadef, A.; Hageböck, S.; Hagihara, M.; Hajduk, Z.; Hakobyan, H.; Haleem, M.; Haley, J.; Halladjian, G.; Hallewell, G. D.; Hamacher, K.; Hamal, P.; Hamano, K.; Hamilton, A.; Hamity, G. N.; Hamnett, P. G.; Han, L.; Hanagaki, K.; Hanawa, K.; Hance, M.; Haney, B.; Hanke, P.; Hanna, R.; Hansen, J. B.; Hansen, J. D.; Hansen, M. C.; Hansen, P. H.; Hara, K.; Hard, A. S.; Harenberg, T.; Hariri, F.; Harkusha, S.; Harrington, R. D.; Harrison, P. F.; Hartjes, F.; Hartmann, N. M.; Hasegawa, M.; Hasegawa, Y.; Hasib, A.; Hassani, S.; Haug, S.; Hauser, R.; Hauswald, L.; Havranek, M.; Hawkes, C. M.; Hawkings, R. J.; Hayakawa, D.; Hayden, D.; Hays, C. P.; Hays, J. M.; Hayward, H. S.; Haywood, S. J.; Head, S. J.; Heck, T.; Hedberg, V.; Heelan, L.; Heim, S.; Heim, T.; Heinemann, B.; Heinrich, J. J.; Heinrich, L.; Heinz, C.; Hejbal, J.; Helary, L.; Hellman, S.; Helsens, C.; Henderson, J.; Henderson, R. C. W.; Heng, Y.; Henkelmann, S.; Henriques Correia, A. M.; Henrot-Versille, S.; Herbert, G. H.; Herde, H.; Herget, V.; Hernández Jiménez, Y.; Herten, G.; Hertenberger, R.; Hervas, L.; Hesketh, G. G.; Hessey, N. P.; Hetherly, J. W.; Hickling, R.; Higón-Rodriguez, E.; Hill, E.; Hill, J. C.; Hiller, K. H.; Hillier, S. J.; Hinchliffe, I.; Hines, E.; Hinman, R. R.; Hirose, M.; Hirschbuehl, D.; Hobbs, J.; Hod, N.; Hodgkinson, M. C.; Hodgson, P.; Hoecker, A.; Hoeferkamp, M. R.; Hoenig, F.; Hohn, D.; Holmes, T. R.; Homann, M.; Honda, T.; Hong, T. M.; Hooberman, B. H.; Hopkins, W. H.; Horii, Y.; Horton, A. J.; Hostachy, J.-Y.; Hou, S.; Hoummada, A.; Howarth, J.; Hoya, J.; Hrabovsky, M.; Hristova, I.; Hrivnac, J.; Hryn'ova, T.; Hrynevich, A.; Hsu, C.; Hsu, P. J.; Hsu, S.-C.; Hu, Q.; Hu, S.; Huang, Y.; Hubacek, Z.; Hubaut, F.; Huegging, F.; Huffman, T. B.; Hughes, E. W.; Hughes, G.; Huhtinen, M.; Huo, P.; Huseynov, N.; Huston, J.; Huth, J.; Iacobucci, G.; Iakovidis, G.; Ibragimov, I.; Iconomidou-Fayard, L.; Ideal, E.; Idrissi, Z.; Iengo, P.; Igonkina, O.; Iizawa, T.; Ikegami, Y.; Ikeno, M.; Ilchenko, Y.; Iliadis, D.; Ilic, N.; Ince, T.; Introzzi, G.; Ioannou, P.; Iodice, M.; Iordanidou, K.; Ippolito, V.; Ishijima, N.; Ishino, M.; Ishitsuka, M.; Ishmukhametov, R.; Issever, C.; Istin, S.; Ito, F.; Iturbe Ponce, J. M.; Iuppa, R.; Iwanski, W.; Iwasaki, H.; Izen, J. M.; Izzo, V.; Jabbar, S.; Jackson, B.; Jackson, P.; Jain, V.; Jakobi, K. B.; Jakobs, K.; Jakobsen, S.; Jakoubek, T.; Jamin, D. O.; Jana, D. K.; Jansky, R.; Janssen, J.; Janus, M.; Jarlskog, G.; Javadov, N.; Javůrek, T.; Jeanneau, F.; Jeanty, L.; Jeng, G.-Y.; Jennens, D.; Jenni, P.; Jeske, C.; Jézéquel, S.; Ji, H.; Jia, J.; Jiang, H.; Jiang, Y.; Jiang, Z.; Jiggins, S.; Jimenez Pena, J.; Jin, S.; Jinaru, A.; Jinnouchi, O.; Jivan, H.; Johansson, P.; Johns, K. A.; Johnson, W. J.; Jon-And, K.; Jones, G.; Jones, R. W. L.; Jones, S.; Jones, T. J.; Jongmanns, J.; Jorge, P. M.; Jovicevic, J.; Ju, X.; Juste Rozas, A.; Köhler, M. K.; Kaczmarska, A.; Kado, M.; Kagan, H.; Kagan, M.; Kahn, S. J.; Kaji, T.; Kajomovitz, E.; Kalderon, C. W.; Kaluza, A.; Kama, S.; Kamenshchikov, A.; Kanaya, N.; Kaneti, S.; Kanjir, L.; Kantserov, V. A.; Kanzaki, J.; Kaplan, B.; Kaplan, L. S.; Kapliy, A.; Kar, D.; Karakostas, K.; Karamaoun, A.; Karastathis, N.; Kareem, M. J.; Karentzos, E.; Karnevskiy, M.; Karpov, S. N.; Karpova, Z. M.; Karthik, K.; Kartvelishvili, V.; Karyukhin, A. N.; Kasahara, K.; Kashif, L.; Kass, R. D.; Kastanas, A.; Kataoka, Y.; Kato, C.; Katre, A.; Katzy, J.; Kawade, K.; Kawagoe, K.; Kawamoto, T.; Kawamura, G.; Kazanin, V. F.; Keeler, R.; Kehoe, R.; Keller, J. S.; Kempster, J. J.; Keoshkerian, H.; Kepka, O.; Kerševan, B. P.; Kersten, S.; Keyes, R. A.; Khader, M.; Khalil-Zada, F.; Khanov, A.; Kharlamov, A. G.; Kharlamova, T.; Khoo, T. J.; Khovanskiy, V.; Khramov, E.; Khubua, J.; Kido, S.; Kilby, C. R.; Kim, H. Y.; Kim, S. H.; Kim, Y. K.; Kimura, N.; Kind, O. M.; King, B. T.; King, M.; Kirk, J.; Kiryunin, A. E.; Kishimoto, T.; Kisielewska, D.; Kiss, F.; Kiuchi, K.; Kivernyk, O.; Kladiva, E.; Klein, M. H.; Klein, M.; Klein, U.; Kleinknecht, K.; Klimek, P.; Klimentov, A.; Klingenberg, R.; Klinger, J. A.; Klioutchnikova, T.; Kluge, E.-E.; Kluit, P.; Kluth, S.; Knapik, J.; Kneringer, E.; Knoops, E. B. F. G.; Knue, A.; Kobayashi, A.; Kobayashi, D.; Kobayashi, T.; Kobel, M.; Kocian, M.; Kodys, P.; Koehler, N. M.; Koffas, T.; Koffeman, E.; Koi, T.; Kolanoski, H.; Kolb, M.; Koletsou, I.; Komar, A. A.; Komori, Y.; Kondo, T.; Kondrashova, N.; Köneke, K.; König, A. C.; Kono, T.; Konoplich, R.; Konstantinidis, N.; Kopeliansky, R.; Koperny, S.; Köpke, L.; Kopp, A. K.; Korcyl, K.; Kordas, K.; Korn, A.; Korol, A. A.; Korolkov, I.; Korolkova, E. V.; Kortner, O.; Kortner, S.; Kosek, T.; Kostyukhin, V. V.; Kotwal, A.; Koulouris, A.; Kourkoumeli-Charalampidi, A.; Kourkoumelis, C.; Kouskoura, V.; Kowalewska, A. B.; Kowalewski, R.; Kowalski, T. Z.; Kozakai, C.; Kozanecki, W.; Kozhin, A. S.; Kramarenko, V. A.; Kramberger, G.; Krasnopevtsev, D.; Krasny, M. W.; Krasznahorkay, A.; Kravchenko, A.; Kretz, M.; Kretzschmar, J.; Kreutzfeldt, K.; Krieger, P.; Krizka, K.; Kroeninger, K.; Kroha, H.; Kroll, J.; Kroseberg, J.; Krstic, J.; Kruchonak, U.; Krüger, H.; Krumnack, N.; Kruse, M. C.; Kruskal, M.; Kubota, T.; Kucuk, H.; Kuday, S.; Kuechler, J. T.; Kuehn, S.; Kugel, A.; Kuger, F.; Kuhl, A.; Kuhl, T.; Kukhtin, V.; Kukla, R.; Kulchitsky, Y.; Kuleshov, S.; Kuna, M.; Kunigo, T.; Kupco, A.; Kurashige, H.; Kurochkin, Y. A.; Kus, V.; Kuwertz, E. S.; Kuze, M.; Kvita, J.; Kwan, T.; Kyriazopoulos, D.; La Rosa, A.; La Rosa Navarro, J. L.; La Rotonda, L.; Lacasta, C.; Lacava, F.; Lacey, J.; Lacker, H.; Lacour, D.; Lacuesta, V. R.; Ladygin, E.; Lafaye, R.; Laforge, B.; Lagouri, T.; Lai, S.; Lammers, S.; Lampl, W.; Lançon, E.; Landgraf, U.; Landon, M. P. J.; Lanfermann, M. C.; Lang, V. S.; Lange, J. C.; Lankford, A. J.; Lanni, F.; Lantzsch, K.; Lanza, A.; Laplace, S.; Lapoire, C.; Laporte, J. F.; Lari, T.; Lasagni Manghi, F.; Lassnig, M.; Laurelli, P.; Lavrijsen, W.; Law, A. T.; Laycock, P.; Lazovich, T.; Lazzaroni, M.; Le, B.; Le Dortz, O.; Le Guirriec, E.; Le Quilleuc, E. P.; Leblanc, M.; Lecompte, T.; Ledroit-Guillon, F.; Lee, C. A.; Lee, S. C.; Lee, L.; Lefebvre, B.; Lefebvre, G.; Lefebvre, M.; Legger, F.; Leggett, C.; Lehan, A.; Lehmann Miotto, G.; Lei, X.; Leight, W. A.; Leister, A. G.; Leite, M. A. L.; Leitner, R.; Lellouch, D.; Lemmer, B.; Leney, K. J. C.; Lenz, T.; Lenzi, B.; Leone, R.; Leone, S.; Leonidopoulos, C.; Leontsinis, S.; Lerner, G.; Leroy, C.; Lesage, A. A. J.; Lester, C. G.; Levchenko, M.; Levêque, J.; Levin, D.; Levinson, L. J.; Levy, M.; Lewis, D.; Leyko, A. M.; Leyton, M.; Li, B.; Li, C.; Li, H.; Li, H. L.; Li, L.; Li, L.; Li, Q.; Li, S.; Li, X.; Li, Y.; Liang, Z.; Liberti, B.; Liblong, A.; Lichard, P.; Lie, K.; Liebal, J.; Liebig, W.; Limosani, A.; Lin, S. C.; Lin, T. H.; Lindquist, B. E.; Lionti, A. E.; Lipeles, E.; Lipniacka, A.; Lisovyi, M.; Liss, T. M.; Lister, A.; Litke, A. M.; Liu, B.; Liu, D.; Liu, H.; Liu, H.; Liu, J.; Liu, J. B.; Liu, K.; Liu, L.; Liu, M.; Liu, M.; Liu, Y. L.; Liu, Y.; Livan, M.; Lleres, A.; Llorente Merino, J.; Lloyd, S. L.; Lo Sterzo, F.; Lobodzinska, E. M.; Loch, P.; Loebinger, F. K.; Loew, K. M.; Loginov, A.; Lohse, T.; Lohwasser, K.; Lokajicek, M.; Long, B. A.; Long, J. D.; Long, R. E.; Longo, L.; Looper, K. A.; López, J. A.; Lopez Mateos, D.; Lopez Paredes, B.; Lopez Paz, I.; Lopez Solis, A.; Lorenz, J.; Lorenzo Martinez, N.; Losada, M.; Lösel, P. J.; Lou, X.; Lounis, A.; Love, J.; Love, P. A.; Lu, H.; Lu, N.; Lubatti, H. J.; Luci, C.; Lucotte, A.; Luedtke, C.; Luehring, F.; Lukas, W.; Luminari, L.; Lundberg, O.; Lund-Jensen, B.; Luzi, P. M.; Lynn, D.; Lysak, R.; Lytken, E.; Lyubushkin, V.; Ma, H.; Ma, L. L.; Ma, Y.; Maccarrone, G.; Macchiolo, A.; MacDonald, C. M.; Maček, B.; Machado Miguens, J.; Madaffari, D.; Madar, R.; Maddocks, H. J.; Mader, W. F.; Madsen, A.; Maeda, J.; Maeland, S.; Maeno, T.; Maevskiy, A.; Magradze, E.; Mahlstedt, J.; Maiani, C.; Maidantchik, C.; Maier, A. A.; Maier, T.; Maio, A.; Majewski, S.; Makida, Y.; Makovec, N.; Malaescu, B.; Malecki, Pa.; Maleev, V. P.; Malek, F.; Mallik, U.; Malon, D.; Malone, C.; Malone, C.; Maltezos, S.; Malyukov, S.; Mamuzic, J.; Mancini, G.; Mandelli, L.; Mandić, I.; Maneira, J.; Manhaes de Andrade Filho, L.; Manjarres Ramos, J.; Mann, A.; Manousos, A.; Mansoulie, B.; Mansour, J. D.; Mantifel, R.; Mantoani, M.; Manzoni, S.; Mapelli, L.; Marceca, G.; March, L.; Marchiori, G.; Marcisovsky, M.; Marjanovic, M.; Marley, D. E.; Marroquim, F.; Marsden, S. P.; Marshall, Z.; Marti-Garcia, S.; Martin, B.; Martin, T. A.; Martin, V. J.; Martin Dit Latour, B.; Martinez, M.; Martinez Outschoorn, V. I.; Martin-Haugh, S.; Martoiu, V. S.; Martyniuk, A. C.; Marzin, A.; Masetti, L.; Mashimo, T.; Mashinistov, R.; Masik, J.; Maslennikov, A. L.; Massa, I.; Massa, L.; Mastrandrea, P.; Mastroberardino, A.; Masubuchi, T.; Mättig, P.; Mattmann, J.; Maurer, J.; Maxfield, S. J.; Maximov, D. A.; Mazini, R.; Maznas, I.; Mazza, S. M.; Mc Fadden, N. C.; Mc Goldrick, G.; Mc Kee, S. P.; McCarn, A.; McCarthy, R. L.; McCarthy, T. G.; McClymont, L. I.; McDonald, E. F.; McFayden, J. A.; McHedlidze, G.; McMahon, S. J.; McPherson, R. A.; Medinnis, M.; Meehan, S.; Mehlhase, S.; Mehta, A.; Meier, K.; Meineck, C.; Meirose, B.; Melini, D.; Mellado Garcia, B. R.; Melo, M.; Meloni, F.; Meng, X.; Mengarelli, A.; Menke, S.; Meoni, E.; Mergelmeyer, S.; Mermod, P.; Merola, L.; Meroni, C.; Merritt, F. S.; Messina, A.; Metcalfe, J.; Mete, A. S.; Meyer, C.; Meyer, C.; Meyer, J.-P.; Meyer, J.; Meyer Zu Theenhausen, H.; Miano, F.; Middleton, R. P.; Miglioranzi, S.; Mijović, L.; Mikenberg, G.; Mikestikova, M.; Mikuž, M.; Milesi, M.; Milic, A.; Miller, D. W.; Mills, C.; Milov, A.; Milstead, D. A.; Minaenko, A. A.; Minami, Y.; Minashvili, I. A.; Mincer, A. I.; Mindur, B.; Mineev, M.; Minegishi, Y.; Ming, Y.; Mir, L. M.; Mistry, K. P.; Mitani, T.; Mitrevski, J.; Mitsou, V. A.; Miucci, A.; Miyagawa, P. S.; Mjörnmark, J. U.; Mlynarikova, M.; Moa, T.; Mochizuki, K.; Mohapatra, S.; Molander, S.; Moles-Valls, R.; Monden, R.; Mondragon, M. C.; Mönig, K.; Monk, J.; Monnier, E.; Montalbano, A.; Montejo Berlingen, J.; Monticelli, F.; Monzani, S.; Moore, R. W.; Morange, N.; Moreno, D.; Moreno Llácer, M.; Morettini, P.; Morgenstern, S.; Mori, D.; Mori, T.; Morii, M.; Morinaga, M.; Morisbak, V.; Moritz, S.; Morley, A. K.; Mornacchi, G.; Morris, J. D.; Mortensen, S. S.; Morvaj, L.; Mosidze, M.; Moss, J.; Motohashi, K.; Mount, R.; Mountricha, E.; Moyse, E. J. W.; Muanza, S.; Mudd, R. D.; Mueller, F.; Mueller, J.; Mueller, R. S. P.; Mueller, T.; Muenstermann, D.; Mullen, P.; Mullier, G. A.; Munoz Sanchez, F. J.; Murillo Quijada, J. A.; Murray, W. J.; Musheghyan, H.; Muškinja, M.; Myagkov, A. G.; Myska, M.; Nachman, B. P.; Nackenhorst, O.; Nagai, K.; Nagai, R.; Nagano, K.; Nagasaka, Y.; Nagata, K.; Nagel, M.; Nagy, E.; Nairz, A. M.; Nakahama, Y.; Nakamura, K.; Nakamura, T.; Nakano, I.; Naranjo Garcia, R. F.; Narayan, R.; Narrias Villar, D. I.; Naryshkin, I.; Naumann, T.; Navarro, G.; Nayyar, R.; Neal, H. A.; Nechaeva, P. Yu.; Neep, T. J.; Negri, A.; Negrini, M.; Nektarijevic, S.; Nellist, C.; Nelson, A.; Nemecek, S.; Nemethy, P.; Nepomuceno, A. A.; Nessi, M.; Neubauer, M. S.; Neumann, M.; Neves, R. M.; Nevski, P.; Newman, P. R.; Nguyen, D. H.; Nguyen Manh, T.; Nickerson, R. B.; Nicolaidou, R.; Nielsen, J.; Nikiforov, A.; Nikolaenko, V.; Nikolic-Audit, I.; Nikolopoulos, K.; Nilsen, J. K.; Nilsson, P.; Ninomiya, Y.; Nisati, A.; Nisius, R.; Nobe, T.; Nomachi, M.; Nomidis, I.; Nooney, T.; Norberg, S.; Nordberg, M.; Norjoharuddeen, N.; Novgorodova, O.; Nowak, S.; Nozaki, M.; Nozka, L.; Ntekas, K.; Nurse, E.; Nuti, F.; O'Grady, F.; O'Neil, D. C.; O'Rourke, A. A.; O'Shea, V.; Oakham, F. G.; Oberlack, H.; Obermann, T.; Ocariz, J.; Ochi, A.; Ochoa, I.; Ochoa-Ricoux, J. P.; Oda, S.; Odaka, S.; Ogren, H.; Oh, A.; Oh, S. H.; Ohm, C. C.; Ohman, H.; Oide, H.; Okawa, H.; Okumura, Y.; Okuyama, T.; Olariu, A.; Oleiro Seabra, L. F.; Olivares Pino, S. A.; Oliveira Damazio, D.; Olszewski, A.; Olszowska, J.; Onofre, A.; Onogi, K.; Onyisi, P. U. E.; Oreglia, M. J.; Oren, Y.; Orestano, D.; Orlando, N.; Orr, R. S.; Osculati, B.; Ospanov, R.; Otero Y Garzon, G.; Otono, H.; Ouchrif, M.; Ould-Saada, F.; Ouraou, A.; Oussoren, K. P.; Ouyang, Q.; Owen, M.; Owen, R. E.; Ozcan, V. E.; Ozturk, N.; Pachal, K.; Pacheco Pages, A.; Pacheco Rodriguez, L.; Padilla Aranda, C.; Pagáčová, M.; Pagan Griso, S.; Paganini, M.; Paige, F.; Pais, P.; Pajchel, K.; Palacino, G.; Palazzo, S.; Palestini, S.; Palka, M.; Pallin, D.; St. Panagiotopoulou, E.; Pandini, C. E.; Panduro Vazquez, J. G.; Pani, P.; Panitkin, S.; Pantea, D.; Paolozzi, L.; Papadopoulou, Th. D.; Papageorgiou, K.; Paramonov, A.; Paredes Hernandez, D.; Parker, A. J.; Parker, M. A.; Parker, K. A.; Parodi, F.; Parsons, J. A.; Parzefall, U.; Pascuzzi, V. R.; Pasqualucci, E.; Passaggio, S.; Pastore, Fr.; Pásztor, G.; Pataraia, S.; Pater, J. R.; Pauly, T.; Pearce, J.; Pearson, B.; Pedersen, L. E.; Pedersen, M.; Pedraza Lopez, S.; Pedro, R.; Peleganchuk, S. V.; Penc, O.; Peng, C.; Peng, H.; Penwell, J.; Peralva, B. S.; Perego, M. M.; Perepelitsa, D. V.; Perez Codina, E.; Perini, L.; Pernegger, H.; Perrella, S.; Peschke, R.; Peshekhonov, V. D.; Peters, K.; Peters, R. F. Y.; Petersen, B. A.; Petersen, T. C.; Petit, E.; Petridis, A.; Petridou, C.; Petroff, P.; Petrolo, E.; Petrov, M.; Petrucci, F.; Pettersson, N. E.; Peyaud, A.; Pezoa, R.; Phillips, P. W.; Piacquadio, G.; Pianori, E.; Picazio, A.; Piccaro, E.; Piccinini, M.; Pickering, M. A.; Piegaia, R.; Pilcher, J. E.; Pilkington, A. D.; Pin, A. W. J.; Pinamonti, M.; Pinfold, J. L.; Pingel, A.; Pires, S.; Pirumov, H.; Pitt, M.; Plazak, L.; Pleier, M.-A.; Pleskot, V.; Plotnikova, E.; Plucinski, P.; Pluth, D.; Poettgen, R.; Poggioli, L.; Pohl, D.; Polesello, G.; Poley, A.; Policicchio, A.; Polifka, R.; Polini, A.; Pollard, C. S.; Polychronakos, V.; Pommès, K.; Pontecorvo, L.; Pope, B. G.; Popeneciu, G. A.; Poppleton, A.; Pospisil, S.; Potamianos, K.; Potrap, I. N.; Potter, C. J.; Potter, C. T.; Poulard, G.; Poveda, J.; Pozdnyakov, V.; Pozo Astigarraga, M. E.; Pralavorio, P.; Pranko, A.; Prell, S.; Price, D.; Price, L. E.; Primavera, M.; Prince, S.; Prokofiev, K.; Prokoshin, F.; Protopopescu, S.; Proudfoot, J.; Przybycien, M.; Puddu, D.; Purohit, M.; Puzo, P.; Qian, J.; Qin, G.; Qin, Y.; Quadt, A.; Quayle, W. B.; Queitsch-Maitland, M.; Quilty, D.; Raddum, S.; Radeka, V.; Radescu, V.; Radhakrishnan, S. K.; Radloff, P.; Rados, P.; Ragusa, F.; Rahal, G.; Raine, J. A.; Rajagopalan, S.; Rammensee, M.; Rangel-Smith, C.; Ratti, M. G.; Rauch, D. M.; Rauscher, F.; Rave, S.; Ravenscroft, T.; Ravinovich, I.; Raymond, M.; Read, A. L.; Readioff, N. P.; Reale, M.; Rebuzzi, D. M.; Redelbach, A.; Redlinger, G.; Reece, R.; Reed, R. G.; Reeves, K.; Rehnisch, L.; Reichert, J.; Reiss, A.; Rembser, C.; Ren, H.; Rescigno, M.; Resconi, S.; Rezanova, O. L.; Reznicek, P.; Rezvani, R.; Richter, R.; Richter, S.; Richter-Was, E.; Ricken, O.; Ridel, M.; Rieck, P.; Riegel, C. J.; Rieger, J.; Rifki, O.; Rijssenbeek, M.; Rimoldi, A.; Rimoldi, M.; Rinaldi, L.; Ristić, B.; Ritsch, E.; Riu, I.; Rizatdinova, F.; Rizvi, E.; Rizzi, C.; Robertson, S. H.; Robichaud-Veronneau, A.; Robinson, D.; Robinson, J. E. M.; Robson, A.; Roda, C.; Rodina, Y.; Rodriguez Perez, A.; Rodriguez Rodriguez, D.; Roe, S.; Rogan, C. S.; Røhne, O.; Roloff, J.; Romaniouk, A.; Romano, M.; Romano Saez, S. M.; Romero Adam, E.; Rompotis, N.; Ronzani, M.; Roos, L.; Ros, E.; Rosati, S.; Rosbach, K.; Rose, P.; Rosien, N.-A.; Rossetti, V.; Rossi, E.; Rossi, L. P.; Rosten, J. H. N.; Rosten, R.; Rotaru, M.; Roth, I.; Rothberg, J.; Rousseau, D.; Rozanov, A.; Rozen, Y.; Ruan, X.; Rubbo, F.; Rudolph, M. S.; Rühr, F.; Ruiz-Martinez, A.; Rurikova, Z.; Rusakovich, N. A.; Ruschke, A.; Russell, H. L.; Rutherfoord, J. P.; Ruthmann, N.; Ryabov, Y. F.; Rybar, M.; Rybkin, G.; Ryu, S.; Ryzhov, A.; Rzehorz, G. F.; Saavedra, A. F.; Sabato, G.; Sacerdoti, S.; Sadrozinski, H. F.-W.; Sadykov, R.; Safai Tehrani, F.; Saha, P.; Sahinsoy, M.; Saimpert, M.; Saito, T.; Sakamoto, H.; Sakurai, Y.; Salamanna, G.; Salamon, A.; Salazar Loyola, J. E.; Salek, D.; Sales de Bruin, P. H.; Salihagic, D.; Salnikov, A.; Salt, J.; Salvatore, D.; Salvatore, F.; Salvucci, A.; Salzburger, A.; Sammel, D.; Sampsonidis, D.; Sánchez, J.; Sanchez Martinez, V.; Sanchez Pineda, A.; Sandaker, H.; Sandbach, R. L.; Sandhoff, M.; Sandoval, C.; Sankey, D. P. C.; Sannino, M.; Sansoni, A.; Santoni, C.; Santonico, R.; Santos, H.; Santoyo Castillo, I.; Sapp, K.; Sapronov, A.; Saraiva, J. G.; Sarrazin, B.; Sasaki, O.; Sato, K.; Sauvan, E.; Savage, G.; Savard, P.; Savic, N.; Sawyer, C.; Sawyer, L.; Saxon, J.; Sbarra, C.; Sbrizzi, A.; Scanlon, T.; Scannicchio, D. A.; Scarcella, M.; Scarfone, V.; Schaarschmidt, J.; Schacht, P.; Schachtner, B. M.; Schaefer, D.; Schaefer, L.; Schaefer, R.; Schaeffer, J.; Schaepe, S.; Schaetzel, S.; Schäfer, U.; Schaffer, A. C.; Schaile, D.; Schamberger, R. D.; Scharf, V.; Schegelsky, V. A.; Scheirich, D.; Schernau, M.; Schiavi, C.; Schier, S.; Schillo, C.; Schioppa, M.; Schlenker, S.; Schmidt-Sommerfeld, K. R.; Schmieden, K.; Schmitt, C.; Schmitt, S.; Schmitz, S.; Schneider, B.; Schnoor, U.; Schoeffel, L.; Schoening, A.; Schoenrock, B. D.; Schopf, E.; Schott, M.; Schouwenberg, J. F. P.; Schovancova, J.; Schramm, S.; Schreyer, M.; Schuh, N.; Schulte, A.; Schultens, M. J.; Schultz-Coulon, H.-C.; Schulz, H.; Schumacher, M.; Schumm, B. A.; Schune, Ph.; Schwartzman, A.; Schwarz, T. A.; Schweiger, H.; Schwemling, Ph.; Schwienhorst, R.; Schwindling, J.; Schwindt, T.; Sciolla, G.; Scuri, F.; Scutti, F.; Searcy, J.; Seema, P.; Seidel, S. C.; Seiden, A.; Seifert, F.; Seixas, J. M.; Sekhniaidze, G.; Sekhon, K.; Sekula, S. J.; Seliverstov, D. M.; Semprini-Cesari, N.; Serfon, C.; Serin, L.; Serkin, L.; Sessa, M.; Seuster, R.; Severini, H.; Sfiligoj, T.; Sforza, F.; Sfyrla, A.; Shabalina, E.; Shaikh, N. W.; Shan, L. Y.; Shang, R.; Shank, J. T.; Shapiro, M.; Shatalov, P. B.; Shaw, K.; Shaw, S. M.; Shcherbakova, A.; Shehu, C. Y.; Sherwood, P.; Shi, L.; Shimizu, S.; Shimmin, C. O.; Shimojima, M.; Shirabe, S.; Shiyakova, M.; Shmeleva, A.; Shoaleh Saadi, D.; Shochet, M. J.; Shojaii, S.; Shope, D. R.; Shrestha, S.; Shulga, E.; Shupe, M. A.; Sicho, P.; Sickles, A. M.; Sidebo, P. E.; Sideras Haddad, E.; Sidiropoulou, O.; Sidorov, D.; Sidoti, A.; Siegert, F.; Sijacki, Dj.; Silva, J.; Silverstein, S. B.; Simak, V.; Simic, Lj.; Simion, S.; Simioni, E.; Simmons, B.; Simon, D.; Simon, M.; Sinervo, P.; Sinev, N. B.; Sioli, M.; Siragusa, G.; Sivoklokov, S. Yu.; Sjölin, J.; Skinner, M. B.; Skottowe, H. P.; Skubic, P.; Slater, M.; Slavicek, T.; Slawinska, M.; Sliwa, K.; Slovak, R.; Smakhtin, V.; Smart, B. H.; Smestad, L.; Smiesko, J.; Smirnov, S. Yu.; Smirnov, Y.; Smirnova, L. N.; Smirnova, O.; Smith, M. N. K.; Smith, R. W.; Smizanska, M.; Smolek, K.; Snesarev, A. A.; Snyder, I. M.; Snyder, S.; Sobie, R.; Socher, F.; Soffer, A.; Soh, D. A.; Sokhrannyi, G.; Solans Sanchez, C. A.; Solar, M.; Soldatov, E. Yu.; Soldevila, U.; Solodkov, A. A.; Soloshenko, A.; Solovyanov, O. V.; Solovyev, V.; Sommer, P.; Son, H.; Song, H. Y.; Sood, A.; Sopczak, A.; Sopko, V.; Sorin, V.; Sosa, D.; Sotiropoulou, C. L.; Soualah, R.; Soukharev, A. M.; South, D.; Sowden, B. C.; Spagnolo, S.; Spalla, M.; Spangenberg, M.; Spannowsky, M.; Spanò, F.; Sperlich, D.; Spettel, F.; Spighi, R.; Spigo, G.; Spiller, L. A.; Spousta, M.; St. Denis, R. D.; Stabile, A.; Stamen, R.; Stamm, S.; Stanecka, E.; Stanek, R. W.; Stanescu, C.; Stanescu-Bellu, M.; Stanitzki, M. M.; Stapnes, S.; Starchenko, E. A.; Stark, G. H.; Stark, J.; Staroba, P.; Starovoitov, P.; Stärz, S.; Staszewski, R.; Steinberg, P.; Stelzer, B.; Stelzer, H. J.; Stelzer-Chilton, O.; Stenzel, H.; Stewart, G. A.; Stillings, J. A.; Stockton, M. C.; Stoebe, M.; Stoicea, G.; Stolte, P.; Stonjek, S.; Stradling, A. R.; Straessner, A.; Stramaglia, M. E.; Strandberg, J.; Strandberg, S.; Strandlie, A.; Strauss, M.; Strizenec, P.; Ströhmer, R.; Strom, D. M.; Stroynowski, R.; Strubig, A.; Stucci, S. A.; Stugu, B.; Styles, N. A.; Su, D.; Su, J.; Suchek, S.; Sugaya, Y.; Suk, M.; Sulin, V. V.; Sultansoy, S.; Sumida, T.; Sun, S.; Sun, X.; Sundermann, J. E.; Suruliz, K.; Susinno, G.; Sutton, M. R.; Suzuki, S.; Svatos, M.; Swiatlowski, M.; Sykora, I.; Sykora, T.; Ta, D.; Taccini, C.; Tackmann, K.; Taenzer, J.; Taffard, A.; Tafirout, R.; Taiblum, N.; Takai, H.; Takashima, R.; Takeshita, T.; Takubo, Y.; Talby, M.; Talyshev, A. A.; Tan, K. G.; Tanaka, J.; Tanaka, M.; Tanaka, R.; Tanaka, S.; Tanioka, R.; Tannenwald, B. B.; Tapia Araya, S.; Tapprogge, S.; Tarem, S.; Tartarelli, G. F.; Tas, P.; Tasevsky, M.; Tashiro, T.; Tassi, E.; Tavares Delgado, A.; Tayalati, Y.; Taylor, A. C.; Taylor, G. N.; Taylor, P. T. E.; Taylor, W.; Teischinger, F. A.; Teixeira-Dias, P.; Temming, K. K.; Temple, D.; Ten Kate, H.; Teng, P. K.; Teoh, J. J.; Tepel, F.; Terada, S.; Terashi, K.; Terron, J.; Terzo, S.; Testa, M.; Teuscher, R. J.; Theveneaux-Pelzer, T.; Thomas, J. P.; Thomas-Wilsker, J.; Thompson, P. D.; Thompson, A. S.; Thomsen, L. A.; Thomson, E.; Tibbetts, M. J.; Ticse Torres, R. E.; Tikhomirov, V. O.; Tikhonov, Yu. A.; Timoshenko, S.; Tipton, P.; Tisserant, S.; Todome, K.; Todorov, T.; Todorova-Nova, S.; Tojo, J.; Tokár, S.; Tokushuku, K.; Tolley, E.; Tomlinson, L.; Tomoto, M.; Tompkins, L.; Toms, K.; Tong, B.; Tornambe, P.; Torrence, E.; Torres, H.; Torró Pastor, E.; Toth, J.; Touchard, F.; Tovey, D. R.; Trefzger, T.; Tricoli, A.; Trigger, I. M.; Trincaz-Duvoid, S.; Tripiana, M. F.; Trischuk, W.; Trocmé, B.; Trofymov, A.; Troncon, C.; Trottier-McDonald, M.; Trovatelli, M.; Truong, L.; Trzebinski, M.; Trzupek, A.; Tseng, J. C.-L.; Tsiareshka, P. V.; Tsipolitis, G.; Tsirintanis, N.; Tsiskaridze, S.; Tsiskaridze, V.; Tskhadadze, E. G.; Tsui, K. M.; Tsukerman, I. I.; Tsulaia, V.; Tsuno, S.; Tsybychev, D.; Tu, Y.; Tudorache, A.; Tudorache, V.; Tuna, A. N.; Tupputi, S. A.; Turchikhin, S.; Turecek, D.; Turgeman, D.; Turra, R.; Tuts, P. M.; Tyndel, M.; Ucchielli, G.; Ueda, I.; Ughetto, M.; Ukegawa, F.; Unal, G.; Undrus, A.; Unel, G.; Ungaro, F. C.; Unno, Y.; Unverdorben, C.; Urban, J.; Urquijo, P.; Urrejola, P.; Usai, G.; Usui, J.; Vacavant, L.; Vacek, V.; Vachon, B.; Valderanis, C.; Valdes Santurio, E.; Valencic, N.; Valentinetti, S.; Valero, A.; Valery, L.; Valkar, S.; Valls Ferrer, J. A.; van den Wollenberg, W.; van der Deijl, P. C.; van der Graaf, H.; van Eldik, N.; van Gemmeren, P.; van Nieuwkoop, J.; van Vulpen, I.; van Woerden, M. C.; Vanadia, M.; Vandelli, W.; Vanguri, R.; Vaniachine, A.; Vankov, P.; Vardanyan, G.; Vari, R.; Varnes, E. W.; Varol, T.; Varouchas, D.; Vartapetian, A.; Varvell, K. E.; Vasquez, J. G.; Vasquez, G. A.; Vazeille, F.; Vazquez Schroeder, T.; Veatch, J.; Veeraraghavan, V.; Veloce, L. M.; Veloso, F.; Veneziano, S.; Ventura, A.; Venturi, M.; Venturi, N.; Venturini, A.; Vercesi, V.; Verducci, M.; Verkerke, W.; Vermeulen, J. C.; Vest, A.; Vetterli, M. C.; Viazlo, O.; Vichou, I.; Vickey, T.; Vickey Boeriu, O. E.; Viehhauser, G. H. A.; Viel, S.; Vigani, L.; Villa, M.; Villaplana Perez, M.; Vilucchi, E.; Vincter, M. G.; Vinogradov, V. B.; Vittori, C.; Vivarelli, I.; Vlachos, S.; Vlasak, M.; Vogel, M.; Vokac, P.; Volpi, G.; Volpi, M.; von der Schmitt, H.; von Toerne, E.; Vorobel, V.; Vorobev, K.; Vos, M.; Voss, R.; Vossebeld, J. H.; Vranjes, N.; Vranjes Milosavljevic, M.; Vrba, V.; Vreeswijk, M.; Vuillermet, R.; Vukotic, I.; Vykydal, Z.; Wagner, P.; Wagner, W.; Wahlberg, H.; Wahrmund, S.; Wakabayashi, J.; Walder, J.; Walker, R.; Walkowiak, W.; Wallangen, V.; Wang, C.; Wang, C.; Wang, F.; Wang, H.; Wang, H.; Wang, J.; Wang, J.; Wang, K.; Wang, R.; Wang, S. M.; Wang, T.; Wang, T.; Wang, W.; Wanotayaroj, C.; Warburton, A.; Ward, C. P.; Wardrope, D. R.; Washbrook, A.; Watkins, P. M.; Watson, A. T.; Watson, M. F.; Watts, G.; Watts, S.; Waugh, B. M.; Webb, S.; Weber, M. S.; Weber, S. W.; Weber, S. A.; Webster, J. S.; Weidberg, A. R.; Weinert, B.; Weingarten, J.; Weiser, C.; Weits, H.; Wells, P. S.; Wenaus, T.; Wengler, T.; Wenig, S.; Wermes, N.; Werner, M.; Werner, M. D.; Werner, P.; Wessels, M.; Wetter, J.; Whalen, K.; Whallon, N. L.; Wharton, A. M.; White, A.; White, M. J.; White, R.; Whiteson, D.; Wickens, F. J.; Wiedenmann, W.; Wielers, M.; Wiglesworth, C.; Wiik-Fuchs, L. A. M.; Wildauer, A.; Wilk, F.; Wilkens, H. G.; Williams, H. H.; Williams, S.; Willis, C.; Willocq, S.; Wilson, J. A.; Wingerter-Seez, I.; Winklmeier, F.; Winston, O. J.; Winter, B. T.; Wittgen, M.; Wittkowski, J.; Wolf, T. M. H.; Wolter, M. W.; Wolters, H.; Worm, S. D.; Wosiek, B. K.; Wotschack, J.; Woudstra, M. J.; Wozniak, K. W.; Wu, M.; Wu, M.; Wu, S. L.; Wu, X.; Wu, Y.; Wyatt, T. R.; Wynne, B. M.; Xella, S.; Xu, D.; Xu, L.; Yabsley, B.; Yacoob, S.; Yamaguchi, D.; Yamaguchi, Y.; Yamamoto, A.; Yamamoto, S.; Yamanaka, T.; Yamauchi, K.; Yamazaki, Y.; Yan, Z.; Yang, H.; Yang, H.; Yang, Y.; Yang, Z.; Yao, W.-M.; Yap, Y. C.; Yasu, Y.; Yatsenko, E.; Yau Wong, K. H.; Ye, J.; Ye, S.; Yeletskikh, I.; Yildirim, E.; Yorita, K.; Yoshida, R.; Yoshihara, K.; Young, C.; Young, C. J. S.; Youssef, S.; Yu, D. R.; Yu, J.; Yu, J. M.; Yu, J.; Yuan, L.; Yuen, S. P. Y.; Yusuff, I.; Zabinski, B.; Zaidan, R.; Zaitsev, A. M.; Zakharchuk, N.; Zalieckas, J.; Zaman, A.; Zambito, S.; Zanello, L.; Zanzi, D.; Zeitnitz, C.; Zeman, M.; Zemla, A.; Zeng, J. C.; Zeng, Q.; Zenin, O.; Ženiš, T.; Zerwas, D.; Zhang, D.; Zhang, F.; Zhang, G.; Zhang, H.; Zhang, J.; Zhang, L.; Zhang, M.; Zhang, R.; Zhang, R.; Zhang, X.; Zhang, Z.; Zhao, X.; Zhao, Y.; Zhao, Z.; Zhemchugov, A.; Zhong, J.; Zhou, B.; Zhou, C.; Zhou, L.; Zhou, L.; Zhou, M.; Zhou, N.; Zhu, C. G.; Zhu, H.; Zhu, J.; Zhu, Y.; Zhuang, X.; Zhukov, K.; Zibell, A.; Zieminska, D.; Zimine, N. I.; Zimmermann, C.; Zimmermann, S.; Zinonos, Z.; Zinser, M.; Ziolkowski, M.; Živković, L.; Zobernig, G.; Zoccoli, A.; Zur Nedden, M.; Zwalinski, L.; Atlas Collaboration
2017-02-01
The W boson angular distribution in events with high transverse momentum jets is measured using data collected by the ATLAS experiment from proton-proton collisions at a centre-of-mass energy √{ s} = 8 TeV at the Large Hadron Collider, corresponding to an integrated luminosity of 20.3 fb-1. The focus is on the contributions to W +jets processes from real W emission, which is achieved by studying events where a muon is observed close to a high transverse momentum jet. At small angular separations, these contributions are expected to be large. Various theoretical models of this process are compared to the data in terms of the absolute cross-section and the angular distributions of the muon from the leptonic W decay.
Semiautomatic mapping of permafrost in the Yukon Flats, Alaska
NASA Astrophysics Data System (ADS)
Gulbrandsen, Mats Lundh; Minsley, Burke J.; Ball, Lyndsay B.; Hansen, Thomas Mejer
2016-12-01
Thawing of permafrost due to global warming can have major impacts on hydrogeological processes, climate feedback, arctic ecology, and local environments. To understand these effects and processes, it is crucial to know the distribution of permafrost. In this study we exploit the fact that airborne electromagnetic (AEM) data are sensitive to the distribution of permafrost and demonstrate how the distribution of permafrost in the Yukon Flats, Alaska, is mapped in an efficient (semiautomatic) way, using a combination of supervised and unsupervised (machine) learning algorithms, i.e., Smart Interpretation and K-means clustering. Clustering is used to sort unfrozen and frozen regions, and Smart Interpretation is used to predict the depth of permafrost based on expert interpretations. This workflow allows, for the first time, a quantitative and objective approach to efficiently map permafrost based on large amounts of AEM data.
Semiautomatic mapping of permafrost in the Yukon Flats, Alaska
Gulbrandsen, Mats Lundh; Minsley, Burke J.; Ball, Lyndsay B.; Hansen, Thomas Mejer
2016-01-01
Thawing of permafrost due to global warming can have major impacts on hydrogeological processes, climate feedback, arctic ecology, and local environments. To understand these effects and processes, it is crucial to know the distribution of permafrost. In this study we exploit the fact that airborne electromagnetic (AEM) data are sensitive to the distribution of permafrost and demonstrate how the distribution of permafrost in the Yukon Flats, Alaska, is mapped in an efficient (semiautomatic) way, using a combination of supervised and unsupervised (machine) learning algorithms, i.e., Smart Interpretation and K-means clustering. Clustering is used to sort unfrozen and frozen regions, and Smart Interpretation is used to predict the depth of permafrost based on expert interpretations. This workflow allows, for the first time, a quantitative and objective approach to efficiently map permafrost based on large amounts of AEM data.
NASA Astrophysics Data System (ADS)
Zhang, Guangyun; Jia, Xiuping; Pham, Tuan D.; Crane, Denis I.
2010-01-01
The interpretation of the distribution of fluorescence in cells is often by simple visualization of microscope-derived images for qualitative studies. In other cases, however, it is desirable to be able to quantify the distribution of fluorescence using digital image processing techniques. In this paper, the challenges of fluorescence segmentation due to the noise present in the data are addressed. We report that intensity measurements alone do not allow separation of overlapping data between target and background. Consequently, spatial properties derived from neighborhood profile were included. Mathematical Morphological operations were implemented for cell boundary extraction and a window based contrast measure was developed for fluorescence puncta identification. All of these operations were applied in the proposed multistage processing scheme. The testing results show that the spatial measures effectively enhance the target separability.
Ferrero, A; Campos, J; Rabal, A M; Pons, A; Hernanz, M L; Corróns, A
2011-09-26
The Bidirectional Reflectance Distribution Function (BRDF) is essential to characterize an object's reflectance properties. This function depends both on the various illumination-observation geometries as well as on the wavelength. As a result, the comprehensive interpretation of the data becomes rather complex. In this work we assess the use of the multivariable analysis technique of Principal Components Analysis (PCA) applied to the experimental BRDF data of a ceramic colour standard. It will be shown that the result may be linked to the various reflection processes occurring on the surface, assuming that the incoming spectral distribution is affected by each one of these processes in a specific manner. Moreover, this procedure facilitates the task of interpolating a series of BRDF measurements obtained for a particular sample. © 2011 Optical Society of America
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parker, S
2015-06-15
Purpose: To evaluate the ability of statistical process control methods to detect systematic errors when using a two dimensional (2D) detector array for routine electron beam energy verification. Methods: Electron beam energy constancy was measured using an aluminum wedge and a 2D diode array on four linear accelerators. Process control limits were established. Measurements were recorded in control charts and compared with both calculated process control limits and TG-142 recommended specification limits. The data was tested for normality, process capability and process acceptability. Additional measurements were recorded while systematic errors were intentionally introduced. Systematic errors included shifts in the alignmentmore » of the wedge, incorrect orientation of the wedge, and incorrect array calibration. Results: Control limits calculated for each beam were smaller than the recommended specification limits. Process capability and process acceptability ratios were greater than one in all cases. All data was normally distributed. Shifts in the alignment of the wedge were most apparent for low energies. The smallest shift (0.5 mm) was detectable using process control limits in some cases, while the largest shift (2 mm) was detectable using specification limits in only one case. The wedge orientation tested did not affect the measurements as this did not affect the thickness of aluminum over the detectors of interest. Array calibration dependence varied with energy and selected array calibration. 6 MeV was the least sensitive to array calibration selection while 16 MeV was the most sensitive. Conclusion: Statistical process control methods demonstrated that the data distribution was normally distributed, the process was capable of meeting specifications, and that the process was centered within the specification limits. Though not all systematic errors were distinguishable from random errors, process control limits increased the ability to detect systematic errors using routine measurement of electron beam energy constancy.« less
Large Scale Frequent Pattern Mining using MPI One-Sided Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Agarwal, Khushbu
In this paper, we propose a work-stealing runtime --- Library for Work Stealing LibWS --- using MPI one-sided model for designing scalable FP-Growth --- {\\em de facto} frequent pattern mining algorithm --- on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for FP-growth data exchange phase, which reduces the communication complexity from state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. Anmore » experimental evaluation of the FP-Growth on LibWS using 4096 processes on an InfiniBand Cluster demonstrates excellent efficiency for several work distributions (87\\% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.« less
The McDonald Observatory lunar laser ranging project
NASA Technical Reports Server (NTRS)
Silverberg, E. C.
1978-01-01
A summary of the activities of the McDonald lunar laser ranging station at Fort Davis for the FY 77-78 fiscal year is presented. The lunar laser experiment uses the observatory 2.7m reflecting telescope on a thrice-per-day, 21-day-per-lunation schedule. Data are recorded on magnetic tapes and sent to the University of Texas at Austin where the data is processed. After processing, the data is distributed to interested analysis centers and later to the National Space Science Data Center where it is available for routine distribution. Detailed reports are published on the McDonald operations after every fourth lunation or approximately once every 115 days. These reports contain a day-by-day documentation of the ranging activity, detailed discussions of the equipment development efforts, and an abundance of other information as is needed to document and archive this important data type.
The revolution in data gathering systems. [mini and microcomputers in NASA wind tunnels
NASA Technical Reports Server (NTRS)
Cambra, J. M.; Trover, W. F.
1975-01-01
This paper gives a review of the data-acquisition systems used in NASA's wind tunnels from the 1950's to the present as a basis for assessing the impact of minicomputers and microcomputers on data acquisition and processing. The operation and disadvantages of wind-tunnel data systems are summarized for the period before 1950, the early 1950's, the early and late 1960's, and the early 1970's. Some significant advances discussed include the use or development of solid-state components, minicomputer systems, large central computers, on-line data processing, autoranging DC amplifiers, MOS-FET multiplexers, MSI and LSI logic, computer-controlled programmable amplifiers, solid-state remote multiplexing, integrated circuits, and microprocessors. The distributed system currently in use with the 40-ft by 80-ft wind tunnel at Ames Research Center is described in detail. The expected employment of distributed systems and microprocessors in the next decade is noted.
A Self-Organizing Incremental Neural Network based on local distribution learning.
Xing, Youlu; Shi, Xiaofeng; Shen, Furao; Zhou, Ke; Zhao, Jinxi
2016-12-01
In this paper, we propose an unsupervised incremental learning neural network based on local distribution learning, which is called Local Distribution Self-Organizing Incremental Neural Network (LD-SOINN). The LD-SOINN combines the advantages of incremental learning and matrix learning. It can automatically discover suitable nodes to fit the learning data in an incremental way without a priori knowledge such as the structure of the network. The nodes of the network store rich local information regarding the learning data. The adaptive vigilance parameter guarantees that LD-SOINN is able to add new nodes for new knowledge automatically and the number of nodes will not grow unlimitedly. While the learning process continues, nodes that are close to each other and have similar principal components are merged to obtain a concise local representation, which we call a relaxation data representation. A denoising process based on density is designed to reduce the influence of noise. Experiments show that the LD-SOINN performs well on both artificial and real-word data. Copyright © 2016 Elsevier Ltd. All rights reserved.
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service
Bao, Shunxing; Plassard, Andrew J.; Landman, Bennett A.; Gokhale, Aniruddha
2017-01-01
Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The resulting performance from these approaches is, however, impeded by standard network switches since they can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. To that end, a cloud-based “medical image processing-as-a-service” offers promise in utilizing the ecosystem of Apache Hadoop, which is a flexible framework providing distributed, scalable, fault tolerant storage and parallel computational modules, and HBase, which is a NoSQL database built atop Hadoop’s distributed file system. Despite this promise, HBase’s load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, which is a necessary step that is driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to the traditional scan, subject, and project-level analysis procedures, and is compatible with standard command line/scriptable image processing software. Experimental results for an illustrative sample of imaging data reveals that our new HBase policy results in a three-fold time improvement in conversion of classic DICOM to NiFTI file formats when compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach even for relatively small file sets. Moreover, file access latency is lower than network attached storage. PMID:28884169
Semiautomatic and Automatic Cooperative Inversion of Seismic and Magnetotelluric Data
NASA Astrophysics Data System (ADS)
Le, Cuong V. A.; Harris, Brett D.; Pethick, Andrew M.; Takam Takougang, Eric M.; Howe, Brendan
2016-09-01
Natural source electromagnetic methods have the potential to recover rock property distributions from the surface to great depths. Unfortunately, results in complex 3D geo-electrical settings can be disappointing, especially where significant near-surface conductivity variations exist. In such settings, unconstrained inversion of magnetotelluric data is inexorably non-unique. We believe that: (1) correctly introduced information from seismic reflection can substantially improve MT inversion, (2) a cooperative inversion approach can be automated, and (3) massively parallel computing can make such a process viable. Nine inversion strategies including baseline unconstrained inversion and new automated/semiautomated cooperative inversion approaches are applied to industry-scale co-located 3D seismic and magnetotelluric data sets. These data sets were acquired in one of the Carlin gold deposit districts in north-central Nevada, USA. In our approach, seismic information feeds directly into the creation of sets of prior conductivity model and covariance coefficient distributions. We demonstrate how statistical analysis of the distribution of selected seismic attributes can be used to automatically extract subvolumes that form the framework for prior model 3D conductivity distribution. Our cooperative inversion strategies result in detailed subsurface conductivity distributions that are consistent with seismic, electrical logs and geochemical analysis of cores. Such 3D conductivity distributions would be expected to provide clues to 3D velocity structures that could feed back into full seismic inversion for an iterative practical and truly cooperative inversion process. We anticipate that, with the aid of parallel computing, cooperative inversion of seismic and magnetotelluric data can be fully automated, and we hold confidence that significant and practical advances in this direction have been accomplished.
NASA Astrophysics Data System (ADS)
Werner, Micha; Westerhoff, Rogier; Moore, Catherine
2017-04-01
Quantitative estimates of recharge due to precipitation excess are an important input to determining sustainable abstraction of groundwater resources, as well providing one of the boundary conditions required for numerical groundwater modelling. Simple water balance models are widely applied for calculating recharge. In these models, precipitation is partitioned between different processes and stores; including surface runoff and infiltration, storage in the unsaturated zone, evaporation, capillary processes, and recharge to groundwater. Clearly the estimation of recharge amounts will depend on the estimation of precipitation volumes, which may vary, depending on the source of precipitation data used. However, the partitioning between the different processes is in many cases governed by (variable) intensity thresholds. This means that the estimates of recharge will not only be sensitive to input parameters such as soil type, texture, land use, potential evaporation; but mainly to the precipitation volume and intensity distribution. In this paper we explore the sensitivity of recharge estimates due to difference in precipitation volumes and intensity distribution in the rainfall forcing over the Canterbury region in New Zealand. We compare recharge rates and volumes using a simple water balance model that is forced using rainfall and evaporation data from; the NIWA Virtual Climate Station Network (VCSN) data (which is considered as the reference dataset); the ERA-Interim/WATCH dataset at 0.25 degrees and 0.5 degrees resolution; the TRMM-3B42 dataset; the CHIRPS dataset; and the recently releases MSWEP dataset. Recharge rates are calculated at a daily time step over the 14 year period from the 2000 to 2013 for the full Canterbury region, as well as at eight selected points distributed over the region. Lysimeter data with observed estimates of recharge are available at four of these points, as well as recharge estimates from the NGRM model, an independent model constructed using the same base data and forced with the VCSN precipitation dataset. Results of the comparison of the rainfall products show that there are significant differences in precipitation volume between the forcing products; in the order of 20% at most points. Even more significant differences can be seen, however, in the distribution of precipitation. For the VCSN data wet days (defined as >0.1mm precipitation) occur on some 20-30% of days (depending on location). This is reasonably reflected in the TRMM and CHIRPS data, while for the re-analysis based products some 60%to 80% of days are wet, albeit at lower intensities. These differences are amplified in the recharge estimates. At most points, volumetric differences are in the order of 40-60%, though difference may range into several orders of magnitude. The frequency distributions of recharge also differ significantly, with recharge over 0.1 mm occurring on 4-6% of days for the VCNS, CHIRPS, and TRMM datasets, but up to the order of 12% of days for the re-analysis data. Comparison against the lysimeter data show estimates to be reasonable, in particular for the reference datasets. Surprisingly some estimates of the lower resolution re-analysis datasets are reasonable, though this does seem to be due to lower recharge being compensated by recharge occurring more frequently. These results underline the importance of correct representation of rainfall volumes, as well as of distribution, particularly when evaluating possible changes to for example changes in precipitation intensity and volume. This holds for precipitation data derived from satellite based and re-analysis products, but also for interpolated data from gauges, where the distribution of intensities is strongly influenced by the interpolation process.
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Semantic Data Access Services at NASA's Atmospheric Science Data Center
NASA Astrophysics Data System (ADS)
Huffer, E.; Hertz, J.; Kusterer, J.
2012-12-01
The corpus of Earth Science data products at the Atmospheric Science Data Center at NASA's Langley Research Center comprises a widely heterogeneous set of products, even among those whose subject matter is very similar. Two distinct data products may both contain data on the same parameter, for instance, solar irradiance; but the instruments used, and the circumstances under which the data were collected and processed, may differ significantly. Understanding the differences is critical to using the data effectively. Data distribution services must be able to provide prospective users with enough information to allow them to meaningfully compare and evaluate the data products offered. Semantic technologies - ontologies, triple stores, reasoners, linked data - offer functionality for addressing this issue. Ontologies can provide robust, high-fidelity domain models that serve as common schema for discovering, evaluating, comparing and integrating data from disparate products. Reasoning engines and triple stores can leverage ontologies to support intelligent search applications that allow users to discover, query, retrieve, and easily reformat data from a broad spectrum of sources. We argue that because of the extremely complex nature of scientific data, data distribution systems should wholeheartedly embrace semantic technologies in order to make their data accessible to a broad array of prospective end users, and to ensure that the data they provide will be clearly understood and used appropriately by consumers. Toward this end, we propose a distribution system in which formal ontological models that accurately and comprehensively represent the ASDC's data domain, and fully leverage the expressivity and inferential capabilities of first order logic, are used to generate graph-based representations of the relevant relationships among data sets, observational systems, metadata files, and geospatial, temporal and scientific parameters to help prospective data consumers navigate directly to relevant data sets and query, subset, retrieve and compare the measurement and calculation data they contain. A critical part of developing semantically-enabled data distribution capabilities is developing an ontology that adequately describes 1) the data products - their structure, their content, and any supporting documentation; 2) the data domain - the objects and processes that the products denote; and 3) the relationship between the data and the domain. The ontology, in addition, should be machine readable and capable of integrating with the larger data distribution system to provide an interactive user experience. We will demonstrate how a formal, high-fidelity, queriable ontology representing the atmospheric science domain objects and data products, together with a robust set of inference rules for generating interactive graphs, allows researchers to navigate quickly and painlessly through the large volume of data at the ASDC. Scientists will be able to discover data products that exactly meet their particular criteria, link to information about the instruments and processing methods that generated the data; and compare and contrast related products.
USGS remote sensing coordination for the 2010 Haiti earthquake
Duda, Kenneth A.; Jones, Brenda
2011-01-01
In response to the devastating 12 January 2010, earthquake in Haiti, the US Geological Survey (USGS) provided essential coordinating services for remote sensing activities. Communication was rapidly established between the widely distributed response teams and data providers to define imaging requirements and sensor tasking opportunities. Data acquired from a variety of sources were received and archived by the USGS, and these products were subsequently distributed using the Hazards Data Distribution System (HDDS) and other mechanisms. Within six weeks after the earthquake, over 600,000 files representing 54 terabytes of data were provided to the response community. The USGS directly supported a wide variety of groups in their use of these data to characterize post-earthquake conditions and to make comparisons with pre-event imagery. The rapid and continuing response achieved was enabled by existing imaging and ground systems, and skilled personnel adept in all aspects of satellite data acquisition, processing, distribution and analysis. The information derived from image interpretation assisted senior planners and on-site teams to direct assistance where it was most needed.
Lee, H.R.
1997-11-18
A three-dimensional image reconstruction method comprises treating the object of interest as a group of elements with a size that is determined by the resolution of the projection data, e.g., as determined by the size of each pixel. One of the projections is used as a reference projection. A fictitious object is arbitrarily defined that is constrained by such reference projection. The method modifies the known structure of the fictitious object by comparing and optimizing its four projections to those of the unknown structure of the real object and continues to iterate until the optimization is limited by the residual sum of background noise. The method is composed of several sub-processes that acquire four projections from the real data and the fictitious object: generate an arbitrary distribution to define the fictitious object, optimize the four projections, generate a new distribution for the fictitious object, and enhance the reconstructed image. The sub-process for the acquisition of the four projections from the input real data is simply the function of acquiring the four projections from the data of the transmitted intensity. The transmitted intensity represents the density distribution, that is, the distribution of absorption coefficients through the object. 5 figs.
Inferring epidemiological parameters from phylogenetic information for the HIV-1 epidemic among MSM
NASA Astrophysics Data System (ADS)
Quax, Rick; van de Vijver, David A. M. C.; Frentz, Dineke; Sloot, Peter M. A.
2013-09-01
The HIV-1 epidemic in Europe is primarily sustained by a dynamic topology of sexual interactions among MSM who have individual immune systems and behavior. This epidemiological process shapes the phylogeny of the virus population. Both fields of epidemic modeling and phylogenetics have a long history, however it remains difficult to use phylogenetic data to infer epidemiological parameters such as the structure of the sexual network and the per-act infectiousness. This is because phylogenetic data is necessarily incomplete and ambiguous. Here we show that the cluster-size distribution indeed contains information about epidemiological parameters using detailed numberical experiments. We simulate the HIV epidemic among MSM many times using the Monte Carlo method with all parameter values and their ranges taken from literature. For each simulation and the corresponding set of parameter values we calculate the likelihood of reproducing an observed cluster-size distribution. The result is an estimated likelihood distribution of all parameters from the phylogenetic data, in particular the structure of the sexual network, the per-act infectiousness, and the risk behavior reduction upon diagnosis. These likelihood distributions encode the knowledge provided by the observed cluster-size distrbution, which we quantify using information theory. Our work suggests that the growing body of genetic data of patients can be exploited to understand the underlying epidemiological process.
Managing Sustainable Data Infrastructures: The Gestalt of EOSDIS
NASA Technical Reports Server (NTRS)
Behnke, Jeanne; Lowe, Dawn; Lindsay, Francis; Lynnes, Chris; Mitchell, Andrew
2016-01-01
EOSDIS epitomizes a System of Systems, whose many varied and distributed parts are integrated into a single, highly functional organized science data system. A distributed architecture was adopted to ensure discipline-specific support for the science data, while also leveraging standards and establishing policies and tools to enable interdisciplinary research, and analysis across multiple scientific instruments. The EOSDIS is composed of system elements such as geographically distributed archive centers used to manage the stewardship of data. The infrastructure consists of underlying capabilities connections that enable the primary system elements to function together. For example, one key infrastructure component is the common metadata repository, which enables discovery of all data within the EOSDIS system. EOSDIS employs processes and standards to ensure partners can work together effectively, and provide coherent services to users.
Krueger, Ute; Schimmelpfeng, Katja
2013-03-01
A sufficient staffing level in fire and rescue dispatch centers is crucial for saving lives. Therefore, it is important to estimate the expected workload properly. For this purpose, we analyzed whether a dispatch center can be considered as a call center. Current call center publications very often model call arrivals as a non-homogeneous Poisson process. This bases on the underlying assumption of the caller's independent decision to call or not to call. In case of an emergency, however, there are often calls from more than one person reporting the same incident and thus, these calls are not independent. Therefore, this paper focuses on the dependency of calls in a fire and rescue dispatch center. We analyzed and evaluated several distributions in this setting. Results are illustrated using real-world data collected from a typical German dispatch center in Cottbus ("Leitstelle Lausitz"). We identified the Pólya distribution as being superior to the Poisson distribution in describing the call arrival rate and the Weibull distribution to be more suitable than the exponential distribution for interarrival times and service times. However, the commonly used distributions offer acceptable approximations. This is important for estimating a sufficient staffing level in practice using, e.g., the Erlang-C model.
Cheng, Mingjian; Guo, Ya; Li, Jiangting; Zheng, Xiaotong; Guo, Lixin
2018-04-20
We introduce an alternative distribution to the gamma-gamma (GG) distribution, called inverse Gaussian gamma (IGG) distribution, which can efficiently describe moderate-to-strong irradiance fluctuations. The proposed stochastic model is based on a modulation process between small- and large-scale irradiance fluctuations, which are modeled by gamma and inverse Gaussian distributions, respectively. The model parameters of the IGG distribution are directly related to atmospheric parameters. The accuracy of the fit among the IGG, log-normal, and GG distributions with the experimental probability density functions in moderate-to-strong turbulence are compared, and results indicate that the newly proposed IGG model provides an excellent fit to the experimental data. As the receiving diameter is comparable with the atmospheric coherence radius, the proposed IGG model can reproduce the shape of the experimental data, whereas the GG and LN models fail to match the experimental data. The fundamental channel statistics of a free-space optical communication system are also investigated in an IGG-distributed turbulent atmosphere, and a closed-form expression for the outage probability of the system is derived with Meijer's G-function.
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.
Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo
2012-03-15
Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
MODIS land data at the EROS data center DAAC
Jenkerson, Calli B.; Reed, B.C.
2001-01-01
The US Geological Survey's (USGS) Earth Resources Observation Systems (EROS) Data Center (EDC) in Sioux Falls, SD, USA, is the primary national archive for land processes data and one of the National Aeronautics and Space Administration's (NASA) Distributed Active Archive Centers (DAAC) for the Earth Observing System (EOS). One of EDC's functions as a DAAC is the archival and distribution of Moderate Resolution Spectroradiometer (MODIS) Land Data collected from the Earth Observing System (EOS) satellite Terra. More than 500,000 publicly available MODIS land data granules totaling 25 Terabytes (Tb) are currently stored in the EDC archive. This collection is managed, archived, and distributed by EOS Data and Information System (EOSDIS) Core System (ECS) at EDC. EDC User Services support the use of MODIS Land data, which include land surface reflectance/albedo, temperature/emissivity, vegetation characteristics, and land cover, by responding to user inquiries, constructing user information sites on the EDC web page, and presenting MODIS materials worldwide.
Workflow management in large distributed systems
NASA Astrophysics Data System (ADS)
Legrand, I.; Newman, H.; Voicu, R.; Dobre, C.; Grigoras, C.
2011-12-01
The MonALISA (Monitoring Agents using a Large Integrated Services Architecture) framework provides a distributed service system capable of controlling and optimizing large-scale, data-intensive applications. An essential part of managing large-scale, distributed data-processing facilities is a monitoring system for computing facilities, storage, networks, and the very large number of applications running on these systems in near realtime. All this monitoring information gathered for all the subsystems is essential for developing the required higher-level services—the components that provide decision support and some degree of automated decisions—and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. We present several applications of MonALISA's higher-level services including optimized dynamic routing, control, data-transfer scheduling, distributed job scheduling, dynamic allocation of storage resource to running jobs and automated management of remote services among a large set of grid facilities.
Dynamic Reconfiguration of a RGBD Sensor Based on QoS and QoC Requirements in Distributed Systems.
Munera, Eduardo; Poza-Lujan, Jose-Luis; Posadas-Yagüe, Juan-Luis; Simó-Ten, José-Enrique; Noguera, Juan Fco Blanes
2015-07-24
The inclusion of embedded sensors into a networked system provides useful information for many applications. A Distributed Control System (DCS) is one of the clearest examples where processing and communications are constrained by the client's requirements and the capacity of the system. An embedded sensor with advanced processing and communications capabilities supplies high level information, abstracting from the data acquisition process and objects recognition mechanisms. The implementation of an embedded sensor/actuator as a Smart Resource permits clients to access sensor information through distributed network services. Smart resources can offer sensor services as well as computing, communications and peripheral access by implementing a self-aware based adaptation mechanism which adapts the execution profile to the context. On the other hand, information integrity must be ensured when computing processes are dynamically adapted. Therefore, the processing must be adapted to perform tasks in a certain lapse of time but always ensuring a minimum process quality. In the same way, communications must try to reduce the data traffic without excluding relevant information. The main objective of the paper is to present a dynamic configuration mechanism to adapt the sensor processing and communication to the client's requirements in the DCS. This paper describes an implementation of a smart resource based on a Red, Green, Blue, and Depth (RGBD) sensor in order to test the dynamic configuration mechanism presented.
Electronic Timekeeping: North Dakota State University Improves Payroll Processing.
ERIC Educational Resources Information Center
Vetter, Ronald J.; And Others
1993-01-01
North Dakota State University has adopted automated timekeeping to improve the efficiency and effectiveness of payroll processing. The microcomputer-based system accurately records and computes employee time, tracks labor distribution, accommodates complex labor policies and company pay practices, provides automatic data processing and reporting,…
``Sweetening'' Technical Physics with Hershey's Kisses
NASA Astrophysics Data System (ADS)
Stone, Chuck
2003-04-01
This paper describes an activity in which students measure the mass of each candy in one full bag of Hershey's Kisses and then use a simple spreadsheet program to construct a histogram showing the number of candies as a function of mass. Student measurements indicate that one single bag of 80 Kisses yields enough data to produce a noticeable variation in the candy's mass distribution. The bimodal character of this distribution provides a useful discussion topic. This activity can be performed as a classroom project, a laboratory exercise, or an interactive lecture demonstration. In all these formats, students have the opportunity to collect, organize, process, and analyze real data. In addition to strengthening graphical analysis skills, this activity introduces students to fundamentals of statistics, manufacturing processes in the industrial workplace, and process control techniques.
NASA Astrophysics Data System (ADS)
Ghaemi, Z.; Farnaghi, M.; Alimohammadi, A.
2015-12-01
The critical impact of air pollution on human health and environment in one hand and the complexity of pollutant concentration behavior in the other hand lead the scientists to look for advance techniques for monitoring and predicting the urban air quality. Additionally, recent developments in data measurement techniques have led to collection of various types of data about air quality. Such data is extremely voluminous and to be useful it must be processed at high velocity. Due to the complexity of big data analysis especially for dynamic applications, online forecasting of pollutant concentration trends within a reasonable processing time is still an open problem. The purpose of this paper is to present an online forecasting approach based on Support Vector Machine (SVM) to predict the air quality one day in advance. In order to overcome the computational requirements for large-scale data analysis, distributed computing based on the Hadoop platform has been employed to leverage the processing power of multiple processing units. The MapReduce programming model is adopted for massive parallel processing in this study. Based on the online algorithm and Hadoop framework, an online forecasting system is designed to predict the air pollution of Tehran for the next 24 hours. The results have been assessed on the basis of Processing Time and Efficiency. Quite accurate predictions of air pollutant indicator levels within an acceptable processing time prove that the presented approach is very suitable to tackle large scale air pollution prediction problems.
NASA Astrophysics Data System (ADS)
Hodge, R.; Brasington, J.; Richards, K.
2009-04-01
The ability to collect 3D elevation data at mm-resolution from in-situ natural surfaces, such as fluvial and coastal sediments, rock surfaces, soils and dunes, is beneficial for a range of geomorphological and geological research. From these data the properties of the surface can be measured, and Digital Terrain Models (DTM) can be constructed. Terrestrial Laser Scanning (TLS) can collect quickly such 3D data with mm-precision and mm-spacing. This paper presents a methodology for the collection and processing of such TLS data, and considers how the errors in this TLS data can be quantified. TLS has been used to collect elevation data from fluvial gravel surfaces. Data were collected from areas of approximately 1 m2, with median grain sizes ranging from 18 to 63 mm. Errors are inherent in such data as a result of the precision of the TLS, and the interaction of factors including laser footprint, surface topography, surface reflectivity and scanning geometry. The methodology for the collection and processing of TLS data from complex surfaces like these fluvial sediments aims to minimise the occurrence of, and remove, such errors. The methodology incorporates taking scans from multiple scanner locations, averaging repeat scans, and applying a series of filters to remove erroneous points. Analysis of 2.5D DTMs interpolated from the processed data has identified geomorphic properties of the gravel surfaces, including the distribution of surface elevations, preferential grain orientation and grain imbrication. However, validation of the data and interpolated DTMs is limited by the availability of techniques capable of collecting independent elevation data of comparable quality. Instead, two alternative approaches to data validation are presented. The first consists of careful internal validation to optimise filter parameter values during data processing combined with a series of laboratory experiments. In the experiments, TLS data were collected from a sphere and planes with different reflectivities to measure the accuracy and precision of TLS data of these geometrically simple objects. Whilst this first approach allows the maximum precision of TLS data from complex surfaces to be estimated, it cannot quantify the distribution of errors within the TLS data and across the interpolated DTMs. The second approach enables this by simulating the collection of TLS data from complex surfaces of a known geometry. This simulated scanning has been verified through systematic comparison with laboratory TLS data. Two types of surface geometry have been investigated: simulated regular arrays of uniform spheres used to analyse the effect of sphere size; and irregular beds of spheres with the same grain size distribution as the fluvial gravels, which provide a comparable complex geometry to the field sediment surfaces. A series of simulated scans of these surfaces has enabled the magnitude and spatial distribution of errors in the interpolated DTMs to be quantified, as well as demonstrating the utility of the different processing stages in removing errors from TLS data. As well as demonstrating the application of simulated scanning as a technique to quantify errors, these results can be used to estimate errors in comparable TLS data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anselmino, Mauro; Mariaelena, Boglione; D'Alesio, Umberto
2014-06-01
Some estimates for the transverse Single Spin Asymmetry, A_N, in the inclusive processes l p(transv. Pol.) --> h X, given in a previous paper, are expanded and compared with new experimental data. The predictions are based on the Sivers distributions and the Collins fragmentation functions which fit the azimuthal asymmetries measured in Semi-Inclusive Deep Inelastic Scattering (SIDIS) processes (l p(transv. Pol.) --> l' h X). The factorisation in terms of Transverse Momentum Dependent distribution and fragmentation functions (TMD factorisation) -- i.e., the theoretical framework in which SIDIS azimuthal asymmetries are analysed -- is assumed to hold also for the inclusivemore » process l p --> h X at large P_T. The values of A_N thus obtained agree in sign and shape with the data. Some predictions are given for future experiments.« less
Marital power process of Korean men married to foreign women: a qualitative study.
Kim, Miyoung; Park, Gyeong Sook; Windsor, Carol
2013-03-01
This study explored how Korean men married to migrant women construct meaning around married life. Data were collected through in-depth interviews with 10 men who had had been married to migrant women for ≥ 2 years. Data collection and analysis were performed concurrently using a grounded theory approach. The core category generated was the process of sustaining a family unit. The men came to understand the importance of a distribution of power within the family in sustaining the family unit. Constituting this process were four stages: recognizing an imbalance of power, relinquishing power, empowering, and fine-tuning the balance of power. This study provides important insight into the dynamics of marital power from men's point of view by demonstrating a link between the way people adjust to married life and the process by which married couples adjust through the distribution and redistribution of power. © 2012 Wiley Publishing Asia Pty Ltd.
Evaluating Cloud Computing in the Proposed NASA DESDynI Ground Data System
NASA Technical Reports Server (NTRS)
Tran, John J.; Cinquini, Luca; Mattmann, Chris A.; Zimdars, Paul A.; Cuddy, David T.; Leung, Kon S.; Kwoun, Oh-Ig; Crichton, Dan; Freeborn, Dana
2011-01-01
The proposed NASA Deformation, Ecosystem Structure and Dynamics of Ice (DESDynI) mission would be a first-of-breed endeavor that would fundamentally change the paradigm by which Earth Science data systems at NASA are built. DESDynI is evaluating a distributed architecture where expert science nodes around the country all engage in some form of mission processing and data archiving. This is compared to the traditional NASA Earth Science missions where the science processing is typically centralized. What's more, DESDynI is poised to profoundly increase the amount of data collection and processing well into the 5 terabyte/day and tens of thousands of job range, both of which comprise a tremendous challenge to DESDynI's proposed distributed data system architecture. In this paper, we report on a set of architectural trade studies and benchmarks meant to inform the DESDynI mission and the broader community of the impacts of these unprecedented requirements. In particular, we evaluate the benefits of cloud computing and its integration with our existing NASA ground data system software called Apache Object Oriented Data Technology (OODT). The preliminary conclusions of our study suggest that the use of the cloud and OODT together synergistically form an effective, efficient and extensible combination that could meet the challenges of NASA science missions requiring DESDynI-like data collection and processing volumes at reduced costs.
Water level ingest, archive and processing system - an integral part of NOAA's tsunami database
NASA Astrophysics Data System (ADS)
McLean, S. J.; Mungov, G.; Dunbar, P. K.; Price, D. J.; Mccullough, H.
2013-12-01
The National Oceanic and Atmospheric Administration (NOAA), National Geophysical Data Center (NGDC) and collocated World Data Service for Geophysics (WDS) provides long-term archive, data management, and access to national and global tsunami data. Archive responsibilities include the NOAA Global Historical Tsunami event and runup database, damage photos, as well as other related hazards data. Beginning in 2008, NGDC was given the responsibility of archiving, processing and distributing all tsunami and hazards-related water level data collected from NOAA observational networks in a coordinated and consistent manner. These data include the Deep-ocean Assessment and Reporting of Tsunami (DART) data provided by the National Data Buoy Center (NDBC), coastal-tide-gauge data from the National Ocean Service (NOS) network and tide-gauge data from the two National Weather Service (NWS) Tsunami Warning Centers (TWCs) regional networks. Taken together, this integrated archive supports tsunami forecast, warning, research, mitigation and education efforts of NOAA and the Nation. Due to the variety of the water level data, the automatic ingest system was redesigned, along with upgrading the inventory, archive and delivery capabilities based on modern digital data archiving practices. The data processing system was also upgraded and redesigned focusing on data quality assessment in an operational manner. This poster focuses on data availability highlighting the automation of all steps of data ingest, archive, processing and distribution. Examples are given from recent events such as the October 2012 hurricane Sandy, the Feb 06, 2013 Solomon Islands tsunami, and the June 13, 2013 meteotsunami along the U.S. East Coast.
NASA Astrophysics Data System (ADS)
Rimland, Jeffrey; Ballora, Mark; Shumaker, Wade
2013-05-01
As the sheer volume of data grows exponentially, it becomes increasingly difficult for existing visualization techniques to keep pace. The sonification field attempts to address this issue by enlisting our auditory senses to detect anomalies or complex events that are difficult to detect via visualization alone. Storification attempts to improve analyst understanding by converting data streams into organized narratives describing the data at a higher level of abstraction than the input stream that they area derived from. While these techniques hold a great deal of promise, they also each have a unique set of challenges that must be overcome. Sonification techniques must represent a broad variety of distributed heterogeneous data and present it to the analyst/listener in a manner that doesn't require extended listening - as visual "snapshots" are useful but auditory sounds only exist over time. Storification still faces many human-computer interface (HCI) challenges as well as technical hurdles related to automatically generating a logical narrative from lower-level data streams. This paper proposes a novel approach that utilizes a service oriented architecture (SOA)-based hybrid visualization/ sonification / storification framework to enable distributed human-in-the-loop processing of data in a manner that makes optimized usage of both visual and auditory processing pathways while also leveraging the value of narrative explication of data streams. It addresses the benefits and shortcomings of each processing modality and discusses information infrastructure and data representation concerns required with their utilization in a distributed environment. We present a generalizable approach with a broad range of applications including cyber security, medical informatics, facilitation of energy savings in "smart" buildings, and detection of natural and man-made disasters.
Astro-WISE: Chaining to the Universe
NASA Astrophysics Data System (ADS)
Valentijn, E. A.; McFarland, J. P.; Snigula, J.; Begeman, K. G.; Boxhoorn, D. R.; Rengelink, R.; Helmich, E.; Heraudeau, P.; Verdoes Kleijn, G.; Vermeij, R.; Vriend, W.-J.; Tempelaar, M. J.; Deul, E.; Kuijken, K.; Capaccioli, M.; Silvotti, R.; Bender, R.; Neeser, M.; Saglia, R.; Bertin, E.; Mellier, Y.
2007-10-01
The recent explosion of recorded digital data and its processed derivatives threatens to overwhelm researchers when analysing their experimental data or looking up data items in archives and file systems. While current hardware developments allow the acquisition, processing and storage of hundreds of terabytes of data at the cost of a modern sports car, the software systems to handle these data are lagging behind. This problem is very general and is well recognized by various scientific communities; several large projects have been initiated, e.g., DATAGRID/EGEE {http://www.eu-egee.org/} federates compute and storage power over the high-energy physical community, while the international astronomical community is building an Internet geared Virtual Observatory {http://www.euro-vo.org/pub/} (Padovani 2006) connecting archival data. These large projects either focus on a specific distribution aspect or aim to connect many sub-communities and have a relatively long trajectory for setting standards and a common layer. Here, we report first light of a very different solution (Valentijn & Kuijken 2004) to the problem initiated by a smaller astronomical IT community. It provides an abstract scientific information layer which integrates distributed scientific analysis with distributed processing and federated archiving and publishing. By designing new abstractions and mixing in old ones, a Science Information System with fully scalable cornerstones has been achieved, transforming data systems into knowledge systems. This break-through is facilitated by the full end-to-end linking of all dependent data items, which allows full backward chaining from the observer/researcher to the experiment. Key is the notion that information is intrinsic in nature and thus is the data acquired by a scientific experiment. The new abstraction is that software systems guide the user to that intrinsic information by forcing full backward and forward chaining in the data modelling.
EOS Data Products Latency and Reprocessing Evaluation
NASA Astrophysics Data System (ADS)
Ramapriyan, H. K.; Wanchoo, L.
2012-12-01
NASA's Earth Observing System (EOS) Data and Information System (EOSDIS) program has been processing, archiving, and distributing EOS data since the launch of Terra platform in 1999. The EOSDIS Distributed Active Archive Centers (DAACs) and Science-Investigator-led Processing Systems (SIPSs) are generating over 5000 unique products with a daily average volume of 1.7 Petabytes. Initially EOSDIS had requirements to make process data products within 24 hours of receiving all inputs needed for generating them. Thus, generally, the latency would be slightly over 24 and 48 hours after satellite data acquisition, respectively, for Level 1 and Level 2 products. Due to budgetary constraints these requirements were relaxed, with the requirement being to avoid a growing backlog of unprocessed data. However, the data providers have been generating these products in as timely a manner as possible. The reduction in costs of computing hardware has helped considerably. It is of interest to analyze the actual latencies achieved over the past several years in processing and inserting the data products into the EOSDIS archives for the users to support various scientific studies such as land processes, oceanography, hydrology, atmospheric science, cryospheric science, etc. The instrument science teams have continuously evaluated the data products since the launches of EOS satellites and improved the science algorithms to provide high quality products. Data providers have periodically reprocessed the previously acquired data with these improved algorithms. The reprocessing campaigns run for an extended time period in parallel with forward processing, since all data starting from the beginning of the mission need to be reprocessed. Each reprocessing activity involves more data than the previous reprocessing. The historical record of the reprocessing times would be of interest to future missions, especially those involving large volumes of data and/or computational loads due to complexity of algorithms. Evaluation of latency and reprocessing times requires some of the product metadata information, such as the beginning and ending time of data acquisition, processing date, and version number. This information for each product is made available by data providers to the ESDIS Metrics System (EMS). The EMS replaced the earlier ESDIS Data Gathering and Reporting System (EDGRS) in FY2005. Since then it has collected information about data products' ingest, archive, and distribution. The analysis of latencies and reprocessing times will provide an insight to the data provider process and identify potential areas of weakness in providing timely data to the user community. Delays may be caused by events such as system unavailability, disk failures, delay in level 0 data delivery, availability of input data, network problems, and power failures. Analysis of metrics will highlight areas for focused examination of root causes for delays. The purposes of this study are to: 1) perform a detailed analysis of latency of selected instrument products for last 6 years; 2) analyze the reprocessed data from various data providers to determine the times taken for reprocessing campaigns; 3) identify potential reasons for any anomalies in these metrics.
Propagation of Data Dependency through Distributed Cooperating Processes
1988-09-01
12 The External Data Dependency Analyzer ( EDDA ) .................................................. 12 The new EPL...47 EDDA Patch Files for the Dining Philosophers Example [Figure 23] ................... 49 L im itations...dependencies is evident. The External Data Dependency Analyzer ( EDDA ) The EDDA derives external data dependencies by performing two levels of analysis
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, N.; Sellis, Timos
1992-01-01
One of biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.
An EMSO data case study within the INDIGO-DC project
NASA Astrophysics Data System (ADS)
Monna, Stephen; Marcucci, Nicola M.; Marinaro, Giuditta; Fiore, Sandro; D'Anca, Alessandro; Antonacci, Marica; Beranzoli, Laura; Favali, Paolo
2017-04-01
We present our experience based on a case study within the INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation) project (www.indigo-datacloud.eu). The aim of INDIGO-DC is to develop a data and computing platform targeting scientific communities. Our case study is an example of activities performed by INGV using data from seafloor observatories that are nodes of the infrastructure EMSO (European Multidisciplinary Seafloor and water column Observatory)-ERIC (www.emso-eu.org). EMSO is composed of several deep-seafloor and water column observatories, deployed at key sites in the European waters, thus forming a widely distributed pan-European infrastructure. In our case study we consider data collected by the NEMO-SN1 observatory, one of the EMSO nodes used for geohazard monitoring, located in the Western Ionian Sea in proximity of Etna volcano. Starting from the case study, through an agile approach, we defined some requirements for INDIGO developers, and tested some of the proposed INDIGO solutions that are of interest for our research community. Given that EMSO is a distributed infrastructure, we are interested in INDIGO solutions that allow access to distributed data storage. Access should be both user-oriented and machine-oriented, and with the use of a common identity and access system. For this purpose, we have been testing: - ONEDATA (https://onedata.org), as global data management system. - INDIGO-IAM as Identity and Access Management system. Another aspect we are interested in is the efficient data processing, and we have focused on two types of INDIGO products: - Ophidia (http://ophidia.cmcc.it), a big data analytics framework for eScience for the analysis of multidimensional data. - A collection of INDIGO Services to run processes for scientific computing through the INDIGO Orchestrator.
A Disciplined Architectural Approach to Scaling Data Analysis for Massive, Scientific Data
NASA Astrophysics Data System (ADS)
Crichton, D. J.; Braverman, A. J.; Cinquini, L.; Turmon, M.; Lee, H.; Law, E.
2014-12-01
Data collections across remote sensing and ground-based instruments in astronomy, Earth science, and planetary science are outpacing scientists' ability to analyze them. Furthermore, the distribution, structure, and heterogeneity of the measurements themselves pose challenges that limit the scalability of data analysis using traditional approaches. Methods for developing science data processing pipelines, distribution of scientific datasets, and performing analysis will require innovative approaches that integrate cyber-infrastructure, algorithms, and data into more systematic approaches that can more efficiently compute and reduce data, particularly distributed data. This requires the integration of computer science, machine learning, statistics and domain expertise to identify scalable architectures for data analysis. The size of data returned from Earth Science observing satellites and the magnitude of data from climate model output, is predicted to grow into the tens of petabytes challenging current data analysis paradigms. This same kind of growth is present in astronomy and planetary science data. One of the major challenges in data science and related disciplines defining new approaches to scaling systems and analysis in order to increase scientific productivity and yield. Specific needs include: 1) identification of optimized system architectures for analyzing massive, distributed data sets; 2) algorithms for systematic analysis of massive data sets in distributed environments; and 3) the development of software infrastructures that are capable of performing massive, distributed data analysis across a comprehensive data science framework. NASA/JPL has begun an initiative in data science to address these challenges. Our goal is to evaluate how scientific productivity can be improved through optimized architectural topologies that identify how to deploy and manage the access, distribution, computation, and reduction of massive, distributed data, while managing the uncertainties of scientific conclusions derived from such capabilities. This talk will provide an overview of JPL's efforts in developing a comprehensive architectural approach to data science.
Solar nebula heterogeneity in p-process samarium and neodymium isotopes.
Andreasen, Rasmus; Sharma, Mukul
2006-11-03
Bulk carbonaceous chondrites display a deficit of approximately 100 parts per million (ppm) in 144Sm with respect to other meteorites and terrestrial standards, leading to a decrease in their 142Nd/144Nd ratios by approximately 11 ppm. The data require that samarium and neodymium isotopes produced by the p process associated with photodisintegration reactions in supernovae were heterogeneously distributed in the solar nebula. Other samarium and neodymium isotopes produced by rapid neutron capture (r process) in supernovae and by slow neutron capture (s process) in red giants were homogeneously distributed. The supernovae sources supplying the p- and r-process nuclides to the solar nebula were thus disconnected or only weakly connected.
2003-06-27
KENNEDY SPACE CENTER, FLA. - Inside the hangar at Vandenberg Air Force Base, Calif., workers wait for the Pegasus launch vehicle to be moved inside. The Pegasus will carry the SciSat-1 spacecraft in a 400-mile-high polar orbit to investigate processes that control the distribution of ozone in the upper atmosphere. The scientific mission of SciSat-1 is to measure and understand the chemical processes that control the distribution of ozone in the Earth’s atmosphere, particularly at high altitudes. The data from the satellite will provide Canadian and international scientists with improved measurements relating to global ozone processes and help policymakers assess existing environmental policy and develop protective measures for improving the health of our atmosphere, preventing further ozone depletion. The mission is designed to last two years.
An integral conservative gridding--algorithm using Hermitian curve interpolation.
Volken, Werner; Frei, Daniel; Manser, Peter; Mini, Roberto; Born, Ernst J; Fix, Michael K
2008-11-07
The problem of re-sampling spatially distributed data organized into regular or irregular grids to finer or coarser resolution is a common task in data processing. This procedure is known as 'gridding' or 're-binning'. Depending on the quantity the data represents, the gridding-algorithm has to meet different requirements. For example, histogrammed physical quantities such as mass or energy have to be re-binned in order to conserve the overall integral. Moreover, if the quantity is positive definite, negative sampling values should be avoided. The gridding process requires a re-distribution of the original data set to a user-requested grid according to a distribution function. The distribution function can be determined on the basis of the given data by interpolation methods. In general, accurate interpolation with respect to multiple boundary conditions of heavily fluctuating data requires polynomial interpolation functions of second or even higher order. However, this may result in unrealistic deviations (overshoots or undershoots) of the interpolation function from the data. Accordingly, the re-sampled data may overestimate or underestimate the given data by a significant amount. The gridding-algorithm presented in this work was developed in order to overcome these problems. Instead of a straightforward interpolation of the given data using high-order polynomials, a parametrized Hermitian interpolation curve was used to approximate the integrated data set. A single parameter is determined by which the user can control the behavior of the interpolation function, i.e. the amount of overshoot and undershoot. Furthermore, it is shown how the algorithm can be extended to multidimensional grids. The algorithm was compared to commonly used gridding-algorithms using linear and cubic interpolation functions. It is shown that such interpolation functions may overestimate or underestimate the source data by about 10-20%, while the new algorithm can be tuned to significantly reduce these interpolation errors. The accuracy of the new algorithm was tested on a series of x-ray CT-images (head and neck, lung, pelvis). The new algorithm significantly improves the accuracy of the sampled images in terms of the mean square error and a quality index introduced by Wang and Bovik (2002 IEEE Signal Process. Lett. 9 81-4).
Virtual Sensor Web Architecture
NASA Astrophysics Data System (ADS)
Bose, P.; Zimdars, A.; Hurlburt, N.; Doug, S.
2006-12-01
NASA envisions the development of smart sensor webs, intelligent and integrated observation network that harness distributed sensing assets, their associated continuous and complex data sets, and predictive observation processing mechanisms for timely, collaborative hazard mitigation and enhanced science productivity and reliability. This paper presents Virtual Sensor Web Infrastructure for Collaborative Science (VSICS) Architecture for sustained coordination of (numerical and distributed) model-based processing, closed-loop resource allocation, and observation planning. VSICS's key ideas include i) rich descriptions of sensors as services based on semantic markup languages like OWL and SensorML; ii) service-oriented workflow composition and repair for simple and ensemble models; event-driven workflow execution based on event-based and distributed workflow management mechanisms; and iii) development of autonomous model interaction management capabilities providing closed-loop control of collection resources driven by competing targeted observation needs. We present results from initial work on collaborative science processing involving distributed services (COSEC framework) that is being extended to create VSICS.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs.
Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin
2018-02-10
Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs
Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin
2018-01-01
Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead. PMID:29439443
Smart-DS: Synthetic Models for Advanced, Realistic Testing: Distribution Systems and Scenarios
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Venkat K; Palmintier, Bryan S; Hodge, Brian S
The National Renewable Energy Laboratory (NREL) in collaboration with Massachusetts Institute of Technology (MIT), Universidad Pontificia Comillas (Comillas-IIT, Spain) and GE Grid Solutions, is working on an ARPA-E GRID DATA project, titled Smart-DS, to create: 1) High-quality, realistic, synthetic distribution network models, and 2) Advanced tools for automated scenario generation based on high-resolution weather data and generation growth projections. Through these advancements, the Smart-DS project is envisioned to accelerate the development, testing, and adoption of advanced algorithms, approaches, and technologies for sustainable and resilient electric power systems, especially in the realm of U.S. distribution systems. This talk will present themore » goals and overall approach of the Smart-DS project, including the process of creating the synthetic distribution datasets using reference network model (RNM) and the comprehensive validation process to ensure network realism, feasibility, and applicability to advanced use cases. The talk will provide demonstrations of early versions of synthetic models, along with the lessons learnt from expert engagements to enhance future iterations. Finally, the scenario generation framework, its development plans, and co-ordination with GRID DATA repository teams to house these datasets for public access will also be discussed.« less
NASA Astrophysics Data System (ADS)
Benz, N.; Bartlow, N. M.
2017-12-01
The addition of borehole strainmeter (BSM) to cGPS time series inversions can yield more precise slip distributions at the subduction interface during episodic tremor and slip (ETS) events in the Cascadia subduction zone. Traditionally very noisy BSM data has not been easy to incorporate until recently, but developments in processing noise, re-orientation of strain components, removal of tidal, hydrologic, and atmospheric signals have made this additional source of data viable (Roeloffs, 2010). The major advantage with BSMs is their sensitivity to spatial derivatives in slip, which is valuable for investigating the ETS nucleation process and stress changes on the plate interface due to ETS. Taking advantage of this, we simultaneously invert PBO GPS and cleaned BSM time series with the Network Inversion Filter (Segall and Matthews, 1997) for slip distribution and slip rate during selected Cascadia ETS events. Stress distributions are also calculated for the plate interface using these inversion results to estimate the amount of stress change during an ETS event. These calculations are performed with and without the utilization of BSM time series, highlighting the role of BSM data in constraining slip and stress.
Particle formation in the emulsion-solvent evaporation process.
Staff, Roland H; Schaeffel, David; Turshatov, Andrey; Donadio, Davide; Butt, Hans-Jürgen; Landfester, Katharina; Koynov, Kaloian; Crespy, Daniel
2013-10-25
The mechanism of particle formation from submicrometer emulsion droplets by solvent evaporation is revisited. A combination of dynamic light scattering, fluorescence resonance energy transfer, zeta potential measurements, and fluorescence cross-correlation spectroscopy is used to analyze the colloids during the evaporation process. It is shown that a combination of different methods yields reliable and quantitative data for describing the fate of the droplets during the process. The results indicate that coalescence plays a minor role during the process; the relatively large size distribution of the obtained polymer colloids can be explained by the droplet distribution after their formation. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
About Distributed Simulation-based Optimization of Forming Processes using a Grid Architecture
NASA Astrophysics Data System (ADS)
Grauer, Manfred; Barth, Thomas
2004-06-01
Permanently increasing complexity of products and their manufacturing processes combined with a shorter "time-to-market" leads to more and more use of simulation and optimization software systems for product design. Finding a "good" design of a product implies the solution of computationally expensive optimization problems based on the results of simulation. Due to the computational load caused by the solution of these problems, the requirements on the Information&Telecommunication (IT) infrastructure of an enterprise or research facility are shifting from stand-alone resources towards the integration of software and hardware resources in a distributed environment for high-performance computing. Resources can either comprise software systems, hardware systems, or communication networks. An appropriate IT-infrastructure must provide the means to integrate all these resources and enable their use even across a network to cope with requirements from geographically distributed scenarios, e.g. in computational engineering and/or collaborative engineering. Integrating expert's knowledge into the optimization process is inevitable in order to reduce the complexity caused by the number of design variables and the high dimensionality of the design space. Hence, utilization of knowledge-based systems must be supported by providing data management facilities as a basis for knowledge extraction from product data. In this paper, the focus is put on a distributed problem solving environment (PSE) capable of providing access to a variety of necessary resources and services. A distributed approach integrating simulation and optimization on a network of workstations and cluster systems is presented. For geometry generation the CAD-system CATIA is used which is coupled with the FEM-simulation system INDEED for simulation of sheet-metal forming processes and the problem solving environment OpTiX for distributed optimization.
Middleware for big data processing: test results
NASA Astrophysics Data System (ADS)
Gankevich, I.; Gaiduchok, V.; Korkhov, V.; Degtyarev, A.; Bogdanov, A.
2017-12-01
Dealing with large volumes of data is resource-consuming work which is more and more often delegated not only to a single computer but also to a whole distributed computing system at once. As the number of computers in a distributed system increases, the amount of effort put into effective management of the system grows. When the system reaches some critical size, much effort should be put into improving its fault tolerance. It is difficult to estimate when some particular distributed system needs such facilities for a given workload, so instead they should be implemented in a middleware which works efficiently with a distributed system of any size. It is also difficult to estimate whether a volume of data is large or not, so the middleware should also work with data of any volume. In other words, the purpose of the middleware is to provide facilities that adapt distributed computing system for a given workload. In this paper we introduce such middleware appliance. Tests show that this middleware is well-suited for typical HPC and big data workloads and its performance is comparable with well-known alternatives.
Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher
2014-01-01
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. THE MAPREDUCE PROGRAMMING FRAMEWORK USES TWO TASKS COMMON IN FUNCTIONAL PROGRAMMING: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
NASA Technical Reports Server (NTRS)
Mccutchen, D. K.; Brose, J. F.; Palm, W. E.
1982-01-01
One nemesis of the structural dynamist is the tedious task of reviewing large quantities of data. This data, obtained from various types of instrumentation, may be represented by oscillogram records, root-mean-squared (rms) time histories, power spectral densities, shock spectra, 1/3 octave band analyses, and various statistical distributions. In an attempt to reduce the laborious task of manually reviewing all of the space shuttle orbiter wideband frequency-modulated (FM) analog data, an automated processing system was developed to perform the screening process based upon predefined or predicted threshold criteria.
NASA Technical Reports Server (NTRS)
Coker, A. E.; Marshall, R.; Thomson, N. S.
1977-01-01
Data were collected near Bartow, Florida, for the purpose of studying land collapse phenomena using remote sensing techniques. Data obtained using the multispectral scanner system consisted of various combinations of 18 spectral bands ranging from 0.4-14.0 microns and several types of photography. The multispectral data were processed on a special-purpose analog computer in order to detect moisture-stressed vegetation and to enhance terrain surface temperatures. The processed results were printed on film to show the patterns of distribution of the proposed hydrogeologic indicators.
Football fever: goal distributions and non-Gaussian statistics
NASA Astrophysics Data System (ADS)
Bittner, E.; Nußbaumer, A.; Janke, W.; Weigel, M.
2009-02-01
Analyzing football score data with statistical techniques, we investigate how the not purely random, but highly co-operative nature of the game is reflected in averaged properties such as the probability distributions of scored goals for the home and away teams. As it turns out, especially the tails of the distributions are not well described by the Poissonian or binomial model resulting from the assumption of uncorrelated random events. Instead, a good effective description of the data is provided by less basic distributions such as the negative binomial one or the probability densities of extreme value statistics. To understand this behavior from a microscopical point of view, however, no waiting time problem or extremal process need be invoked. Instead, modifying the Bernoulli random process underlying the Poissonian model to include a simple component of self-affirmation seems to describe the data surprisingly well and allows to understand the observed deviation from Gaussian statistics. The phenomenological distributions used before can be understood as special cases within this framework. We analyzed historical football score data from many leagues in Europe as well as from international tournaments, including data from all past tournaments of the “FIFA World Cup” series, and found the proposed models to be applicable rather universally. In particular, here we analyze the results of the German women’s premier football league and consider the two separate German men’s premier leagues in the East and West during the cold war times as well as the unified league after 1990 to see how scoring in football and the component of self-affirmation depend on cultural and political circumstances.
Gruendling, Till; Guilhaus, Michael; Barner-Kowollik, Christopher
2008-09-15
We report on the successful application of size exclusion chromatography (SEC) combined with electrospray ionization mass spectrometry (ESI-MS) and refractive index (RI) detection for the determination of accurate molecular weight distributions of synthetic polymers, corrected for chromatographic band broadening. The presented method makes use of the ability of ESI-MS to accurately depict the peak profiles and retention volumes of individual oligomers eluting from the SEC column, whereas quantitative information on the absolute concentration of oligomers is obtained from the RI-detector only. A sophisticated computational algorithm based on the maximum entropy principle is used to process the data gained by both detectors, yielding an accurate molecular weight distribution, corrected for chromatographic band broadening. Poly(methyl methacrylate) standards with molecular weights up to 10 kDa serve as model compounds. Molecular weight distributions (MWDs) obtained by the maximum entropy procedure are compared to MWDs, which were calculated by a conventional calibration of the SEC-retention time axis with peak retention data obtained from the mass spectrometer. Comparison showed that for the employed chromatographic system, distributions below 7 kDa were only weakly influenced by chromatographic band broadening. However, the maximum entropy algorithm could successfully correct the MWD of a 10 kDa standard for band broadening effects. Molecular weight averages were between 5 and 14% lower than the manufacturer stated data obtained by classical means of calibration. The presented method demonstrates a consistent approach for analyzing data obtained by coupling mass spectrometric detectors and concentration sensitive detectors to polymer liquid chromatography.
Design and implementation of a distributed large-scale spatial database system based on J2EE
NASA Astrophysics Data System (ADS)
Gong, Jianya; Chen, Nengcheng; Zhu, Xinyan; Zhang, Xia
2003-03-01
With the increasing maturity of distributed object technology, CORBA, .NET and EJB are universally used in traditional IT field. However, theories and practices of distributed spatial database need farther improvement in virtue of contradictions between large scale spatial data and limited network bandwidth or between transitory session and long transaction processing. Differences and trends among of CORBA, .NET and EJB are discussed in details, afterwards the concept, architecture and characteristic of distributed large-scale seamless spatial database system based on J2EE is provided, which contains GIS client application, web server, GIS application server and spatial data server. Moreover the design and implementation of components of GIS client application based on JavaBeans, the GIS engine based on servlet, the GIS Application server based on GIS enterprise JavaBeans(contains session bean and entity bean) are explained.Besides, the experiments of relation of spatial data and response time under different conditions are conducted, which proves that distributed spatial database system based on J2EE can be used to manage, distribute and share large scale spatial data on Internet. Lastly, a distributed large-scale seamless image database based on Internet is presented.
NASA Astrophysics Data System (ADS)
Tritscher, Torsten; Koched, Amine; Han, Hee-Siew; Filimundi, Eric; Johnson, Tim; Elzey, Sherrie; Avenido, Aaron; Kykal, Carsten; Bischof, Oliver F.
2015-05-01
Electrical mobility classification (EC) followed by Condensation Particle Counter (CPC) detection is the technique combined in Scanning Mobility Particle Sizers(SMPS) to retrieve nanoparticle size distributions in the range from 2.5 nm to 1 μm. The detectable size range of SMPS systems can be extended by the addition of an Optical Particle Sizer(OPS) that covers larger sizes from 300 nm to 10 μm. This optical sizing method reports an optical equivalent diameter, which is often different from the electrical mobility diameter measured by the standard SMPS technique. Multi-Instrument Manager (MIMTM) software developed by TSI incorporates algorithms that facilitate merging SMPS data sets with data based on optical equivalent diameter to compile single, wide-range size distributions. Here we present MIM 2.0, the next-generation of the data merging tool that offers many advanced features for data merging and post-processing. MIM 2.0 allows direct data acquisition with OPS and NanoScan SMPS instruments to retrieve real-time particle size distributions from 10 nm to 10 μm, which we show in a case study at a fireplace. The merged data can be adjusted using one of the merging options, which automatically determines an overall aerosol effective refractive index. As a result an indirect and average characterization of aerosol optical and shape properties is possible. The merging tool allows several pre-settings, data averaging and adjustments, as well as the export of data sets and fitted graphs. MIM 2.0 also features several post-processing options for SMPS data and differences can be visualized in a multi-peak sample over a narrow size range.
Facility Monitoring: A Qualitative Theory for Sensor Fusion
NASA Technical Reports Server (NTRS)
Figueroa, Fernando
2001-01-01
Data fusion and sensor management approaches have largely been implemented with centralized and hierarchical architectures. Numerical and statistical methods are the most common data fusion methods found in these systems. Given the proliferation and low cost of processing power, there is now an emphasis on designing distributed and decentralized systems. These systems use analytical/quantitative techniques or qualitative reasoning methods for date fusion.Based on other work by the author, a sensor may be treated as a highly autonomous (decentralized) unit. Each highly autonomous sensor (HAS) is capable of extracting qualitative behaviours from its data. For example, it detects spikes, disturbances, noise levels, off-limit excursions, step changes, drift, and other typical measured trends. In this context, this paper describes a distributed sensor fusion paradigm and theory where each sensor in the system is a HAS. Hence, given the reach qualitative information from each HAS, a paradigm and formal definitions are given so that sensors and processes can reason and make decisions at the qualitative level. This approach to sensor fusion makes it possible the implementation of intuitive (effective) methods to monitor, diagnose, and compensate processes/systems and their sensors. This paradigm facilitates a balanced distribution of intelligence (code and/or hardware) to the sensor level, the process/system level, and a higher controller level. The primary application of interest is in intelligent health management of rocket engine test stands.
Using Bayesian networks to support decision-focused information retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lehner, P.; Elsaesser, C.; Seligman, L.
This paper has described an approach to controlling the process of pulling data/information from distributed data bases in a way that is specific to a persons specific decision making context. Our prototype implementation of this approach uses a knowledge-based planner to generate a plan, an automatically constructed Bayesian network to evaluate the plan, specialized processing of the network to derive key information items that would substantially impact the evaluation of the plan (e.g., determine that replanning is needed), automated construction of Standing Requests for Information (SRIs) which are automated functions that monitor changes and trends in distributed data base thatmore » are relevant to the key information items. This emphasis of this paper is on how Bayesian networks are used.« less
2014-01-01
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
The Suomi National Polar-Orbiting Partnership (SNPP): Continuing NASA Research and Applications
NASA Technical Reports Server (NTRS)
Butler, James; Gleason, James; Jedlovec, Gary; Coronado, Patrick
2015-01-01
The Suomi National Polar-orbiting Partnership (SNPP) satellite was successfully launched into a polar orbit on October 28, 2011 carrying 5 remote sensing instruments designed to provide data to improve weather forecasts and to increase understanding of long-term climate change. SNPP provides operational continuity of satellite-based observations for NOAA's Polar-orbiting Operational Environmental Satellites (POES) and continues the long-term record of climate quality observations established by NASA's Earth Observing System (EOS) satellites. In the 2003 to 2011 pre-launch timeframe, NASA's SNPP Science Team assessed the adequacy of the operational Raw Data Records (RDRs), Sensor Data Records (SDRs), and Environmental Data Records (EDRs) from the SNPP instruments for use in NASA Earth Science research, examined the operational algorithms used to produce those data records, and proposed a path forward for the production of climate quality products from SNPP. In order to perform these tasks, a distributed data system, the NASA Science Data Segment (SDS), ingested RDRs, SDRs, and EDRs from the NOAA Archive and Distribution and Interface Data Processing Segments, ADS and IDPS, respectively. The SDS also obtained operational algorithms for evaluation purposes from the NOAA Government Resource for Algorithm Verification, Independent Testing and Evaluation (GRAVITE). Within the NASA SDS, five Product Evaluation and Test Elements (PEATEs) received, ingested, and stored data and performed NASA's data processing, evaluation, and analysis activities. The distributed nature of this data distribution system was established by physically housing each PEATE within one of five Climate Analysis Research Systems (CARS) located at either at a NASA or a university institution. The CARS were organized around 5 key EDRs directly in support of the following NASA Earth Science focus areas: atmospheric sounding, ocean, land, ozone, and atmospheric composition products. The PEATES provided the system level interface with members of the NASA SNPP Science Team and other science investigators within each CARS. A sixth Earth Radiation Budget CARS was established at NASA Langley Research Center (NASA LaRC) to support instrument performance, data evaluation, and analysis for the SNPP Clouds and the Earth's Radiant Budget Energy System (CERES) instrument. Following the 2011 launch of SNPP, spacecraft commissioning, and instrument activation, the NASA SNPP Science Team evaluated the operational RDRs, SDRs, and EDRs produced by the NOAA ADS and IDPS. A key part in that evaluation was the NASA Science Team's independent processing of operational RDRs and SDRs to EDRs using the latest NASA science algorithms. The NASA science evaluation was completed in the December 2012 to April 2014 timeframe with the release of a series of NASA Science Team Discipline Reports. In summary, these reports indicated that the RDRs produced by the SNPP instruments were of sufficiently high quality to be used to create data products suitable for NASA Earth System science and applications. However, the quality of the SDRs and EDRs were found to vary greatly when considering suitability for NASA science. The need for improvements in operational algorithms, adoption of different algorithmic approaches, greater monitoring of on-orbit instrument calibration, greater attention to data product validation, and data reprocessing were prominent findings in the reports. In response to these findings, NASA, in late 2013, directed the NASA SNPP Science Team to use SNPP instrument data to develop data products of sufficiently high quality to enable the continuation of EOS time series data records and to develop innovative, practical applications of SNPP data. This direction necessitated a transition of the SDS data system from its pre-launch assessment mode to one of full data processing and production. To do this, the PEATES, which served as NASA's data product testing environment during the prelaunch and early on-orbit periods, were transitioned to Science Investigator-led Processing Systems (SIPS). The distributed data architecture was maintained in this new system by locating the SIPS at the same institutions at which the CARS and PEATES were located. The SIPS acquire raw SNPP instrument Level 0 (i.e. RDR) data over the full SNPP mission from the NOAA ADS and IDPS through the NASA SDS Data Distribution and Depository Element (SD3E). The SIPS process those data into NASA Level 1, Level 2, and global, gridded Level 3 standard products using peer-reviewed algorithms provided by members of the NASA Science Team. The SIPS work with the NASA SNPP Science Team in obtaining enhanced, refined, or alternate real-time algorithms to support the capabilities of the Direct Readout Laboratory (DRL). All data products, algorithm source codes, coefficients, and auxiliary data used in product generation are archived in an assigned NASA Distributed Active Archive Center (DAAC).
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.
NASA Astrophysics Data System (ADS)
van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.
2015-03-01
The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer compared to a locally hosted blade server but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination light-path and private cloud is a viable means of building an analysis infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.
Li, Der-Chiang; Hu, Susan C; Lin, Liang-Sian; Yeh, Chun-Wu
2017-01-01
It is difficult for learning models to achieve high classification performances with imbalanced data sets, because with imbalanced data sets, when one of the classes is much larger than the others, most machine learning and data mining classifiers are overly influenced by the larger classes and ignore the smaller ones. As a result, the classification algorithms often have poor learning performances due to slow convergence in the smaller classes. To balance such data sets, this paper presents a strategy that involves reducing the sizes of the majority data and generating synthetic samples for the minority data. In the reducing operation, we use the box-and-whisker plot approach to exclude outliers and the Mega-Trend-Diffusion method to find representative data from the majority data. To generate the synthetic samples, we propose a counterintuitive hypothesis to find the distributed shape of the minority data, and then produce samples according to this distribution. Four real datasets were used to examine the performance of the proposed approach. We used paired t-tests to compare the Accuracy, G-mean, and F-measure scores of the proposed data pre-processing (PPDP) method merging in the D3C method (PPDP+D3C) with those of the one-sided selection (OSS), the well-known SMOTEBoost (SB) study, and the normal distribution-based oversampling (NDO) approach, and the proposed data pre-processing (PPDP) method. The results indicate that the classification performance of the proposed approach is better than that of above-mentioned methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baum, B.A.; Barkstrom, B.R.
1993-04-01
The Earth Observing System (EOS) will collect data from a large number of satellite-borne instruments, beginning later in this decade, to make data accessible to the scientific community, NASA will build an EOS Data and Information System (EOSDIS). As an initial effort to accelerate the development of EOSDIS and to gain experience with such an information system, NASA and other agencies are working on a prototype system called Version O (VO). This effort will provide improved access to pre-EOS earth science data throughout the early EOSDIS period. Based on recommendations from the EOSDIS Science Advisory Panel, EOSDIS will have severalmore » distributed active archive centers (DAACs). Each DAAC will specialize in particular data sets. This paper describes work at the NASA Langley Research Center's (LaRC) DAAC. The Version 0 Langley DAAC began archiving and distributing existing data sets pertaining to the earth's radiation budget, clouds, aerosols, and tropospheric chemistry in late 1992. The primary goals of the LaRC VO effort are the following: (1) Enhance scientific use of existing data; (2) Develop institutional expertise in maintaining and distributing data; (3) Use institutional capability for processing data from previous missions such as the Earth Radiation Budget Experiment and the Stratospheric Aerosol and Gas Experiment to prepare for processing future EOS satellite data; (4) Encourage cooperative interagency and international involvement with data sets and research; and (5) Incorporate technological hardware and software advances quickly.« less
Performance of the Landsat-Data Collection System in a Total System Context
NASA Technical Reports Server (NTRS)
Paulson, R. W. (Principal Investigator); Merk, C. F.
1975-01-01
The author has identified the following significant results. This experiment was, and continues to be, an integration of the LANDSAT-DCS with the data collection and processing system of the Geological Survey. Although an experimental demonstration, it was a successful integration of a satellite relay system that is capable of continental data collection, and an existing governmental nationwide operational data processing and distributing networks. The Survey's data processing system uses a large general purpose computer with insufficient redundancy for 24-hour a day, 7 day a week operation. This is significant, but soluble obstacle to converting the experimental integration of the system to an operational integration.
NASA Astrophysics Data System (ADS)
Zhang, Yan; Liu, Hong; Chen, Bin; Zheng, Hongmei; Li, Yating
2014-06-01
Discovering ways in which to increase the sustainability of the metabolic processes involved in urbanization has become an urgent task for urban design and management in China. As cities are analogous to living organisms, the disorders of their metabolic processes can be regarded as the cause of "urban disease". Therefore, identification of these causes through metabolic process analysis and ecological element distribution through the urban ecosystem's compartments will be helpful. By using Beijing as an example, we have compiled monetary input-output tables from 1997, 2000, 2002, 2005, and 2007 and calculated the intensities of the embodied ecological elements to compile the corresponding implied physical input-output tables. We then divided Beijing's economy into 32 compartments and analyzed the direct and indirect ecological intensities embodied in the flows of ecological elements through urban metabolic processes. Based on the combination of input-output tables and ecological network analysis, the description of multiple ecological elements transferred among Beijing's industrial compartments and their distribution has been refined. This hybrid approach can provide a more scientific basis for management of urban resource flows. In addition, the data obtained from distribution characteristics of ecological elements may provide a basic data platform for exploring the metabolic mechanism of Beijing.
Layerwise Monitoring of the Selective Laser Melting Process by Thermography
NASA Astrophysics Data System (ADS)
Krauss, Harald; Zeugner, Thomas; Zaeh, Michael F.
Selective Laser Melting is utilized to build parts directly from CAD data. In this study layerwise monitoring of the temperature distribution is used to gather information about the process stability and the resulting part quality. The heat distribution varies with different kinds of parameters including scan vector length, laser power, layer thickness and inter-part distance in the job layout. By integration of an off-axis mounted uncooled thermal detector, the solidification as well as the layer deposition are monitored and evaluated. This enables the identification of hot spots in an early stage during the solidification process and helps to avoid process interrupts. Potential quality indicators are derived from spatially resolved measurement data and are correlated to the resulting part properties. A model of heat dissipation is presented based on the measurement of the material response for varying heat input. Current results show the feasibility of process surveillance by thermography for a limited section of the building platform in a commercial system.
Wieland, Birgit; Ropte, Sven
2017-01-01
The production of rotor blades for wind turbines is still a predominantly manual process. Process simulation is an adequate way of improving blade quality without a significant increase in production costs. This paper introduces a module for tolerance simulation for rotor-blade production processes. The investigation focuses on the simulation of temperature distribution for one-sided, self-heated tooling and thick laminates. Experimental data from rotor-blade production and down-scaled laboratory tests are presented. Based on influencing factors that are identified, a physical model is created and implemented as a simulation. This provides an opportunity to simulate temperature and cure-degree distribution for two-dimensional cross sections. The aim of this simulation is to support production processes. Hence, it is modelled as an in situ simulation with direct input of temperature data and real-time capability. A monolithic part of the rotor blade, the main girder, is used as an example for presenting the results. PMID:28981458
Wieland, Birgit; Ropte, Sven
2017-10-05
The production of rotor blades for wind turbines is still a predominantly manual process. Process simulation is an adequate way of improving blade quality without a significant increase in production costs. This paper introduces a module for tolerance simulation for rotor-blade production processes. The investigation focuses on the simulation of temperature distribution for one-sided, self-heated tooling and thick laminates. Experimental data from rotor-blade production and down-scaled laboratory tests are presented. Based on influencing factors that are identified, a physical model is created and implemented as a simulation. This provides an opportunity to simulate temperature and cure-degree distribution for two-dimensional cross sections. The aim of this simulation is to support production processes. Hence, it is modelled as an in situ simulation with direct input of temperature data and real-time capability. A monolithic part of the rotor blade, the main girder, is used as an example for presenting the results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pisano, Cristian; Bacchetta, Alessandro; Delcarro, Filippo
We present a first attempt at a global fit of unpolarized quark transverse momentum dependent distribution and fragmentation functions from available data on semi-inclusive deep-inelastic scattering, Drell-Yan and $Z$ boson production processes. This analysis is performed in the low transverse momentum region, at leading order in perturbative QCD and with the inclusion of energy scale evolution effects at the next-to-leading logarithmic accuracy.
Parallel computing method for simulating hydrological processesof large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the proverbial global environmental problems in the world.Climate change has altered the watershed hydrological processes in time and space distribution, especially in worldlarge rivers.Watershed hydrological process simulation based on physically based distributed hydrological model can could have better results compared with the lumped models.However, watershed hydrological process simulation includes large amount of calculations, especially in large rivers, thus needing huge computing resources that may not be steadily available for the researchers or at high expense, this seriously restricted the research and application. To solve this problem, the current parallel method are mostly parallel computing in space and time dimensions.They calculate the natural features orderly thatbased on distributed hydrological model by grid (unit, a basin) from upstream to downstream.This articleproposes ahigh-performancecomputing method of hydrological process simulation with high speedratio and parallel efficiency.It combinedthe runoff characteristics of time and space of distributed hydrological model withthe methods adopting distributed data storage, memory database, distributed computing, parallel computing based on computing power unit.The method has strong adaptability and extensibility,which means it canmake full use of the computing and storage resources under the condition of limited computing resources, and the computing efficiency can be improved linearly with the increase of computing resources .This method can satisfy the parallel computing requirements ofhydrological process simulation in small, medium and large rivers.
Designing for Peta-Scale in the LSST Database
NASA Astrophysics Data System (ADS)
Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.
2007-10-01
The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.
A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.
Kamal, Sarwar; Ripon, Shamim Hasnat; Dey, Nilanjan; Ashour, Amira S; Santhi, V
2016-07-01
In the age of information superhighway, big data play a significant role in information processing, extractions, retrieving and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with reduced amount of elements or instances. These reduced amounts of data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy. The obtained results depict that MapReduce based K-NN classifier provided accurate results for big data of DNA. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Applications of Geomatics in Surface Mining
NASA Astrophysics Data System (ADS)
Blachowski, Jan; Górniak-Zimroz, Justyna; Milczarek, Wojciech; Pactwa, Katarzyna
2017-12-01
In terms of method of extracting mineral from deposit, mining can be classified into: surface, underground, and borehole mining. Surface mining is a form of mining, in which the soil and the rock covering the mineral deposits are removed. Types of surface mining include mainly strip and open-cast methods, as well as quarrying. Tasks associated with surface mining of minerals include: resource estimation and deposit documentation, mine planning and deposit access, mine plant development, extraction of minerals from deposits, mineral and waste processing, reclamation and reclamation of former mining grounds. At each stage of mining, geodata describing changes occurring in space during the entire life cycle of surface mining project should be taken into consideration, i.e. collected, analysed, processed, examined, distributed. These data result from direct (e.g. geodetic) and indirect (i.e. remote or relative) measurements and observations including airborne and satellite methods, geotechnical, geological and hydrogeological data, and data from other types of sensors, e.g. located on mining equipment and infrastructure, mine plans and maps. Management of such vast sources and sets of geodata, as well as information resulting from processing, integrated analysis and examining such data can be facilitated with geomatic solutions. Geomatics is a discipline of gathering, processing, interpreting, storing and delivering spatially referenced information. Thus, geomatics integrates methods and technologies used for collecting, management, processing, visualizing and distributing spatial data. In other words, its meaning covers practically every method and tool from spatial data acquisition to distribution. In this work examples of application of geomatic solutions in surface mining on representative case studies in various stages of mine operation have been presented. These applications include: prospecting and documenting mineral deposits, assessment of land accessibility for a potential large-scale surface mining project, modelling mineral deposit (granite) management, concept of a system for management of conveyor belt network technical condition, project of a geoinformation system of former mining terrains and objects, and monitoring and control of impact of surface mining on mine surroundings with satellite radar interferometry.
Field Ground Truthing Data Collector - a Mobile Toolkit for Image Analysis and Processing
NASA Astrophysics Data System (ADS)
Meng, X.
2012-07-01
Field Ground Truthing Data Collector is one of the four key components of the NASA funded ICCaRS project, being developed in Southeast Michigan. The ICCaRS ground truthing toolkit entertains comprehensive functions: 1) Field functions, including determining locations through GPS, gathering and geo-referencing visual data, laying out ground control points for AEROKAT flights, measuring the flight distance and height, and entering observations of land cover (and use) and health conditions of ecosystems and environments in the vicinity of the flight field; 2) Server synchronization functions, such as, downloading study-area maps, aerial photos and satellite images, uploading and synchronizing field-collected data with the distributed databases, calling the geospatial web services on the server side to conduct spatial querying, image analysis and processing, and receiving the processed results in field for near-real-time validation; and 3) Social network communication functions for direct technical assistance and pedagogical support, e.g., having video-conference calls in field with the supporting educators, scientists, and technologists, participating in Webinars, or engaging discussions with other-learning portals. This customized software package is being built on Apple iPhone/iPad and Google Maps/Earth. The technical infrastructures, data models, coupling methods between distributed geospatial data processing and field data collector tools, remote communication interfaces, coding schema, and functional flow charts will be illustrated and explained at the presentation. A pilot case study will be also demonstrated.
TERRA/MODIS Data Products and Data Management at the GES-DAAC
NASA Astrophysics Data System (ADS)
Sharma, A. K.; Ahmad, S.; Eaton, P.; Koziana, J.; Leptoukh, G.; Ouzounov, D.; Savtchenko, A.; Serafino, G.; Sikder, M.; Zhou, B.
2001-05-01
Since February 2000, the Earth Sciences Distributed Active Archive Center (GES-DAAC) at the NASA/Goddard Space Flight Center has been successfully ingesting, processing, archiving, and distributing the Moderate Resolution Imaging Spectroradiometer (MODIS) data. MODIS is the key instrument aboard the Terra satellite, viewing the entire Earth's surface every 1 to 2 days, acquiring data in 36 channels in the visible and infrared spectral bands (0.4 to 14.4 microns). Higher resolution (250m, 500m, and 1km pixel) data are improving our understanding of global dynamics and processes occurring on the land, in the oceans, and in the lower atmosphere and will play a vital role in the future development of validated, global, interactive Earth-system models. MODIS calibrated and uncalibrated radiances, and geolocation products were released to the public in April 2000, and a suite of oceans products and an entire suite of atmospheric products were released by early January 2001. The suite of ocean products is grouped into three categories Ocean Color, SST and Primary Productivity. The suite of atmospheric products includes Aerosol, Total Precipitable Water, Cloud Optical and Physical properties, Atmospheric Profiles and Cloud Mask. The MODIS Data Support Team (MDST) at the GES-DAAC has been providing support for enabling basic scientific research and assistance in accessing the scientific data and information to the Earth Science User Community. Support is also provided for data formats (HDF-EOS), information on visualization tools, documentation for data products, information on the scientific content of products and metadata. Visit the MDST website at http://daac.gsfc.nasa.gov/CAMPAIGN_DOCS/MODIS/index.html The task to process archive and distribute enormous volumes of MODIS data to users (more than 0.5 TB a day) has led to the development of an unique world wide web based GES DAAC Search and Order system http://acdisx.gsfc.nasa.gov/data/, data handling software and tools, as well as a FTP site that contains sample of browse images and MODIS data products. This paper is intended to inform the user community about the data system and services available at the GES-DAAC in support of these information-rich data products. MDST provides support to MODIS data users to access and process data and information for research, applications and educational purposes. This paper will present an overview of the MODIS data products released to public including the suite of atmosphere and oceans data products that can be ordered from the GES-DAAC. Different mechanisms for search and ordering the data, determining data product sizes, data distribution policy, User Assistance System (UAS), and data subscription services will be described.
WPS mediation: An approach to process geospatial data on different computing backends
NASA Astrophysics Data System (ADS)
Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas
2012-10-01
The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities and various problems emerge when trying to use them in data and computing-intensive domains such as environmental sciences. These problems are usually not or only partially solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) related methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of WPS specification through two optional parameters. The applicability of this approach will be demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits, and issues that need to be further investigated to improve performances.
Moving code - Sharing geoprocessing logic on the Web
NASA Astrophysics Data System (ADS)
Müller, Matthias; Bernard, Lars; Kadner, Daniel
2013-09-01
Efficient data processing is a long-standing challenge in remote sensing. Effective and efficient algorithms are required for product generation in ground processing systems, event-based or on-demand analysis, environmental monitoring, and data mining. Furthermore, the increasing number of survey missions and the exponentially growing data volume in recent years have created demand for better software reuse as well as an efficient use of scalable processing infrastructures. Solutions that address both demands simultaneously have begun to slowly appear, but they seldom consider the possibility to coordinate development and maintenance efforts across different institutions, community projects, and software vendors. This paper presents a new approach to share, reuse, and possibly standardise geoprocessing logic in the field of remote sensing. Drawing from the principles of service-oriented design and distributed processing, this paper introduces moving-code packages as self-describing software components that contain algorithmic code and machine-readable descriptions of the provided functionality, platform, and infrastructure, as well as basic information about exploitation rights. Furthermore, the paper presents a lean publishing mechanism by which to distribute these packages on the Web and to integrate them in different processing environments ranging from monolithic workstations to elastic computational environments or "clouds". The paper concludes with an outlook toward community repositories for reusable geoprocessing logic and their possible impact on data-driven science in general.
Three Strategies for the Critical Use of Statistical Methods in Psychological Research
ERIC Educational Resources Information Center
Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando
2017-01-01
We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…
One Way of Testing a Distributed Processor
NASA Technical Reports Server (NTRS)
Edstrom, R.; Kleckner, D.
1982-01-01
Launch processing for Space Shuttle is checked out, controlled, and monitored with new system. Entire system can be exercised by two computer programs--one in master console and other in each of operations consoles. Control program in each operations console detects change in status and begins task initiation. All of front-end processors are exercised from consoles through common data buffer, and all data are logged to processed-data recorder for posttest analysis.
An Intelligent Archive Testbed Incorporating Data Mining
NASA Technical Reports Server (NTRS)
Ramapriyan, H.; Isaac, D.; Yang, W.; Bonnlander, B.; Danks, D.
2009-01-01
Many significant advances have occurred during the last two decades in remote sensing instrumentation, computation, storage, and communication technology. A series of Earth observing satellites have been launched by U.S. and international agencies and have been operating and collecting global data on a regular basis. These advances have created a data rich environment for scientific research and applications. NASA s Earth Observing System (EOS) Data and Information System (EOSDIS) has been operational since August 1994 with support for pre-EOS data. Currently, EOSDIS supports all the EOS missions including Terra (1999), Aqua (2002), ICESat (2002) and Aura (2004). EOSDIS has been effectively capturing, processing and archiving several terabytes of standard data products each day. It has also been distributing these data products at a rate of several terabytes per day to a diverse and globally distributed user community (Ramapriyan et al. 2009). There are other NASA-sponsored data system activities including measurement-based systems such as the Ocean Data Processing System and the Precipitation Processing system, and several projects under the Research, Education and Applications Solutions Network (REASoN), Making Earth Science Data Records for Use in Research Environments (MEaSUREs), and the Advancing Collaborative Connections for Earth-Sun System Science (ACCESS) programs. Together, these activities provide a rich set of resources constituting a value chain for users to obtain data at various levels ranging from raw radiances to interdisciplinary model outputs. The result has been a significant leap in our understanding of the Earth systems that all humans depend on for their enjoyment, livelihood, and survival. The trend in the community today is towards many distributed sets of providers of data and services. Despite this, visions for the future include users being able to locate, fuse and utilize data with location transparency and high degree of interoperability, and being able to convert data to information and usable knowledge in an efficient, convenient manner, aided significantly by automation (Ramapriyan et al. 2004; NASA 2005). We can look upon the distributed provider environment with capabilities to convert data to information and to knowledge as an Intelligent Archive in the Context of a Knowledge Building system (IA-KBS). Some of the key capabilities of an IA-KBS are: Virtual Product Generation, Significant Event Detection, Automated Data Quality Assessment, Large-Scale Data Mining, Dynamic Feedback Loop, and Data Discovery and Efficient Requesting (Ramapriyan et al. 2004).
Streaming data analytics via message passing with application to graph algorithms
Plimpton, Steven J.; Shead, Tim
2014-05-06
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of eithermore » message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.« less
A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.
Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W
2005-01-01
We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
Standard deviation and standard error of the mean
In, Junyong; Lee, Sangseok
2015-01-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923
Distributed trace using central performance counter memory
Satterfield, David L; Sexton, James C
2013-10-22
A plurality of processing cores, are central storage unit having at least memory connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located in with the central storage unit.
Distributed trace using central performance counter memory
Satterfield, David L.; Sexton, James C.
2013-01-22
A plurality of processing cores, are central storage unit having at least memory connected in a daisy chain manner, forming a daisy chain ring layout on an integrated chip. At least one of the plurality of processing cores places trace data on the daisy chain connection for transmitting the trace data to the central storage unit, and the central storage unit detects the trace data and stores the trace data in the memory co-located in with the central storage unit.
Xu, Di; Chai, Meiyun; Dong, Zhujun; Rahman, Md Maksudur; Yu, Xi; Cai, Junmeng
2018-06-04
The kinetic compensation effect in the logistic distributed activation energy model (DAEM) for lignocellulosic biomass pyrolysis was investigated. The sum of square error (SSE) surface tool was used to analyze two theoretically simulated logistic DAEM processes for cellulose and xylan pyrolysis. The logistic DAEM coupled with the pattern search method for parameter estimation was used to analyze the experimental data of cellulose pyrolysis. The results showed that many parameter sets of the logistic DAEM could fit the data at different heating rates very well for both simulated and experimental processes, and a perfect linear relationship between the logarithm of the frequency factor and the mean value of the activation energy distribution was found. The parameters of the logistic DAEM can be estimated by coupling the optimization method and isoconversional kinetic methods. The results would be helpful for chemical kinetic analysis using DAEM. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aaboud, M.; Aad, G.; Abbott, B.
The W boson angular distribution in events with high transverse momentum jets is measured using data collected by the ATLAS experiment from proton–proton collisions at a centre-of-mass energy √s=8 TeV at the Large Hadron Collider, corresponding to an integrated luminosity of 20.3 fb -1 . The focus is on the contributions to W+jets processes from real W emission, which is achieved by studying events where a muon is observed close to a high transverse momentum jet. At small angular separations, these contributions are expected to be large. Various theoretical models of this process are compared to the data inmore » terms of the absolute cross-section and the angular distributions of the muon from the leptonic W decay.« less
Rockfall travel distances theoretical distributions
NASA Astrophysics Data System (ADS)
Jaboyedoff, Michel; Derron, Marc-Henri; Pedrazzini, Andrea
2017-04-01
The probability of propagation of rockfalls is a key part of hazard assessment, because it permits to extrapolate the probability of propagation of rockfall either based on partial data or simply theoretically. The propagation can be assumed frictional which permits to describe on average the propagation by a line of kinetic energy which corresponds to the loss of energy along the path. But loss of energy can also be assumed as a multiplicative process or a purely random process. The distributions of the rockfall block stop points can be deduced from such simple models, they lead to Gaussian, Inverse-Gaussian, Log-normal or exponential negative distributions. The theoretical background is presented, and the comparisons of some of these models with existing data indicate that these assumptions are relevant. The results are either based on theoretical considerations or by fitting results. They are potentially very useful for rockfall hazard zoning and risk assessment. This approach will need further investigations.
Aaboud, M.; Aad, G.; Abbott, B.; ...
2016-12-06
The W boson angular distribution in events with high transverse momentum jets is measured using data collected by the ATLAS experiment from proton–proton collisions at a centre-of-mass energy √s=8 TeV at the Large Hadron Collider, corresponding to an integrated luminosity of 20.3 fb -1 . The focus is on the contributions to W+jets processes from real W emission, which is achieved by studying events where a muon is observed close to a high transverse momentum jet. At small angular separations, these contributions are expected to be large. Various theoretical models of this process are compared to the data inmore » terms of the absolute cross-section and the angular distributions of the muon from the leptonic W decay.« less
Distributed Processing with a Mainframe-Based Hospital Information System: A Generalized Solution
Kirby, J. David; Pickett, Michael P.; Boyarsky, M. William; Stead, William W.
1987-01-01
Over the last two years the Medical Center Information Systems Department at Duke University Medical Center has been developing a systematic approach to distributing the processing and data involved in computerized applications at DUMC. The resulting system has been named MAPS- the Micro-ADS Processing System. A key characteristic of MAPS is that it makes it easy to execute any existing mainframe ADS application with a request from a PC. This extends the functionality of the mainframe application set to the PC without compromising the maintainability of the PC or mainframe systems.
NASA Astrophysics Data System (ADS)
Petrosyan, A. Sh.
2016-09-01
PanDA (Production and Distributed Analysis System) is a workload management system, widely used for data processing at experiments on Large Hadron Collider and others. COMPASS is a high-energy physics experiment at the Super Proton Synchrotron. Data processing for COMPASS runs locally at CERN, on lxbatch, the data itself stored in CASTOR. In 2014 an idea to start running COMPASS production through PanDA arose. Such transformation in experiment's data processing will allow COMPASS community to use not only CERN resources, but also Grid resources worldwide. During the spring and summer of 2015 installation, validation and migration work is being performed at JINR. Details and results of this process are presented in this paper.
WFIRST: User and mission support at ISOC - IPAC Science Operations Center
NASA Astrophysics Data System (ADS)
Akeson, Rachel; Armus, Lee; Bennett, Lee; Colbert, James; Helou, George; Kirkpatrick, J. Davy; Laine, Seppo; Meshkat, Tiffany; Paladini, Roberta; Ramirez, Solange; Wang, Yun; Xie, Joan; Yan, Lin
2018-01-01
The science center for WFIRST is distributed between the Goddard Space Flight Center, the Infrared Processing and Analysis Center (IPAC) and the Space Telescope Science Institute (STScI). The main functions of the IPAC Science Operations Center (ISOC) are:* Conduct the GO, archival and theory proposal submission and evaluation process* Support the coronagraph instrument, including observation planning, calibration and data processing pipeline, generation of data products, and user support* Microlensing survey data processing pipeline, generation of data products, and user support* Community engagement including conferences, workshops and general support of the WFIRST exoplanet communityWe will describe the components planned to support these functions and the community of WFIRST users.
IDC Re-Engineering Phase 2 System Requirements Document Version 1.4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, James M.; Burns, John F.; Satpathi, Meara Allena
This System Requirements Document (SRD) defines waveform data processing requirements for the International Data Centre (IDC) of the Comprehensive Nuclear Test Ban Treaty Organization (CTBTO). The IDC applies, on a routine basis, automatic processing methods and interactive analysis to raw International Monitoring System (IMS) data in order to produce, archive, and distribute standard IDC products on behalf of all States Parties. The routine processing includes characterization of events with the objective of screening out events considered to be consistent with natural phenomena or non-nuclear, man-made phenomena. This document does not address requirements concerning acquisition, processing and analysis of radionuclide data,more » but includes requirements for the dissemination of radionuclide data and products.« less
Graph processing platforms at scale: practices and experiences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Seung-Hwan; Lee, Sangkeun; Brown, Tyler C
2015-01-01
Graph analysis unveils hidden associations of data in many phenomena and artifacts, such as road network, social networks, genomic information, and scientific collaboration. Unfortunately, a wide diversity in the characteristics of graphs and graph operations make it challenging to find a right combination of tools and implementation of algorithms to discover desired knowledge from the target data set. This study presents an extensive empirical study of three representative graph processing platforms: Pegasus, GraphX, and Urika. Each system represents a combination of options in data model, processing paradigm, and infrastructure. We benchmarked each platform using three popular graph operations, degree distribution,more » connected components, and PageRank over a variety of real-world graphs. Our experiments show that each graph processing platform shows different strength, depending the type of graph operations. While Urika performs the best in non-iterative operations like degree distribution, GraphX outputforms iterative operations like connected components and PageRank. In addition, we discuss challenges to optimize the performance of each platform over large scale real world graphs.« less
NASA Astrophysics Data System (ADS)
Yu, Z. P.; Yue, Z. F.; Liu, W.
2018-05-01
With the development of artificial intelligence, more and more reliability experts have noticed the roles of subjective information in the reliability design of complex system. Therefore, based on the certain numbers of experiment data and expert judgments, we have divided the reliability estimation based on distribution hypothesis into cognition process and reliability calculation. Consequently, for an illustration of this modification, we have taken the information fusion based on intuitional fuzzy belief functions as the diagnosis model of cognition process, and finished the reliability estimation for the open function of cabin door affected by the imprecise judgment corresponding to distribution hypothesis.
NASA Astrophysics Data System (ADS)
Burov, V. A.; Zotov, D. I.; Rumyantseva, O. D.
2014-07-01
A two-step algorithm is used to reconstruct the spatial distributions of the acoustic characteristics of soft biological tissues-the sound velocity and absorption coefficient. Knowing these distributions is urgent for early detection of benign and malignant neoplasms in biological tissues, primarily in the breast. At the first stage, large-scale distributions are estimated; at the second step, they are refined with a high resolution. Results of reconstruction on the base of model initial data are presented. The principal necessity of preliminary reconstruction of large-scale distributions followed by their being taken into account at the second step is illustrated. The use of CUDA technology for processing makes it possible to obtain final images of 1024 × 1024 samples in only a few minutes.
Hamzehpour, Hossein; Rasaei, M Reza; Sahimi, Muhammad
2007-05-01
We describe a method for the development of the optimal spatial distributions of the porosity phi and permeability k of a large-scale porous medium. The optimal distributions are constrained by static and dynamic data. The static data that we utilize are limited data for phi and k, which the method honors in the optimal model and utilizes their correlation functions in the optimization process. The dynamic data include the first-arrival (FA) times, at a number of receivers, of seismic waves that have propagated in the porous medium, and the time-dependent production rates of a fluid that flows in the medium. The method combines the simulated-annealing method with a simulator that solves numerically the three-dimensional (3D) acoustic wave equation and computes the FA times, and a second simulator that solves the 3D governing equation for the fluid's pressure as a function of time. To our knowledge, this is the first time that an optimization method has been developed to determine simultaneously the global minima of two distinct total energy functions. As a stringent test of the method's accuracy, we solve for flow of two immiscible fluids in the same porous medium, without using any data for the two-phase flow problem in the optimization process. We show that the optimal model, in addition to honoring the data, also yields accurate spatial distributions of phi and k, as well as providing accurate quantitative predictions for the single- and two-phase flow problems. The efficiency of the computations is discussed in detail.
NASA Astrophysics Data System (ADS)
Magnoni, L.; Suthakar, U.; Cordeiro, C.; Georgiou, M.; Andreeva, J.; Khan, A.; Smith, D. R.
2015-12-01
Monitoring the WLCG infrastructure requires the gathering and analysis of a high volume of heterogeneous data (e.g. data transfers, job monitoring, site tests) coming from different services and experiment-specific frameworks to provide a uniform and flexible interface for scientists and sites. The current architecture, where relational database systems are used to store, to process and to serve monitoring data, has limitations in coping with the foreseen increase in the volume (e.g. higher LHC luminosity) and the variety (e.g. new data-transfer protocols and new resource-types, as cloud-computing) of WLCG monitoring events. This paper presents a new scalable data store and analytics platform designed by the Support for Distributed Computing (SDC) group, at the CERN IT department, which uses a variety of technologies each one targeting specific aspects of big-scale distributed data-processing (commonly referred as lambda-architecture approach). Results of data processing on Hadoop for WLCG data activities monitoring are presented, showing how the new architecture can easily analyze hundreds of millions of transfer logs in a few minutes. Moreover, a comparison of data partitioning, compression and file format (e.g. CSV, Avro) is presented, with particular attention given to how the file structure impacts the overall MapReduce performance. In conclusion, the evolution of the current implementation, which focuses on data storage and batch processing, towards a complete lambda-architecture is discussed, with consideration of candidate technology for the serving layer (e.g. Elasticsearch) and a description of a proof of concept implementation, based on Apache Spark and Esper, for the real-time part which compensates for batch-processing latency and automates problem detection and failures.
Autonomous Robot Navigation in Human-Centered Environments Based on 3D Data Fusion
NASA Astrophysics Data System (ADS)
Steinhaus, Peter; Strand, Marcus; Dillmann, Rüdiger
2007-12-01
Efficient navigation of mobile platforms in dynamic human-centered environments is still an open research topic. We have already proposed an architecture (MEPHISTO) for a navigation system that is able to fulfill the main requirements of efficient navigation: fast and reliable sensor processing, extensive global world modeling, and distributed path planning. Our architecture uses a distributed system of sensor processing, world modeling, and path planning units. In this arcticle, we present implemented methods in the context of data fusion algorithms for 3D world modeling and real-time path planning. We also show results of the prototypic application of the system at the museum ZKM (center for art and media) in Karlsruhe.
NASA Technical Reports Server (NTRS)
Feinstein, S. P.; Girard, M. A.
1979-01-01
An automated technique for measuring particle diameters and their spatial coordinates from holographic reconstructions is being developed. Preliminary tests on actual cold-flow holograms of impinging jets indicate that a suitable discriminant algorithm consists of a Fourier-Gaussian noise filter and a contour thresholding technique. This process identifies circular as well as noncircular objects. The desired objects (in this case, circular or possibly ellipsoidal) are then selected automatically from the above set and stored with their parametric representations. From this data, dropsize distributions as a function of spatial coordinates can be generated and combustion effects due to hardware and/or physical variables studied.
Ruhl, C.A.; Schoellhamer, D.H.; Stumpf, R.P.; Lindsay, C.L.
2001-01-01
Analysis of suspended-sediment concentration data in San Francisco Bay is complicated by spatial and temporal variability. In situ optical backscatterance sensors provide continuous suspended-sediment concentration data, but inaccessibility, vandalism, and cost limit the number of potential monitoring stations. Satellite imagery reveals the spatial distribution of surficial-suspended sediment concentrations in the Bay; however, temporal resolution is poor. Analysis of the in situ sensor data in conjunction with the satellite reflectance data shows the effects of physical processes on both the spatial and temporal distribution of suspended sediment in San Francisco Bay. Plumes can be created by large freshwater flows. Zones of high suspended-sediment concentrations in shallow subembayments are associated with wind-wave resuspension and the spring-neap cycle. Filaments of clear and turbid water are caused by different transport processes in deep channels, as opposed to adjacent shallow water.
Refined Simulation of Satellite Laser Altimeter Full Echo Waveform
NASA Astrophysics Data System (ADS)
Men, H.; Xing, Y.; Li, G.; Gao, X.; Zhao, Y.; Gao, X.
2018-04-01
The return waveform of satellite laser altimeter plays vital role in the satellite parameters designation, data processing and application. In this paper, a method of refined full waveform simulation is proposed based on the reflectivity of the ground target, the true emission waveform and the Laser Profile Array (LPA). The ICESat/GLAS data is used as the validation data. Finally, we evaluated the simulation accuracy with the correlation coefficient. It was found that the accuracy of echo simulation could be significantly improved by considering the reflectivity of the ground target and the emission waveform. However, the laser intensity distribution recorded by the LPA has little effect on the echo simulation accuracy when compared with the distribution of the simulated laser energy. At last, we proposed a refinement idea by analyzing the experimental results, in the hope of providing references for the waveform data simulation and processing of GF-7 satellite in the future.
2003-09-01
BLANK xv LIST OF ACRONYMS ABC Activity Based Costing ADO ActiveX Data Object ASP Application Server Page BPR Business Process Re...processes uses people and systems (hardware, software, machinery, etc.) and that these people and systems contain the “corporate” knowledge of the...server architecture was also a high maintenance item. Data was no longer contained on one mainframe but was distributed throughout the enterprise
Automated crystallographic system for high-throughput protein structure determination.
Brunzelle, Joseph S; Shafaee, Padram; Yang, Xiaojing; Weigand, Steve; Ren, Zhong; Anderson, Wayne F
2003-07-01
High-throughput structural genomic efforts require software that is highly automated, distributive and requires minimal user intervention to determine protein structures. Preliminary experiments were set up to test whether automated scripts could utilize a minimum set of input parameters and produce a set of initial protein coordinates. From this starting point, a highly distributive system was developed that could determine macromolecular structures at a high throughput rate, warehouse and harvest the associated data. The system uses a web interface to obtain input data and display results. It utilizes a relational database to store the initial data needed to start the structure-determination process as well as generated data. A distributive program interface administers the crystallographic programs which determine protein structures. Using a test set of 19 protein targets, 79% were determined automatically.
NASA Astrophysics Data System (ADS)
Boer, Marie
2017-09-01
Generalized Parton Distributions (GPDs) contain the correlation between the parton's longitudinal momentum and their transverse distribution. They are accessed through hard exclusive processes, such as Deeply Virtual Compton Scattering (DVCS). DVCS has already been measured in several experiments and several models allow for extracting GPDs from these measurements. Timelike Compton Scattering (TCS) is, at leading order, the time-reversal equivalent process to DVCS and accesses GPDs at the same kinematics. Comparing GPDs extracted from DVCS and TCS is a unique way for proving GPD universality. Combining fits from the two processes will also allow for better constraining the GPDs. We will present our method for extracting GPDs from DVCS and TCS pseudo-data. We will compare fit results from the two processes in similar conditions and present what can be expected in term of contraints on GPDs from combined fits.
NASA Astrophysics Data System (ADS)
Ramirez, P.; Mattmann, C. A.; Painter, T. H.; Seidel, F. C.; Trangsrud, A.; Hart, A. F.; Goodale, C. E.; Boardman, J. W.; Heneghan, C.; Verma, R.; Khudikyan, S.; Boustani, M.; Zimdars, P. A.; Horn, J.; Neely, S.
2013-12-01
The JPL Airborne Snow Observatory (ASO) must process 100s of GB of raw data to 100s of Terabytes of derived data in 24 hour Near Real Time (NRT) latency in a geographically distributed mobile compute and data-intensive processing setting. ASO provides meaningful information to water resource managers in the Western US letting them know how much water to maintain; or release, and what the prospectus of the current snow season is in the Sierra Nevadas. Providing decision support products processed from airborne data in a 24 hour timeframe is an emergent field and required the team to develop a novel solution as this process is typically done over months. We've constructed a system that combines Apache OODT; with Apache Tika; with the Interactive Data Analysis (IDL)/ENVI programming environment to rapidly and unobtrusively generate, distribute and archive ASO data as soon as the plane lands near Mammoth Lakes, CA. Our system is flexible, underwent several redeployments and reconfigurations, and delivered this critical information to stakeholders during the recent "Snow On" campaign March 2013 - June 2013. This talk will take you through a day in the life of the compute team from data acquisition, delivery, processing, and dissemination. Within this context, we will discuss the architecture of ASO; the open source software we used; the data we stored; and how it was delivered to its users. Moreover we will discuss the logistics, system engineering, and staffing that went into the developing, deployment, and operation of the mobile compute system.
Statistical transformation and the interpretation of inpatient glucose control data.
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-03-01
To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sullivan, M.; Anderson, D.P.
1988-01-01
Marionette is a system for distributed parallel programming in an environment of networked heterogeneous computer systems. It is based on a master/slave model. The master process can invoke worker operations (asynchronous remote procedure calls to single slaves) and context operations (updates to the state of all slaves). The master and slaves also interact through shared data structures that can be modified only by the master. The master and slave processes are programmed in a sequential language. The Marionette runtime system manages slave process creation, propagates shared data structures to slaves as needed, queues and dispatches worker and context operations, andmore » manages recovery from slave processor failures. The Marionette system also includes tools for automated compilation of program binaries for multiple architectures, and for distributing binaries to remote fuel systems. A UNIX-based implementation of Marionette is described.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Broderick, Robert Joseph; Quiroz, Jimmy Edward; Reno, Matthew J.
2015-11-01
The third solicitation of the California Solar Initiative (CSI) Research, Development, Demonstration and Deployment (RD&D) Program established by the California Public Utility Commission (CPUC) is supporting the Electric Power Research Institute (EPRI), National Renewable Energy Laboratory (NREL), and Sandia National Laboratories (SNL) with collaboration from Pacific Gas and Electric (PG&E), Southern California Edison (SCE), and San Diego Gas and Electric (SDG&E), in research to improve the Utility Application Review and Approval process for interconnecting distributed energy resources to the distribution system. Currently this process is the most time - consuming of any step on the path to generating power onmore » the distribution system. This CSI RD&D solicitation three project has completed the tasks of collecting data from the three utilities, clustering feeder characteristic data to attain representative feeders, detailed modeling of 16 representative feeders, analysis of PV impacts to those feeders, refinement of current screening processes, and validation of those suggested refinements. In this report each task is summarized to produce a final summary of all components of the overall project.« less
NASA Astrophysics Data System (ADS)
Siniscalchi, Agata; Romano, Gerardo; Barracano, Fabio; Balasco, Marianna; Tripaldi, Simona
2017-04-01
Analyzing a 4 years of a single site MT continuous monitoring data, a systematic variation of the MT transfer function estimates was observed in the [20-100 s] period range that was shown to be connected to the global geomagnetic activity, Ap index (Romano et al., 2014). The monitored period, from 2007 to 2011, includes the global minimum of solar activity which occurred in 2009 (low MT source amplitude). It was shown that the impedance robust estimations tend to stabilize when the Ap index exceed a value of 10. In order to exclude a possible dependence of the observed fluctuation on the presence of a local cultural noise source, for a shorter period ( 2 months) the monitoring data were also processed by using a remote site. Recently Chave (2012) demonstrated that MT data can be described by alpha stable distribution family that is characterized by four-parameters that must be empirically determined. The Gaussian distribution belongs to this family as a special case when one of the four parameter, α the tail thickness, is equal to 2. Following Chave (2016), MT data are typically stably distributed with the empirical observation that 0.8 ≤α ≤1.8. In order to better understand the observed dependence of the MT continuous monitoring on the global geomagnetic activity, here we present the results a re-analysis of the MT monitoring data with a two steps processing. In the first step, we characterize the time series of the Alpha Stable Distribution Parameters (ASDP) as obtained from the whole processing of the dataset with the aim of checking for possible connections between these last and the Ap index. In the second step, we estimate the ASDP by using only the samples which satisfy the mathematical range of existence of the normalized WAL (Weaver et al.,2000) considering these last as a diagnostic tool to detect which segments of the time series in the frequency domain are strongly contaminated by noise (WAL selection criterion). The comparison between the results of the two above mentioned steps, allow us to understand how the WAL based selection criterion performs.
Waiting-time distributions of magnetic discontinuities: clustering or Poisson process?
Greco, A; Matthaeus, W H; Servidio, S; Dmitruk, P
2009-10-01
Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
Waiting-time distributions of magnetic discontinuities: Clustering or Poisson process?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greco, A.; Matthaeus, W. H.; Servidio, S.
2009-10-15
Using solar wind data from the Advanced Composition Explorer spacecraft, with the support of Hall magnetohydrodynamic simulations, the waiting-time distributions of magnetic discontinuities have been analyzed. A possible phenomenon of clusterization of these discontinuities is studied in detail. We perform a local Poisson's analysis in order to establish if these intermittent events are randomly distributed or not. Possible implications about the nature of solar wind discontinuities are discussed.
BOREAS TE-18, 60-m, Radiometrically Rectified Landsat TM Imagery
NASA Technical Reports Server (NTRS)
Hall, Forrest G. (Editor); Knapp, David
2000-01-01
The BOREAS TE-18 team used a radiometric rectification process to produce standardized DN values for a series of Landsat TM images of the BOREAS SSA and NSA in order to compare images that were collected under different atmospheric conditions. The images for each study area were referenced to an image that had very clear atmospheric qualities. The reference image for the SSA was collected on 02-Sep-1994, while the reference image for the NSA was collected on 2 1 Jun-1995. The 23 rectified images cover the period of 07-Jul-1985 to 18-Sep-1994 in the SSA and 22-Jun-1984 to 09-Jun-1994 in the NSA. Each of the reference scenes had coincident atmospheric optical thickness measurements made by RSS-11. The radiometric rectification process is described in more detail by Hall et al. (1991). The original Landsat TM data were received from CCRS for use in the BOREAS project. Due to the nature of the radiometric rectification process and copyright issues, the full-resolution (30-m) images may not be publicly distributed. However, this spatially degraded 60-m resolution version of the images may be openly distributed and is available on the BOREAS CD-ROM series. After the radiometric rectification processing, the original data were degraded to a 60-m pixel size from the original 30-m pixel size by averaging the data over a 2- by 2-pixel window. The data are stored in binary image-format files. The data files are available on a CD-ROM (see document number 20010000884), or from the Oak Ridge National Laboratory (ORNL) Distributed Activity Archive Center (DAAC).
Qi, Yi; Padiath, Ameena; Zhao, Qun; Yu, Lei
2016-10-01
The Motor Vehicle Emission Simulator (MOVES) quantifies emissions as a function of vehicle modal activities. Hence, the vehicle operating mode distribution is the most vital input for running MOVES at the project level. The preparation of operating mode distributions requires significant efforts with respect to data collection and processing. This study is to develop operating mode distributions for both freeway and arterial facilities under different traffic conditions. For this purpose, in this study, we (1) collected/processed geographic information system (GIS) data, (2) developed a model of CO2 emissions and congestion from observations, (3) implemented the model to evaluate potential emission changes from a hypothetical roadway accident scenario. This study presents a framework by which practitioners can assess emission levels in the development of different strategies for traffic management and congestion mitigation. This paper prepared the primary input, that is, the operating mode ID distribution, required for running MOVES and developed models for estimating emissions for different types of roadways under different congestion levels. The results of this study will provide transportation planners or environmental analysts with the methods for qualitatively assessing the air quality impacts of different transportation operation and demand management strategies.
A wireless data acquisition system for acoustic emission testing
NASA Astrophysics Data System (ADS)
Zimmerman, A. T.; Lynch, J. P.
2013-01-01
As structural health monitoring (SHM) systems have seen increased demand due to lower costs and greater capabilities, wireless technologies have emerged that enable the dense distribution of transducers and the distributed processing of sensor data. In parallel, ultrasonic techniques such as acoustic emission (AE) testing have become increasingly popular in the non-destructive evaluation of materials and structures. These techniques, which involve the analysis of frequency content between 1 kHz and 1 MHz, have proven effective in detecting the onset of cracking and other early-stage failure in active structures such as airplanes in flight. However, these techniques typically involve the use of expensive and bulky monitoring equipment capable of accurately sensing AE signals at sampling rates greater than 1 million samples per second. In this paper, a wireless data acquisition system is presented that is capable of collecting, storing, and processing AE data at rates of up to 20 MHz. Processed results can then be wirelessly transmitted in real-time, creating a system that enables the use of ultrasonic techniques in large-scale SHM systems.
NASA and USGS ASTER Expedited Satellite Data Services for Disaster Situations
NASA Astrophysics Data System (ADS)
Duda, K. A.
2012-12-01
Significant international disasters related to storms, floods, volcanoes, wildfires and numerous other themes reoccur annually, often inflicting widespread human suffering and fatalities with substantial economic consequences. During and immediately after such events it can be difficult to access the affected areas and become aware of the overall impacts, but insight on the spatial extent and effects can be gleaned from above through satellite images. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) on the Terra spacecraft has offered such views for over a decade. On short notice, ASTER continues to deliver analysts multispectral imagery at 15 m spatial resolution in near real-time to assist participating responders, emergency managers, and government officials in planning for such situations and in developing appropriate responses after they occur. The joint U.S./Japan ASTER Science Team has developed policies and procedures to ensure such ongoing support is accessible when needed. Processing and distribution of data products occurs at the NASA Land Processes Distributed Active Archive Center (LP DAAC) located at the USGS Earth Resources Observation and Science Center in South Dakota. In addition to current imagery, the long-term ASTER mission has generated an extensive collection of nearly 2.5 million global 3,600 km2 scenes since the launch of Terra in late 1999. These are archived and distributed by LP DAAC and affiliates at Japan Space Systems in Tokyo. Advanced processing is performed to create higher level products of use to researchers. These include a global digital elevation model. Such pre-event imagery provides a comparative basis for use in detecting changes associated with disasters and to monitor land use trends to portray areas of increased risk. ASTER imagery acquired via the expedited collection and distribution process illustrates the utility and relevancy of such data in crisis situations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
He, Jin-zhong, E-mail: viewsino@163.com; Yao, Shu-zhen; Zhang, Zhong-ping
2013-03-15
With the help of complexity indices, we quantitatively studied multifractals, frequency distributions, and linear and nonlinear characteristics of geochemical data for exploration of the Daijiazhuang Pb-Zn deposit. Furthermore, we derived productivity differentiation models of elements from thermodynamics and self-organized criticality of metallogenic systems. With respect to frequency distributions and multifractals, only Zn in rocks and most elements except Sb in secondary media, which had been derived mainly from weathering and alluviation, exhibit nonlinear distributions. The relations of productivity to concentrations of metallogenic elements and paragenic elements in rocks and those of elements strongly leached in secondary media can be seenmore » as linear addition of exponential functions with a characteristic weak chaos. The relations of associated elements such as Mo, Sb, and Hg in rocks and other elements in secondary media can be expressed as an exponential function, and the relations of one-phase self-organized geological or metallogenic processes can be represented by a power function, each representing secondary chaos or strong chaos. For secondary media, exploration data of most elements should be processed using nonlinear mathematical methods or should be transformed to linear distributions before processing using linear mathematical methods.« less
Patient Data Synchronization Process in a Continuity of Care Environment
Haras, Consuela; Sauquet, Dominique; Ameline, Philippe; Jaulent, Marie-Christine; Degoulet, Patrice
2005-01-01
In a distributed patient record environment, we analyze the processes needed to ensure exchange and access to EHR data. We propose an adapted method and the tools for data synchronization. Our study takes into account the issues of user rights management for data access and of decreasing the amount of data exchanged over the network. We describe a XML-based synchronization model that is portable and independent of specific medical data models. The implemented platform consists of several servers, of local network clients, of workstations running user’s interfaces and of data exchange and synchronization tools. PMID:16779049
NASA's Long-Term Archive (LTA) of ICESat Data at the National Snow and Ice Data Center (NSIDC)
NASA Astrophysics Data System (ADS)
Fowler, D. K.; Moses, J. F.; Dimarzio, J. P.; Webster, D.
2011-12-01
Data Stewardship, preservation, and reproducibility are becoming principal parts of a data manager's work. In an era of distributed data and information systems, where the host location ought to be transparent to the internet user, it is of vital importance that organizations make a commitment to both current and long-term goals of data management and the preservation of scientific data. NASA's EOS Data and Information System (EOSDIS) is a distributed system of discipline-specific archives and mission-specific science data processing facilities. Satellite missions and instruments go through a lifecycle that involves pre-launch calibration, on-orbit data acquisition and product generation, and final reprocessing. Data products and descriptions flow to the archives for distribution on a regular basis during the active part of the mission. However there is additional information from the product generation and science teams needed to ensure the observations will be useful for long term climate studies. Examples include ancillary input datasets, product generation software, and production history as developed by the team during the course of product generation. These data and information will need to be archived after product data processing is completed. Using inputs from the USGCRP Workshop on Long Term Archive Requirements (1998), discussions with EOS instrument teams, and input from the 2011 ESIPS Federation meeting, NASA is developing a set of Earth science data and information content requirements for long term preservation that will ultimately be used for all the EOS missions as they come to completion. Since the ICESat/GLAS mission is one of the first to come to an end, NASA and NSIDC are preparing for long-term support of the ICESat mission data now. For a long-term archive, it is imperative that there is sufficient information about how products were prepared in order to convince future researchers that the scientific results are accurate, understandable, useable, and reproducible. Our experience suggests data centers know what to preserve in most cases, i.e., the processing algorithms along with the Level 0 or Level 1a input and ancillary products used to create the higher-level products will be archived and made available to users. In other cases the data centers must seek guidance from the science team, e.g., for pre-launch, calibration/validation, and test data. All these data are an important part of product provenance, contributing to and helping establish the integrity of the scientific observations for long term climate studies. In this presentation we will describe application of information content requirements, guidance from the ICESat/GLAS Science Team and the flow of additional information from the ICESat Science team and Science Investigator-Led Processing System to the Distributed Active Archive Center.
Johnson, Timothy C.; Versteeg, Roelof J.; Ward, Andy; Day-Lewis, Frederick D.; Revil, André
2010-01-01
Electrical geophysical methods have found wide use in the growing discipline of hydrogeophysics for characterizing the electrical properties of the subsurface and for monitoring subsurface processes in terms of the spatiotemporal changes in subsurface conductivity, chargeability, and source currents they govern. Presently, multichannel and multielectrode data collections systems can collect large data sets in relatively short periods of time. Practitioners, however, often are unable to fully utilize these large data sets and the information they contain because of standard desktop-computer processing limitations. These limitations can be addressed by utilizing the storage and processing capabilities of parallel computing environments. We have developed a parallel distributed-memory forward and inverse modeling algorithm for analyzing resistivity and time-domain induced polar-ization (IP) data. The primary components of the parallel computations include distributed computation of the pole solutions in forward mode, distributed storage and computation of the Jacobian matrix in inverse mode, and parallel execution of the inverse equation solver. We have tested the corresponding parallel code in three efforts: (1) resistivity characterization of the Hanford 300 Area Integrated Field Research Challenge site in Hanford, Washington, U.S.A., (2) resistivity characterization of a volcanic island in the southern Tyrrhenian Sea in Italy, and (3) resistivity and IP monitoring of biostimulation at a Superfund site in Brandywine, Maryland, U.S.A. Inverse analysis of each of these data sets would be limited or impossible in a standard serial computing environment, which underscores the need for parallel high-performance computing to fully utilize the potential of electrical geophysical methods in hydrogeophysical applications.
Implications of movement for species distribution models - Rethinking environmental data tools.
Bruneel, Stijn; Gobeyn, Sacha; Verhelst, Pieterjan; Reubens, Jan; Moens, Tom; Goethals, Peter
2018-07-01
Movement is considered an essential process in shaping the distributions of species. Nevertheless, most species distribution models (SDMs) still focus solely on environment-species relationships to predict the occurrence of species. Furthermore, the currently used indirect estimates of movement allow to assess habitat accessibility, but do not provide an accurate description of movement. Better proxies of movement are needed to assess the dispersal potential of individual species and to gain a more practical insight in the interconnectivity of communities. Telemetry techniques are rapidly evolving and highly capable to provide explicit descriptions of movement, but their usefulness for SDMs will mainly depend on the ability of these models to deal with hitherto unconsidered ecological processes. More specifically, the integration of movement is likely to affect the environmental data requirements as the connection between environmental and biological data is crucial to provide reliable results. Mobility implies the occupancy of a continuum of space, hence an adequate representation of both geographical and environmental space is paramount to study mobile species distributions. In this context, environmental models, remote sensing techniques and animal-borne environmental sensors are discussed as potential techniques to obtain suitable environmental data. In order to provide an in-depth review of the aforementioned methods, we have chosen to use the modelling of fish distributions as a case study. The high mobility of fish and the often highly variable nature of the aquatic environment generally complicate model development, making it an adequate subject for research. Furthermore, insight into the distribution of fish is of great interest for fish stock assessments and water management worldwide, underlining its practical relevance. Copyright © 2018 Elsevier B.V. All rights reserved.